Archive for the ‘Semantic Web’ Category

Why Migrate to the Semantic Web?

Saturday, November 8th, 2008

Why Migrate to the Semantic Web? has just been published at Devx.com.

It pretty much summarizes my reasons for migrating the CDMS application to the semantic web.

What it doesn’t describe in detail is that, for building compliance at a specific locality, it is the local legislation that takes precedence. This means that Linked Data from sources such as DBpedia is great for describing concepts, but at a local level you need to refer to Linked Data derived from local legislation to explicitly clarify the criteria that form the basis of compliance.

Setting up MySQL on Mac OS X for Jena SDB

Friday, October 17th, 2008

A while ago I installed MySQL 5.0.37 on my MacBook Pro using the default MySQL settings.

Recently I installed Jena SDB 1.1, following the instructions on the wiki.

As part of the install I created a MySQL database, specifying utf8 (the name has to be quoted with backticks because it contains a hyphen), e.g.

mysql> create database `sdb-index` character set utf8;

and set up a store description (named sdb-index.ttl) based on the SDB example, changing it to use mysql and the “layout2/index” layout.

The create command worked fine
SDBROOT > bin/sdbconfig --sdb=sdb-index.ttl --create

but when I ran the test suite
SDBROOT > bin/sdbtest --sdb=sdb-index.ttl testing/manifest-sdb.ttl

I got an error in the Unicode-5 test.

Checking the SDB notes for MySQL, it seemed likely that the problem was related to the MySQL default character set.
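As a rough illustration of why a latin1 default can trip up a Unicode test, the Python sketch below checks whether some sample strings can be represented in latin1 and in utf8. The helper name and the sample strings are hypothetical (they are not taken from the SDB test suite), and the code only mimics the character-set limitation; it does not talk to MySQL.

```python
def can_encode(text: str, charset: str) -> bool:
    """Return True if every character in text is representable in charset."""
    try:
        text.encode(charset)
        return True
    except UnicodeEncodeError:
        return False


# Hypothetical samples, written with escapes so the source stays ASCII.
samples = {
    "French": "caf\u00e9",
    "Greek": "\u0395\u03bb\u03bb\u03b7\u03bd\u03b9\u03ba\u03ac",
    "Japanese": "\u65e5\u672c\u8a9e",
}

for name, text in samples.items():
    # Report, per sample, whether latin1 and utf8 can hold it.
    print(name, can_encode(text, "latin-1"), can_encode(text, "utf-8"))
```

Only the French sample fits into latin1; all three fit into utf8, which is why a utf8 server default is the safer choice for RDF data.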

To see what was currently set I ran the “show variables” command below

mysql> show variables like 'character%';
+--------------------------+------------------------------------------------------------+
| Variable_name            | Value                                                      |
+--------------------------+------------------------------------------------------------+
| character_set_client     | latin1                                                     |
| character_set_connection | latin1                                                     |
| character_set_database   | latin1                                                     |
| character_set_filesystem | binary                                                     |
| character_set_results    | latin1                                                     |
| character_set_server     | latin1                                                     |
| character_set_system     | utf8                                                       |
| character_sets_dir       | /usr/local/mysql-5.0.37-osx10.4-i686/share/mysql/charsets/ |
+--------------------------+------------------------------------------------------------+

The SDB notes for MySQL recommend setting default-character-set=utf8. The MySQL documentation seemed to favour setting character-set-server=utf8 and collation-server=utf8_general_ci.

To make the changes I needed to create a config file, containing the changed settings, that MySQL reads on startup.

To do this I copied the example config file for small installations to /etc/my.cnf.

cp /usr/local/mysql-5.0.37-osx10.4-i686/support-files/my-small.cnf /etc/my.cnf

In the [mysqld] section of my.cnf I added the lines:
# utf8
init-connect='SET NAMES utf8'
character-set-server=utf8
collation-server=utf8_general_ci

After restarting mysql the “show variables” command showed the following utf8 updates.
mysql> show variables like 'character%';
+--------------------------+------------------------------------------------------------+
| Variable_name            | Value                                                      |
+--------------------------+------------------------------------------------------------+
| character_set_client     | latin1                                                     |
| character_set_connection | latin1                                                     |
| character_set_database   | utf8                                                       |
| character_set_filesystem | binary                                                     |
| character_set_results    | latin1                                                     |
| character_set_server     | utf8                                                       |
| character_set_system     | utf8                                                       |
| character_sets_dir       | /usr/local/mysql-5.0.37-osx10.4-i686/share/mysql/charsets/ |
+--------------------------+------------------------------------------------------------+

When I tried it again the SDB test suite ran without errors.

SDBROOT > bin/sdbtest --sdb=sdb-index.ttl testing/manifest-sdb.ttl

Common Ontologies for Semantic Web Domain Modeling

Saturday, October 11th, 2008

Below is a list of the ontologies I am currently using for semantic web domain modeling.

Core Ontologies

  • RDF Vocabulary: the RDF Schema for the RDF vocabulary defined in the RDF namespace, xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  • RDFS: the RDF Schema vocabulary, xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
  • XML Schema: xmlns:xsd="http://www.w3.org/2001/XMLSchema#"
  • OWL: Web Ontology Language, xmlns:owl="http://www.w3.org/2002/07/owl#"
  • Dublin Core: xmlns:dc="http://purl.org/dc/elements/1.1/"
  • Dublin Core Metadata Terms: xmlns:dct="http://purl.org/dc/terms

Common Ontologies

Additional Ontologies

Requirements of a Semantic Web Framework

Friday, October 10th, 2008

The requirements of a good, generic Semantic Web framework include:

  • Support the current SPARQL specification, plus SPARQL extensions such as count, insert, update and delete.
  • Provide inference capabilities for OWL ontologies.
  • Selectively apply role-based security to published Linked Data, e.g. for project collaboration scenarios when sharing data externally with project partners.
  • Support Named Entity recognition, i.e. easy lookup and mapping of entities and concepts published as Linked Data.
  • Publish existing SQL databases, LDAP repositories and spreadsheets as RDF/OWL Linked Data.
  • Extract Semantic Metadata from unstructured sources such as text and HTML using natural language processing.

DBpedia Examples using Linked Data and SPARQL

Monday, August 11th, 2008

Using Wikipedia, the largest online encyclopedia, users can browse and perform full-text searches, but programmatic access to the knowledge-base is limited.

The DBpedia project extracts structured information from Wikipedia, opening it up to programmatic access using Semantic Web technologies such as Linked Data and SPARQL. This means that the linking and reasoning abilities of RDF and OWL can be utilized, and queries for specific information can be made using SPARQL.

Simplistically, the mapping from the Wikipedia HTML-based web pages to the DBpedia RDF-based resources can be thought of as replacing “http://en.wikipedia.org/wiki/” with “http://dbpedia.org/resource/”, but in reality there are some additional subtleties, which are described in the article From Wikipedia URI-s to DBpedia URI.
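The simplistic prefix replacement described above can be sketched in a few lines of Python. The function name is hypothetical, and the sketch deliberately ignores the subtleties covered in the referenced article, so treat it as a first approximation rather than a reliable mapping.

```python
WIKIPEDIA_PREFIX = "http://en.wikipedia.org/wiki/"
DBPEDIA_PREFIX = "http://dbpedia.org/resource/"


def wikipedia_to_dbpedia(wikipedia_url: str) -> str:
    """Naively map an English Wikipedia article URL to a DBpedia resource URI."""
    if not wikipedia_url.startswith(WIKIPEDIA_PREFIX):
        raise ValueError(f"Not an English Wikipedia article URL: {wikipedia_url}")
    # Keep the article name and swap only the prefix.
    return DBPEDIA_PREFIX + wikipedia_url[len(WIKIPEDIA_PREFIX):]


print(wikipedia_to_dbpedia("http://en.wikipedia.org/wiki/Civil_engineering"))
```

Because only the prefix changes, differences in capitalisation are carried over unchanged: a Wikipedia URL ending in Civil_Engineering would produce a URI ending in Civil_Engineering, whereas the DBpedia resource used in this post ends in Civil_engineering.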

The Wikipedia entry for “Civil Engineering” (http://en.wikipedia.org/wiki/Civil_Engineering) is used as an example to show how specific data can be retrieved from its DBpedia equivalent (http://dbpedia.org/resource/Civil_engineering).

When both the Wikipedia entry (http://en.wikipedia.org/wiki/Civil_Engineering) and its DBpedia equivalent (http://dbpedia.org/resource/Civil_engineering) are opened in a standard web browser, they display similar information. However, the request for the DBpedia resource is redirected to http://dbpedia.org/page/Civil_engineering.

This redirect can be viewed in Firefox using the Tamper Data Firefox Extension as shown in the image below.

Loading the DBpedia Resource

The initial status of 303 is the HTTP response code “303 See Other”. The server replied with this code in order to direct the browser to the URI http://dbpedia.org/page/Civil_engineering, which is an HTML page the browser can display. The original URI http://dbpedia.org/resource/Civil_engineering identifies an RDF resource that would not display as well in an HTML browser.

DBpedia implements an HTTP mechanism called content negotiation in order to provide clients such as web browsers with the information they request in a form they can display. The tutorial How to publish Linked Data on the Web describes this and other Linked Data techniques that are used by applications such as DBpedia.
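To make the idea of content negotiation more concrete, here is a much simplified Python sketch of the decision a server might make when it receives a request for a resource URI. It is not DBpedia's actual implementation: the function name is hypothetical, the /data/ prefix for the RDF document is an assumption, and real servers also weigh the q-values in the Accept header, which this sketch ignores.

```python
from typing import Tuple

PAGE_PREFIX = "http://dbpedia.org/page/"  # HTML view, as seen in the redirect above
DATA_PREFIX = "http://dbpedia.org/data/"  # assumed location of the RDF document


def negotiate(resource_name: str, accept_header: str) -> Tuple[int, str]:
    """Return the (status code, redirect target) for a resource request."""
    # Split "text/html,application/xml;q=0.9" into bare media types,
    # dropping parameters such as q-values (a simplification).
    media_types = [
        part.split(";")[0].strip().lower() for part in accept_header.split(",")
    ]
    if "application/rdf+xml" in media_types:
        return 303, DATA_PREFIX + resource_name
    # Anything else, including a browser's text/html, gets the HTML page.
    return 303, PAGE_PREFIX + resource_name


print(negotiate("Civil_engineering", "text/html"))
print(negotiate("Civil_engineering", "application/rdf+xml"))
```

In both cases the client is answered with “303 See Other”; only the target of the redirect depends on what the client said it could accept.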

In order to access the RDF resource directly a web client needs to tell the server to send it RDF data. A client can do this by sending the HTTP Request Header Accept: application/rdf+xml as part of its initial request. (The HTML browser had sent an Accept: text/html HTTP header indicating that it was requesting an HTML page.)

The Firefox Addon RESTTest can be used to set Accept: application/rdf+xml in the HTTP Request Header and directly request http://dbpedia.org/resource/Civil_engineering as shown in the image below.

In this case the request to http://dbpedia.org/resource/Civil_engineering succeeded, as shown by the “Response Status 200”, and an RDF document was received, as shown in the “Response Text”.
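The same request can be made without a browser add-on. The Python sketch below uses only the standard library; the function names are hypothetical, and it assumes the server still answers this URI the way it is described in this post. Building the request is separated from sending it so that the header can be inspected without touching the network.

```python
import urllib.request

RESOURCE_URI = "http://dbpedia.org/resource/Civil_engineering"


def build_request(uri: str, accept: str = "application/rdf+xml") -> urllib.request.Request:
    """Create a GET request that asks the server for the given media type."""
    return urllib.request.Request(uri, headers={"Accept": accept})


def fetch(uri: str, accept: str = "application/rdf+xml"):
    """Send the request and return (status, final URL, body bytes).

    urllib follows the 303 redirect automatically, so the status returned
    here is that of the final response, not the initial 303.
    """
    with urllib.request.urlopen(build_request(uri, accept), timeout=30) as response:
        return response.status, response.geturl(), response.read()


request = build_request(RESOURCE_URI)
print(request.full_url, request.get_header("Accept"))
```

Calling fetch(RESOURCE_URI) performs the actual request; passing accept="text/html" instead reproduces what the HTML browser asked for.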

In both the RDF fragment shown in the image above and in the HTML page http://dbpedia.org/page/Civil_engineering the multiple language support is visible. The SPARQL queries below show how to extract specific information for a particular language.

SPARQL

DBpedia provides a public SPARQL endpoint at http://dbpedia.org/sparql which enables users to query the RDF datasource with SPARQL queries such as the following.

SELECT ?abstract
WHERE {
{ <http://dbpedia.org/resource/Civil_engineering> <http://dbpedia.org/ontology/abstract> ?abstract }
}

The query returns all the abstracts for Civil Engineering, in each of the available languages.

The next query refines the abstracts returned to just the language specified, in this case ‘en’ (English).

SELECT ?abstract
WHERE {
{ <http://dbpedia.org/resource/Civil_engineering> <http://dbpedia.org/ontology/abstract> ?abstract .
FILTER langMatches( lang(?abstract), 'en') }
}
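A query like this can also be sent programmatically. The Python sketch below builds the request for the public endpoint using only the standard library; the function names are hypothetical, and it assumes the endpoint accepts the query in a "query" URL parameter and can return results as application/sparql-results+json. It only constructs the request, so nothing is sent until it is passed to urllib.request.urlopen.

```python
import urllib.parse
import urllib.request

ENDPOINT = "http://dbpedia.org/sparql"


def abstract_query(resource_uri: str, language: str) -> str:
    """Build the SPARQL query for the abstract of a resource in one language."""
    return (
        "SELECT ?abstract\n"
        "WHERE {\n"
        f"  <{resource_uri}> <http://dbpedia.org/ontology/abstract> ?abstract .\n"
        f"  FILTER langMatches( lang(?abstract), '{language}' )\n"
        "}"
    )


def build_sparql_request(endpoint: str, query: str) -> urllib.request.Request:
    """Encode the query into the URL and ask for JSON results (assumed supported)."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    return urllib.request.Request(
        url, headers={"Accept": "application/sparql-results+json"}
    )


query = abstract_query("http://dbpedia.org/resource/Civil_engineering", "en")
request = build_sparql_request(ENDPOINT, query)
print(request.full_url)
```

Changing the second argument of abstract_query, for example to 'de', selects the abstract in another language, provided DBpedia holds one for that language.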

The SNORQL query explorer, shown in the image below, provides a simpler interface to the DBpedia SPARQL endpoint. The image shows both the query and the result returned.

Other SPARQL endpoints such as http://demo.openlinksw.com/sparql/ (shown below) can query DBpedia by specifying the FROM NAMED clause to describe the RDF dataset. For example:

SELECT ?abstract
FROM NAMED <http://dbpedia.org>
WHERE {
{ <http://dbpedia.org/resource/Civil_engineering> <http://dbpedia.org/ontology/abstract> ?abstract.
FILTER langMatches( lang(?abstract), 'en') }
}

Other Related DBpedia Articles

RDF as self-describing Data uses DBpedia and its SPARQL support to show how RDF is essentially ‘self-describing’: there is no need to know about traditional metadata (schemas) before exploring a data set.

Linking to DBpedia with TopBraid outlines the benefit of DBpedia in providing relatively stable URIs for all relevant real-world concepts, which makes it a natural place to connect specific domain models with each other using the OWL built-in property owl:sameAs (this property indicates that two URI references actually refer to the same thing). TopBraid Composer provides support for linking domain models with DBpedia.

Querying DBpedia provides examples of using SPARQL to query DBpedia.

Adding Semantic Markup to Your Rails Application with DBpedia and ActiveRDF and Get Semantic with DBPedia and ActiveRDF describe using ActiveRDF to integrate DBpedia resources into web-based applications. ActiveRDF is a library for accessing RDF data from Ruby and Ruby on Rails programs and can perform SPARQL queries.