Archive for the ‘RDF’ Category

Understanding the OpenCalais RDF Response

Saturday, September 26th, 2009

I’m using an XML version of an article published by Scoop in February 2000, Senior UN Officials Protest UN Sanctions On Iraq, to understand the OpenCalais RDF response as part of a larger project of linking extracted entities to existing Linked Data datasets.

OpenCalais uses natural language processing (NLP), machine learning and other methods to analyze content and return the entities it finds, such as cities, countries and people, with dereferenceable Linked Data style URIs. The entity types are defined in the OpenCalais RDF Schemas.

When I submit the content to the OpenCalais REST web service (using the default RDF response format), an RDF document is returned. Opened in TopBraid Composer (below), the document shows a portion of the input content and some of the entity types OpenCalais can detect. The numbers in brackets indicate how many instances of an entity type have been detected; for example, cle.Person(13) indicates that thirteen people have been detected.

The TopBraid Composer Instances tab contains the URIs of the people detected. Opening the highlighted URI reveals that it is for a person named Saddam Hussein.

Entity Disambiguation

One of the challenges when analyzing content and extracting entities is entity disambiguation. Can the person named Saddam Hussein be uniquely identified? Usually the context is needed in order to disambiguate similar entities. As described in the OpenCalais FAQ, if the “rdf:type rdf:resource” of a given entity contains /er/ the entity has been disambiguated by OpenCalais, while if it contains /em/ it has not.

In the example above cle.Person is <http://s.opencalais.com/1/type/em/e/Person>. There is no obvious link to an “rdf:type rdf:resource” containing /er/. It looks like OpenCalais has been able to determine that the text “Saddam Hussein” equates to a Person, but has not been able to determine specifically who that person is.

In contrast, Iraq (one of three countries detected) is shown below with the Incoming Reference http://d.opencalais.com/er/geo/country/ralg-geo1/d3b1cee2-327c-fa35-7dab-f0289958c024.

Opening the URI http://d.opencalais.com/er/geo/country/ralg-geo1/d3b1cee2-327c-fa35-7dab-f0289958c024 with either an HTML browser as http://d.opencalais.com/er/geo/country/ralg-geo1/d3b1cee2-327c-fa35-7dab-f0289958c024.html or with an RDF browser as http://d.opencalais.com/er/geo/country/ralg-geo1/d3b1cee2-327c-fa35-7dab-f0289958c024.rdf (in Tabulator below) shows that the country has been disambiguated with <rdf:type rdf:resource="http://s.opencalais.com/1/type/er/Geo/Country"/>.
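The /er/ versus /em/ rule from the FAQ can be sketched as a small check on the type URI. This is only an illustration of the rule as described above: the function name is my own, and OpenCalais may apply further conventions that this sketch does not cover.

```python
from typing import Optional
from urllib.parse import urlsplit


def is_disambiguated(type_uri: str) -> Optional[bool]:
    """Apply the FAQ rule to an rdf:type URI.

    Returns True for /er/ (disambiguated), False for /em/ (not
    disambiguated) and None when neither path segment is present.
    The function name is hypothetical, not part of any OpenCalais API.
    """
    segments = urlsplit(type_uri).path.split("/")
    if "er" in segments:
        return True
    if "em" in segments:
        return False
    return None


# The two type URIs discussed in this article.
person_type = "http://s.opencalais.com/1/type/em/e/Person"
country_type = "http://s.opencalais.com/1/type/er/Geo/Country"

print(is_disambiguated(person_type))   # Person: not disambiguated
print(is_disambiguated(country_type))  # Country: disambiguated
```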

Linked Data

In the RDF response returned by OpenCalais neither Iraq nor “Saddam Hussein” was linked to other Linked Data datasets. Some OpenCalais entities are. For example Moscow, Russia is linked via owl:sameAs to

Since I know that the context of the article is international news, I can safely add some owl:sameAs links, such as the following DBpedia links for “Saddam Hussein” (below) and Iraq.

Entity Relevance

For both detected entities, “Saddam Hussein” and “Iraq”, OpenCalais provides an entity relevance score (shown for each respectively in the screen shots below). The relevance capability detects the importance of each unique entity and assigns a relevance score in the range 0-1 (1 being the most relevant and important). From the screen shots it's clear that “Iraq” has been ranked more relevant.

Detection Information

The RDF response includes the following properties relating to the subject's detection:

  • c:docId: URI of the document this mention was detected in.
  • c:subject: URI of the unique entity.
  • c:detection: snippet of the input content where the metadata element was identified
  • c:prefix: snippet of the input content that precedes the current instance
  • c:exact: snippet of the input content in the matched portion of text
  • c:suffix: snippet of the input content that follows the current instance
  • c:offset: the character offset relative to the input content after it has been converted into XML
  • c:length: length of the instance.
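The relationship between the offset, length, prefix, exact and suffix properties can be sketched with plain string slicing. This is an assumption-laden illustration: the sample sentence, the function name and the size of the context window are all made up, and OpenCalais measures the offset against its own XML-converted copy of the input rather than a plain string.

```python
def describe_instance(content: str, offset: int, length: int, context: int = 20) -> dict:
    """Slice a detected instance and its surrounding context out of content.

    The context width is a hypothetical parameter; OpenCalais chooses
    its own prefix and suffix sizes.
    """
    end = offset + length
    return {
        "prefix": content[max(0, offset - context):offset],  # text before the match
        "exact": content[offset:end],                        # the matched text
        "suffix": content[end:end + context],                # text after the match
        "offset": offset,
        "length": length,
    }


# Hypothetical input sentence, not taken from the Scoop article.
text = "Sanctions on Iraq were protested."
instance = describe_instance(text, text.index("Iraq"), len("Iraq"), context=5)
print(instance)
```

Joining prefix, exact and suffix back together reproduces a contiguous slice of the input, which is a handy sanity check when working with these properties.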

The screen shot below for Saddam Hussein provides an example of how these properties work.

Conclusions

OpenCalais is a very impressive tool. It takes a while, though, to fully understand the RDF response, especially in the areas of entity disambiguation and the linking of OpenCalais entities to other Linked Data datasets. Most likely there are some subtleties that I have missed or misunderstood, so all clarifications are welcome.

For entities extracted from international news sources and not linked to other Linked Data datasets it would be interesting to try some equivalence mining.

DBpedia Examples using Linked Data and Sparql

Monday, August 11th, 2008

Using Wikipedia, the largest online encyclopedia, users can browse and perform full-text searches, but programmatic access to the knowledge-base is limited.

The DBpedia project extracts structured information from Wikipedia, opening it up to programmatic access using Semantic Web technologies such as Linked Data and SPARQL. This means that the linking and reasoning abilities of RDF and OWL can be utilized and queries for specific information can be made using SPARQL.

Simplistically, the mapping from the Wikipedia HTML based web pages to the DBpedia RDF based resources can be thought of as replacing “http://en.wikipedia.org/wiki/” with “http://dbpedia.org/resource/”, but in reality there are some additional subtleties, which are described in the article From Wikipedia URI-s to DBpedia URI.
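The simplistic version of that mapping is just a prefix swap, sketched below. The function name is my own, and the sketch deliberately ignores the subtleties mentioned above, for example differences in capitalization between a Wikipedia URL and the corresponding DBpedia URI.

```python
WIKIPEDIA_PREFIX = "http://en.wikipedia.org/wiki/"
DBPEDIA_PREFIX = "http://dbpedia.org/resource/"


def naive_dbpedia_uri(wikipedia_url: str) -> str:
    """Swap the Wikipedia prefix for the DBpedia one (naive mapping only)."""
    if not wikipedia_url.startswith(WIKIPEDIA_PREFIX):
        raise ValueError("not an English Wikipedia article URL: " + wikipedia_url)
    return DBPEDIA_PREFIX + wikipedia_url[len(WIKIPEDIA_PREFIX):]


print(naive_dbpedia_uri("http://en.wikipedia.org/wiki/Civil_engineering"))
```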

The Wikipedia entry for “Civil Engineering” (http://en.wikipedia.org/wiki/Civil_Engineering) is used as an example to show how specific data can be retrieved from its DBpedia equivalent (http://dbpedia.org/resource/Civil_engineering).

When both the Wikipedia entry (http://en.wikipedia.org/wiki/Civil_Engineering) and its DBpedia equivalent (http://dbpedia.org/resource/Civil_engineering) are opened in a standard web browser they display similar information. However, the DBpedia equivalent has been redirected to http://dbpedia.org/page/Civil_engineering.

This redirect can be viewed in Firefox using the Tamper Data Firefox Extension as shown in the image below.

Loading the DBpedia Resource

The initial status of 303 is the HTTP response code “303 See Other”. The server replied with 303 in order to direct the browser to the URI http://dbpedia.org/page/Civil_engineering, which is an HTML page the browser can display. The original URI http://dbpedia.org/resource/Civil_engineering is an RDF resource that would not display as well in the HTML browser.

DBpedia implements an HTTP mechanism called content negotiation in order to provide clients such as web browsers with the information they request in a form they can display. The tutorial How to publish Linked Data on the Web describes this and other Linked Data techniques that are used by applications such as DBpedia.

In order to access the RDF resource directly a web client needs to tell the server to send it RDF data. A client can do this by sending the HTTP Request Header Accept: application/rdf+xml as part of its initial request. (The HTML browser had sent an Accept: text/html HTTP header indicating that it was requesting an HTML page.)

The Firefox Addon RESTTest can be used to set Accept: application/rdf+xml in the HTTP Request Header and directly request http://dbpedia.org/resource/Civil_engineering as shown in the image below.

In this case the request to http://dbpedia.org/resource/Civil_engineering succeeded, as shown by the “Response Status 200”, and an RDF document was received, as shown in the “Response Text”.
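The same request can be sketched in Python using only the standard library. The function names are my own, and the sketch assumes the server still honours the Accept header as described above. Note that urllib follows redirects such as 303 automatically, so the caller only sees the final response.

```python
import urllib.request

RDF_XML = "application/rdf+xml"


def build_rdf_request(uri: str) -> urllib.request.Request:
    """Build a GET request that asks the server for RDF/XML."""
    return urllib.request.Request(uri, headers={"Accept": RDF_XML})


def fetch_rdf(uri: str) -> str:
    """Send the request and return the response body as text.

    Needs network access; assumes the response is UTF-8 encoded.
    """
    with urllib.request.urlopen(build_rdf_request(uri)) as response:
        return response.read().decode("utf-8")


request = build_rdf_request("http://dbpedia.org/resource/Civil_engineering")
print(request.get_method(), request.full_url)
print("Accept:", request.get_header("Accept"))
```

Calling fetch_rdf with the same URI would perform the actual request. Building the request separately makes it easy to inspect the headers without touching the network.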

In both the RDF fragment shown in the image above and in the HTML page http://dbpedia.org/page/Civil_engineering the multiple language support is visible. The SPARQL queries below show how to extract specific information for a particular language.

SPARQL

DBpedia provides a public SPARQL endpoint at http://dbpedia.org/sparql which enables users to query the RDF datasource with SPARQL queries such as the following.

SELECT ?abstract
WHERE {
{ <http://dbpedia.org/resource/Civil_engineering> <http://dbpedia.org/property/abstract> ?abstract }
}

The query returns all the abstracts for Civil Engineering, in each of the available languages.

The next query refines the abstracts returned to just the language specified, in this case ‘en’ (English).

SELECT ?abstract
WHERE {
{ <http://dbpedia.org/resource/Civil_engineering> <http://dbpedia.org/property/abstract> ?abstract .
FILTER langMatches( lang(?abstract), 'en') }
}
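A query like this can also be sent to the endpoint programmatically. The sketch below only builds the request URL, passing the query text in the query parameter defined by the SPARQL protocol. The helper names are my own, and no result format is requested, so the endpoint would reply in whatever its default format is.

```python
from urllib.parse import urlencode

ENDPOINT = "http://dbpedia.org/sparql"
ABSTRACT_PROPERTY = "http://dbpedia.org/property/abstract"


def abstract_query(resource_uri: str, lang: str) -> str:
    """Build the abstract query shown above for one resource and language."""
    return (
        "SELECT ?abstract\n"
        "WHERE {\n"
        f"  <{resource_uri}> <{ABSTRACT_PROPERTY}> ?abstract .\n"
        f"  FILTER langMatches(lang(?abstract), '{lang}')\n"
        "}"
    )


def query_url(query: str, endpoint: str = ENDPOINT) -> str:
    """URL-encode the query into a GET request URL for the endpoint."""
    return endpoint + "?" + urlencode({"query": query})


query = abstract_query("http://dbpedia.org/resource/Civil_engineering", "en")
print(query)
print(query_url(query))
```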

The SNORQL query explorer provides a simpler interface to the DBpedia SPARQL endpoint. The image below shows both the query and the result returned.

Other SPARQL endpoints such as http://demo.openlinksw.com/sparql/ (shown below) can query DBpedia by specifying the FROM NAMED clause to describe the RDF dataset. E.g.

SELECT ?abstract
FROM NAMED <http://dbpedia.org>
WHERE {
{ <http://dbpedia.org/resource/Civil_engineering> <http://dbpedia.org/property/abstract> ?abstract.
FILTER langMatches( lang(?abstract), 'en') }
}

Other Related DBpedia Articles

RDF as self-describing Data uses DBpedia and its SPARQL support to show how RDF is essentially ‘self-describing’: there is no need to know about traditional metadata (schemas) before exploring a data set.

Linking to DBpedia with TopBraid outlines the benefit of DBpedia in terms of providing relatively stable URIs for all relevant real-world concepts, thus making it a natural place to connect specific domain models with each other using the OWL built-in property owl:sameAs (this property indicates that two URI references actually refer to the same thing). TopBraid Composer provides support to link domain models with DBpedia.

Querying DBpedia provides examples of using SPARQL to query DBpedia.

Adding Semantic Markup to Your Rails Application with DBpedia and ActiveRDF and Get Semantic with DBPedia and ActiveRDF describe using ActiveRDF to integrate DBpedia resources into web based applications. ActiveRDF is a library for accessing RDF data from Ruby and Ruby On Rails programs and can perform SPARQL queries.

WWW2008 Linked Data Articles

Friday, April 25th, 2008

The WWW2008 Conference has published some great material, in particular the papers from the Linked Data on the Web (LDOW2008) Workshop.

The Workshop Introduction is an easy to read summary of the development of Linked Data and the Linking Open Data Project over the past year. It includes the Linking Open Data “cloud” diagram, which shows the relationships between the main currently available datasets. A good way to get a feel for the amount and scope of available Linked Data is to open each dataset in its own tab in Firefox and look across the spectrum of data presented.

The home page of the Linking Open Data Project also lists recent developments such as new datasets, tools, publications and conferences. Conferences in the near future include the Linked Data Planet Conference in New York in June, and I-Semantics 2008 in Austria in September. I-Semantics 2008 includes the LOD Triplification Challenge for showcasing applications which demonstrate the benefits of linked data to end users.

DBpedia Mobile: A Location-Enabled Linked Data Browser provides an overview of DBpedia Mobile, a location-centric DBpedia client application for mobile devices. “The DBpedia project extracts structured information from Wikipedia and publishes this information as Linked Data on the Web. The DBpedia datasets contain information for about 2.18 million things, including almost 300,000 geographic locations. DBpedia is interlinked with various other location-related datasets. Based on the current GPS position of a mobile device, DBpedia Mobile renders a map indicating nearby locations from the DBpedia dataset. Starting from this map, users can explore background information about locations and can navigate into interlinked datasets. DBpedia Mobile demonstrates that the DBpedia dataset can serve as a useful starting point to explore the Geospatial Semantic Web using a mobile device.”

There are a couple of options for trying out DBpedia Mobile from your browser including Viewing based on IP Address.

For getting up to speed with RDF, OWL and SPARQL, the technologies that form the basis of Linked Data, a good tutorial just published is Understanding SPARQL.

A Semantic Web Architecture for a Rails Hosted Environment

Saturday, October 20th, 2007

Last weekend I installed ActiveRDF on my Mac OS X Powerbook, together with the SPARQL, RDFLite and Redland adapters. Ideally I am working towards setting up an environment that allows me to build RESTful Semantic Web applications that support reasoning over RDF data and implement a SPARQL query endpoint. Support for OpenID authentication, integrated with FOAF, is also at the top of the list.

On the Powerbook I could also install the ActiveRDF adapters for Sesame and Jena to give me the functionality that I am after, but that only works in my development environment. Sesame and Jena are Java based. When it comes to deploying an application onto the web my options are currently more limited. 3kbo is deployed into a hosted environment which supports PHP, Python, Ruby, Ruby On Rails and Perl, but no Java. (There is C/C++, limited to my local user account.)

Currently there are two PHP SPARQL implementations, ARC and RAP. RAP also provides a reasoning engine, InfModel, with support for owl:sameAs and owl:inverseOf.

So at this stage the architecture that is emerging is an ActiveRDF RESTful Ruby On Rails application that uses RAP as the triple store, SPARQL query engine and reasoning engine. To integrate Rails with PHP I am planning to implement a RESTful PHP interface that acts as a facade to RAP.

Description of a Project

Wednesday, September 12th, 2007

In an earlier article, Migrating an existing application to the iPhone and the Semantic Web, I discussed some of the areas where Semantic Web concepts could be beneficially applied to the “Compliance Data Management Service” (CDMS). To show the benefits of using RDF and OWL vocabularies I need to build up a number of practical examples.

In this article I present the first example, based on the Friend of a Friend (FOAF) and Description of a Project (DOAP) vocabularies.

There is a similarity between the concepts and descriptions used in the DOAP vocabulary, which describes open source software projects, and the descriptions and concepts which relate to the building and construction projects the “Compliance Data Management Service” is used on. Both types of projects bring together people from different locations and organisations to work together. On both types of projects people may assume one or more roles as they work on different tasks. The DOAP vocabulary imports the Friend of a Friend (FOAF) vocabulary, which is widely used on the Semantic Web to describe people and the people they know. Since DOAP already uses FOAF, it is the logical choice for describing the people working on CDMS projects.

Since CDMS itself is a software project (but not open source) the easiest example to create is a static DOAP (Description of a Project) file describing the CDMS software project, combined with a number of static FOAF files describing the various people working on it. The example follows the recipe for serving static RDF files outlined in the tutorial “How to Publish Linked Data on the Web”. It creates the CDMS DOAP file and related FOAF files, demonstrating basic linking between people and the project they work on.

The CDMS software project is being developed at ABE Services by four people, John Anderson, Mike Evans, Rob Beasley and myself. To represent this I created the following five static RDF files at www.abeservices.com.au.

Irene Bell-Hancock has also created some icons and images for us and has been added to the CDMS project description as a documenter. Irene already has a FOAF file at 3kbo, so the CDMS DOAP file references Irene using the URI http://www.3kbo.com/people/irene.bell-hancock/foaf.rdf#me.

The basic structure of the CDMS DOAP file is outlined in the image below.

CDMS Developers

But a better way to understand the RDF files and how they link together is to use a good RDF browser such as one of the following:

Each of these browsers has an input field which accepts a URI. Once the URI has been entered the RDF browser follows the RDF links and displays them as HTML. For example Disco “renders all information, that it can find on the Semantic Web about a specific resource, as an HTML page”. “While you move from resource to resource, the browser (Disco) dynamically retrieves information by dereferencing HTTP URIs and by following rdfs:seeAlso links.” The other RDF browsers work in a similar way. Tabulator requires some configuration as described on the Tabulator home page.

Also available from the Tabulator home page is the Tabulator Firefox extension, which makes browsing RDF data with Firefox extremely easy. Below is what is seen with the Tabulator Firefox extension when the CDMS DOAP URI (http://www.abeservices.com.au/projects/cdms/cdms-doap.rdf#CDMS) is first opened.

CDMS DOAP RDF File

Following the link to Irene displays her FOAF file (from 3kbo) within the same HTML page that is displaying the CDMS DOAP file.

Irene’s FOAF file within CDMS DOAP

On 3kbo there are two foaf.rdf files, http://www.3kbo.com/people/richard.hancock/foaf.rdf and http://www.3kbo.com/people/irene.bell-hancock/foaf.rdf. In both files the foaf:knows property is used to show that Richard knows Irene and Irene knows Richard. Using Tabulator it is easy to navigate from Irene’s FOAF file to Richard’s.

Navigation via RDF data across web servers is illustrated by starting at the CDMS Description of a Project (DOAP) at www.abeservices.com.au and following the CDMS “Documenter” link to Irene, then Irene’s “Knows” link to Richard.

Richard’s 3kbo FOAF file uses the built-in OWL property owl:sameAs to indicate that Richard at 3kbo is the same individual as Richard at abeservices. Setting owl:sameAs to <owl:sameAs rdf:resource="http://www.abeservices.com.au/people/rhancock/foaf.rdf#rhancock"/> in the definition of Richard at http://www.3kbo.com/people/richard.hancock/foaf.rdf#i allows Tabulator to recognize the equivalence of the two definitions and merge the information from the two sources. This is shown in the image below.
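The effect of owl:sameAs merging can be sketched in a few lines of Python. This is not how Tabulator is implemented; it is a minimal illustration that treats triples as plain tuples and pools the properties of every subject linked by owl:sameAs. The use of foaf:img for the two images is an assumption on my part, since the FOAF files themselves are not reproduced here.

```python
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"
FOAF_IMG = "http://xmlns.com/foaf/0.1/img"  # assumed property for the images


def merge_same_as(triples):
    """Group subjects linked by owl:sameAs and pool their other properties."""
    parent = {}

    def find(node):
        # Union-find lookup with path halving.
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]
            node = parent[node]
        return node

    # First pass: join the two ends of every owl:sameAs statement.
    for s, p, o in triples:
        if p == OWL_SAME_AS:
            parent[find(s)] = find(o)

    # Second pass: file every other statement under its group representative.
    merged = {}
    for s, p, o in triples:
        if p != OWL_SAME_AS:
            merged.setdefault(find(s), set()).add((p, o))
    return merged


richard_3kbo = "http://www.3kbo.com/people/richard.hancock/foaf.rdf#i"
richard_abe = "http://www.abeservices.com.au/people/rhancock/foaf.rdf#rhancock"

triples = [
    (richard_3kbo, OWL_SAME_AS, richard_abe),
    (richard_3kbo, FOAF_IMG, "http://www.3kbo.com/people/richard.hancock/richard-hancock.jpg"),
    (richard_abe, FOAF_IMG, "http://www.abeservices.com.au/people/rhancock/richard-hancock.jpg"),
]

merged = merge_same_as(triples)
for representative, properties in merged.items():
    print(representative)
    for prop, value in sorted(properties):
        print("  ", prop, value)
```

Both image statements end up under a single representative URI, which mirrors the merged view described above.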

Richard’s 3kbo FOAF Profile

A visual indication of the merging is that the two images reside on different servers, within different FOAF definitions of Richard, i.e. http://www.abeservices.com.au/people/rhancock/richard-hancock.jpg resides on www.abeservices.com.au and http://www.3kbo.com/people/richard.hancock/richard-hancock.jpg resides on www.3kbo.com.

Tabulator follows the principles of Web Architecture outlined in the tutorial How to Publish Linked Data on the Web. When it finds that an RDF data link leads to a standard HTML web document or image, these are displayed within the page showing the RDF data. In addition to showing embedded images (like those shown above) Tabulator can also display web sites embedded in the same page. A good example is Irene’s home page http://picasaweb.google.com/goannagraphics. In the image below the picasaweb slide show of the embedded home page has been activated and is fully functional.

Irene’s Homepage

Examples of FOAF properties which lead to web documents include foaf:homepage, foaf:weblog and foaf:workplaceHomepage. foaf:homepage and foaf:weblog are defined to be properties of OWL Type: InverseFunctionalProperty. As such they uniquely identify the person whose homepage or weblog it is and within Tabulator can lead to the merging of information in a way similar to that seen when the owl:sameAs property is applied.
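The reasoning behind an inverse functional property can be sketched as follows: if two subjects share the same value for such a property, they can be inferred to be the same individual. The example below is a simplified illustration with hypothetical example.org URIs. It only groups subjects per shared value and does not compute a full transitive closure across different properties.

```python
FOAF_HOMEPAGE = "http://xmlns.com/foaf/0.1/homepage"
FOAF_WEBLOG = "http://xmlns.com/foaf/0.1/weblog"
INVERSE_FUNCTIONAL = {FOAF_HOMEPAGE, FOAF_WEBLOG}


def same_individuals(triples, ifps=INVERSE_FUNCTIONAL):
    """Return groups of subjects that share a value for an inverse functional property."""
    by_value = {}
    for s, p, o in triples:
        if p in ifps:
            by_value.setdefault((p, o), set()).add(s)
    # Only values shared by more than one subject imply an equivalence.
    return [subjects for subjects in by_value.values() if len(subjects) > 1]


# Hypothetical data: two descriptions of one person on different servers.
triples = [
    ("http://example.org/site-a/foaf.rdf#me", FOAF_HOMEPAGE, "http://example.org/home"),
    ("http://example.org/site-b/foaf.rdf#me", FOAF_HOMEPAGE, "http://example.org/home"),
    ("http://example.org/site-c/foaf.rdf#me", FOAF_HOMEPAGE, "http://example.org/other"),
]

print(same_individuals(triples))
```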

In summary, the example above shows a number of the benefits of using RDF data and reusing RDF and OWL vocabularies. These include:

  1. Using standardized representations of people (FOAF) and (software) projects (DOAP).
  2. Interlinking between sites using RDF data links allows data from different sources to be easily combined.
  3. Reasoning over data, e.g. the basic inferencing using owl:sameAs. Other examples include the foaf:homepage and foaf:weblog properties which are defined as owl:InverseFunctionalProperty. Taking selective advantage of the features of the Web Ontology Language (OWL) has the potential to reduce the amount of application specific code (e.g. java code) that needs to be written.

Future articles based on the examples created above and the existing CDMS application will include:

  1. Demonstrating the ability to query the constructed RDF data files using the SPARQL Query Language.
  2. Accessing existing data stored in a relational database as RDF using the D2R Server. The D2R Server enables RDF and HTML browsers to navigate the content of the database, and allows applications to query the database using the SPARQL Query Language.
  3. Developing a prototype semantic web application using an RDF Triple Store that supports the SPARQL Update specification.
  4. Creating a SKOS glossary based on the blog entry Glossary of Common CDMS Term. The glossary would support the development of building industry related ontologies.
  5. Defining an ontology which provides a “Description of a Building Project” and linking it to a suitable ontology which describes the tasks undertaken as part of a building project.