Archive for the ‘Semantic Web’ Category

Understanding the OpenCalais RDF Response

Saturday, September 26th, 2009

I’m using an XML version of an article published by Scoop in February 2000, Senior UN Officials Protest UN Sanctions On Iraq, to understand the OpenCalais RDF response as part of a larger project of linking extracted entities to existing Linked Data datasets.

OpenCalais uses natural language processing (NLP), machine learning and other methods to analyze content and return the entities it finds, such as cities, countries and people, with dereferenceable Linked Data style URIs. The entity types are defined in the OpenCalais RDF Schemas.

When I submit the content to the OpenCalais REST web service (using the default RDF response format), an RDF document is returned. Opening it in TopBraid Composer (below) shows a portion of the input content and some of the entity types OpenCalais can detect. The numbers in brackets indicate how many instances of an entity type have been detected; for example, cle.Person(13) indicates that thirteen people have been detected.

The TopBraid Composer Instances tab contains the URIs of the people detected. Opening the highlighted URI reveals that it is for a person named Saddam Hussein.

Entity Disambiguation

One of the challenges when analyzing content and extracting entities is entity disambiguation: can the person named Saddam Hussein be uniquely identified? Usually the context is needed in order to disambiguate similar entities. As described in the OpenCalais FAQ, if the “rdf:type rdf:resource” of a given entity contains /er/ the entity has been disambiguated by OpenCalais, while if it contains /em/ it has not.

In the example above cle.Person is <http://s.opencalais.com/1/type/em/e/Person>. There is no obvious link to an “rdf:type rdf:resource” containing /er/. It looks like OpenCalais has been able to determine that the text “Saddam Hussein” equates to a Person, but has not been able to determine specifically who that person is.

In contrast, Iraq (one of three countries detected) is shown below with the Incoming Reference http://d.opencalais.com/er/geo/country/ralg-geo1/d3b1cee2-327c-fa35-7dab-f0289958c024.

Opening the URI http://d.opencalais.com/er/geo/country/ralg-geo1/d3b1cee2-327c-fa35-7dab-f0289958c024 with either an HTML browser as http://d.opencalais.com/er/geo/country/ralg-geo1/d3b1cee2-327c-fa35-7dab-f0289958c024.html or with an RDF browser as http://d.opencalais.com/er/geo/country/ralg-geo1/d3b1cee2-327c-fa35-7dab-f0289958c024.rdf (in Tabulator below) shows that the country has been disambiguated with <rdf:type rdf:resource="http://s.opencalais.com/1/type/er/Geo/Country"/>.
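
The /er/ versus /em/ rule from the FAQ is easy to apply mechanically. The following is a minimal Python sketch, not OpenCalais code: the function name is my own, and it assumes the marker always appears as a whole path segment of the type URI, which is how I read the FAQ.

```python
from urllib.parse import urlparse


def is_disambiguated(type_uri: str) -> bool:
    """Return True if the rdf:type URI has an 'er' path segment.

    Assumption: 'er' marks a disambiguated entity and 'em' an
    undisambiguated one, as described in the OpenCalais FAQ.
    """
    segments = urlparse(type_uri).path.split("/")
    return "er" in segments


# The two type URIs quoted in this post.
PERSON_TYPE = "http://s.opencalais.com/1/type/em/e/Person"
COUNTRY_TYPE = "http://s.opencalais.com/1/type/er/Geo/Country"

print(is_disambiguated(PERSON_TYPE))   # False
print(is_disambiguated(COUNTRY_TYPE))  # True
```

Checking path segments rather than searching for the substring avoids false matches on other parts of a URI that merely happen to contain the letters "er".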

Linked Data

In the RDF response returned by OpenCalais neither Iraq nor “Saddam Hussein” were linked to other Linked Data datasets. Some OpenCalais entities are. For example Moscow, Russia is linked via owl:sameAs to

Since I know that the context of the article is international news, I can safely add owl:sameAs links to DBpedia for “Saddam Hussein” (below) and Iraq.
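
As a sketch of what such a link looks like as data, the Python below writes one owl:sameAs statement as an N-Triples line. The helper name is my own, and the DBpedia URI for Iraq is an assumption based on DBpedia's usual http://dbpedia.org/resource/ naming; the OpenCalais URI is the one quoted earlier in this post.

```python
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"


def same_as_ntriple(subject_uri: str, object_uri: str) -> str:
    """Serialize one owl:sameAs statement as an N-Triples line."""
    return f"<{subject_uri}> <{OWL_SAME_AS}> <{object_uri}> ."


IRAQ_CALAIS = (
    "http://d.opencalais.com/er/geo/country/ralg-geo1/"
    "d3b1cee2-327c-fa35-7dab-f0289958c024"
)
# Assumed DBpedia resource URI, not taken from the OpenCalais response.
IRAQ_DBPEDIA = "http://dbpedia.org/resource/Iraq"

print(same_as_ntriple(IRAQ_CALAIS, IRAQ_DBPEDIA))
```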

Entity Relevance

For both detected entities, “Saddam Hussein” and “Iraq”, OpenCalais provides an entity relevance score (shown for each in the screen shots below). The relevance capability detects the importance of each unique entity and assigns a relevance score in the range 0-1 (1 being the most relevant and important). From the screen shots it’s clear that “Iraq” has been ranked more relevant.

Detection Information

The RDF response includes the following properties relating to the subject’s detection:

  • c:docId: URI of the document this mention was detected in.
  • c:subject: URI of the unique entity.
  • c:detection: snippet of the input content where the metadata element was identified.
  • c:prefix: snippet of the input content that precedes the current instance.
  • c:exact: snippet of the input content in the matched portion of text.
  • c:suffix: snippet of the input content that follows the current instance.
  • c:offset: the character offset relative to the input content after it has been converted into XML.
  • c:length: length of the instance.

The screen shot below for Saddam Hussein provides an example of how these properties work.
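
The relationship between offset, length, exact, prefix and suffix can also be sketched in Python. This is my own illustration, not how OpenCalais computes the values: it slices a plain string (the article title) rather than the XML-converted content, and the 20-character context window is an arbitrary assumption.

```python
def detection_parts(content: str, offset: int, length: int, context: int = 20) -> dict:
    """Split content around one detected instance.

    'context' is an assumed window size; the amount of surrounding
    text OpenCalais actually returns may differ.
    """
    end = offset + length
    return {
        "prefix": content[max(0, offset - context):offset],
        "exact": content[offset:end],
        "suffix": content[end:end + context],
    }


TITLE = "Senior UN Officials Protest UN Sanctions On Iraq"
offset = TITLE.index("Iraq")  # 44
parts = detection_parts(TITLE, offset, len("Iraq"))
print(offset, parts)
```

Here the exact text is "Iraq" and the suffix is empty because the match sits at the very end of the string.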

Conclusions

OpenCalais is a very impressive tool. It takes a while, though, to fully understand the RDF response, especially in the areas of entity disambiguation and the linking of OpenCalais entities to other Linked Data datasets. Most likely there are some subtleties that I have missed or misunderstood, so all clarifications are welcome.

For entities extracted from international news sources and not linked to other Linked Data datasets it would be interesting to try some equivalence mining.

Australia’s Government 2.0 Taskforce commissions Semantic Web Project

Saturday, September 5th, 2009

The Australian Government initiated the Government 2.0 Taskforce in June 2009.

The launch video features Lindsay Tanner, Minister for Finance and Deregulation, and taskforce chair Dr Nicholas Gruen in an enthusiastic presentation outlining two key themes the government is keen for the taskforce to pursue.

These are:

  • Transparency and Openness. Using technology “to maximise the extent to which government information, data, and material can be put out into the public domain that we can be as accountable as possible, as transparent as possible and that this data is available for use in the general community.”
  • Community Engagement. Improving “the ways in which we engage with people in the wider community; in consultation, in discussion, in dialogue, about regulation, about government decisions, about policy generally.”

Examples of early government innovation include:

On 1 September 2009 the taskforce announced that it was Open for business, commissioning six projects and inviting interested parties (individuals or companies) to submit quotes, to be received by 9 September 2009.

Early leadership in Semantic Web

Of particular interest is the Early leadership in Semantic Web project. The project deliverable is to be a report which includes:

  • a guide for use by Australian Government agencies that will assist them with proper semantic tagging of datasets;
  • identified Australian Government datasets that could benefit from proper semantic tagging;
  • and a case study on the process of applying proper semantic tagging to an identified agency dataset, and any issues arising from it.

This project, and the fact that government departments such as the Australian Bureau of Statistics are moving to release data under a Creative Commons license, are encouraging signs that an open web of linked data is in the process of evolving.

PricewaterhouseCoopers forecast the Semantic Web

Sunday, June 7th, 2009

The freely available PricewaterhouseCoopers Spring 2009 Technology Forecast explains the value of the Semantic Web and Linked Data in the context of Enterprise applications, presenting interviews with leaders in the field and outlining how CIOs and individual departments can introduce Semantic Web technologies into their organizations.

Forecasts include:

  • “During the next three to five years, we forecast a transformation of the enterprise data management function driven by explicit engagement with data semantics” and
  • “PricewaterhouseCoopers believes a Web of data will develop that fully augments the document Web of today”.

W3C standards providing the foundation for this Web of data include URIs, RDF, RDF Schema (RDFS), the Web Ontology Language (OWL) and the Semantic Protocol and RDF Query Language (SPARQL).

“URIs are more specific in a Semantic Web context than URLs, often including a hash that points to a thing such as an individual musician, a song of hers, or the label she records for within a page, rather than just the page itself.”

“RDF takes the data elements identified by URIs and makes statements about the relationship of one element to another.”

Each statement is a triple, a subject-predicate-object combination.

Ontologies (based on RDFS and OWL) describe the characteristics of these RDF data elements and their relationships within specific domains, facilitating machine interpretability of the data content.

“In this universe of nouns and verbs, the verbs articulate the connections, or relationships, between nouns. Each noun then connects as a node in a networked structure, one that scales easily because of the simplicity and uniformity of its Web-like connections.”
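
To make the nouns-and-verbs picture concrete, here is a minimal Python sketch that stores a few triples and indexes them by subject, so that each noun becomes a node with outgoing, verb-labelled links. It is not taken from the PricewaterhouseCoopers report; the example.org URIs and the vocabulary terms are hypothetical.

```python
from collections import defaultdict
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str    # the noun a statement is about
    predicate: str  # the verb, i.e. the relationship
    obj: str        # the noun the subject is related to


# Hypothetical URIs, loosely following the musician/song/label example.
ARTIST = "http://example.org/artists#alice"
TRIPLES = [
    Triple(ARTIST, "http://example.org/vocab#recorded", "http://example.org/songs#first"),
    Triple(ARTIST, "http://example.org/vocab#signedTo", "http://example.org/labels#indie"),
]


def build_graph(triples):
    """Index triples by subject: node -> list of (predicate, object) edges."""
    graph = defaultdict(list)
    for t in triples:
        graph[t.subject].append((t.predicate, t.obj))
    return graph


graph = build_graph(TRIPLES)
for predicate, obj in graph[ARTIST]:
    print(ARTIST, predicate, obj)
```

Because every statement has the same three-part shape, adding data from another source is just a matter of appending more triples.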

The Web of data approach clearly benefits a company such as the British Broadcasting Corporation (BBC), which “links to URIs at DBpedia.org, a version of the structured information on Wikipedia, to enrich sites such as its music site (http://www.bbc.co.uk/music/)”. It also links to MusicBrainz for information about artists and recordings.

As described by Tom Scott of BBC Earth:

“The relationship between the BBC content, the DBpedia content, and MusicBrainz is no more than URIs. We just have links between these things, and we have an ontology that describes how this stuff maps together.”

Other reviews of the PricewaterhouseCoopers Spring 2009 Technology Forecast include:

Tom Scott has a presentation on Linking bbc.co.uk to the Linked Data cloud, and the article DBpedia Examples using Linked Data and Sparql provides a simple example of using SPARQL to query DBpedia.

Logging in with FOAF+SSL

Friday, April 17th, 2009

“FOAF+SSL is an authentication and authorization protocol that links a Web ID to a public key, thereby enabling a global, decentralized/distributed, and open yet secure social network.”

In my case my FOAF file http://www.3kbo.com/people/richard.hancock/foaf.rdf#i is my Web ID.

A site using FOAF+SSL is Shout Box. Once a user has logged in to Shout Box and left a comment, Shout Box displays the user’s Web ID alongside their comment.

A user logging in to Shout Box identifies themselves with a certificate stored in their browser. If a user has more than one certificate installed they can choose from the list of certificates presented by the browser certificate manager (shown below for Firefox).

Selecting a certificate for a FOAF+SSL login is simpler and quicker than typing a user name and password.

The two obvious things a user needs in order to log in with FOAF+SSL are:

FOAF+SSL also requires:

  • A reference to the Web ID from the certificate. This is provided by setting the Web ID as the value for “X509v3 Subject Alternative Name”.
  • The public key of the certificate published in the Web ID (FOAF file).

If the key published in the Web ID matches that contained in the certificate then the server can conclude that the person logging in is the owner of the Web ID (FOAF file).
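
The matching step can be sketched in a few lines of Python. This is only an illustration of the comparison, not the code of any FOAF+SSL server: the class and function names are my own, the key values are small made-up numbers, and a real server would first have to extract the key from the client certificate and fetch and parse the FOAF file.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RsaPublicKey:
    modulus: int
    exponent: int


def parse_hex_modulus(hex_text: str) -> int:
    """Parse a hex modulus, ignoring whitespace, colons and letter case."""
    cleaned = "".join(hex_text.split()).replace(":", "")
    return int(cleaned, 16)


def web_id_owns_key(cert_key: RsaPublicKey, published_keys) -> bool:
    """True if the certificate's key is one of the keys published in the Web ID."""
    return any(cert_key == key for key in published_keys)


# Made-up values for illustration only.
cert_key = RsaPublicKey(parse_hex_modulus("D2:58:D8"), 65537)
foaf_keys = [RsaPublicKey(parse_hex_modulus("d258d8"), 65537)]

print(web_id_owns_key(cert_key, foaf_keys))  # True
```

Comparing the numeric values rather than the raw strings means differences in hex formatting between the certificate and the FOAF file do not cause a false mismatch.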

I can check the details of the certificate I have been using and see the reference to my Web ID by first opening the Firefox Certificate Manager (by pasting chrome://pippki/content/certManager.xul into the browser location bar). The Certificate Manager lists all the installed certificates.

To see more information about this certificate I select it, then click “View …” to get a dialog box with two tabs, “General” and “Details”. Selecting the “Details” tab and “Certificate Subject Alt Name” shows that my Web ID, http://www.3kbo.com/people/richard.hancock/foaf.rdf#i, is the value set for the “X509v3 Subject Alternative Name”.

An easy way to create an X509 certificate with a reference to a Web ID is to follow the steps outlined in Henry Story’s article creating a foaf+ssl cert in a few clicks. I used this process to create the other two certificates shown above.

I created my main X509 certificate by following the steps outlined by Henry in his earlier article foaf+ssl: a first implementation. This gives a good programmatic understanding of what’s happening.

(The code is under active development, so if you try it and have problems, check out revision 468 to get the code that matches the article, i.e. svn checkout https://sommer.dev.java.net/svn/sommer/trunk sommer-r468 --username guest -r 468)

Using this approach the main tasks for setting up a user with FOAF+SSL are:

  • Running GenerateKey to create an X509 certificate, setting an existing FOAF file as the Web ID.
  • Adding the RDF statements defining the public key to the FOAF file.
  • Adding the X509 certificate to the user’s browser.

GenerateKey generates the RDF statements defining the public key in N3 format. If your FOAF file is in RDF/XML format like mine, then you need to convert them from N3 to RDF/XML.

Adding the following worked for me:

<rsa:RSAPublicKey>
<cert:identity rdf:resource="#i"/>
<rsa:public_exponent cert:decimal="65537"/>
<rsa:modulus cert:hex="d258d85da71a4f1199cae5e8e18a5ffa9127d9796526299b746de9fdcbc1364e074dc143d0ebbd3d3890d7e95b8b4931e3798a7a8f8dbd3441927b6601fb504ca2a919a803e31a6112fea227102dc1424946fb92f8f651f3da855ec43e496f8e0098b596f33af80e7b86d831d46948e040a656f3f00a67b724ccfb55fa4660d3" />
</rsa:RSAPublicKey>
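
To check that a fragment like this is well formed and carries the expected values, it can be read back with Python's standard library. This is a minimal sketch under assumptions: the cert and rsa namespace URIs below are the ones I believe these prefixes were bound to at the time, the wrapping rdf:RDF element is added only so that the fragment parses on its own, and the function name is my own.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
CERT_NS = "http://www.w3.org/ns/auth/cert#"  # assumed prefix binding
RSA_NS = "http://www.w3.org/ns/auth/rsa#"    # assumed prefix binding
NS = {"rdf": RDF_NS, "cert": CERT_NS, "rsa": RSA_NS}

# The fragment from this post, wrapped so it can be parsed standalone.
FOAF_FRAGMENT = f"""<rdf:RDF xmlns:rdf="{RDF_NS}" xmlns:cert="{CERT_NS}" xmlns:rsa="{RSA_NS}">
<rsa:RSAPublicKey>
<cert:identity rdf:resource="#i"/>
<rsa:public_exponent cert:decimal="65537"/>
<rsa:modulus cert:hex="d258d85da71a4f1199cae5e8e18a5ffa9127d9796526299b746de9fdcbc1364e074dc143d0ebbd3d3890d7e95b8b4931e3798a7a8f8dbd3441927b6601fb504ca2a919a803e31a6112fea227102dc1424946fb92f8f651f3da855ec43e496f8e0098b596f33af80e7b86d831d46948e040a656f3f00a67b724ccfb55fa4660d3" />
</rsa:RSAPublicKey>
</rdf:RDF>"""


def read_public_key(rdf_xml: str) -> dict:
    """Extract identity, exponent and modulus from the first RSAPublicKey element."""
    root = ET.fromstring(rdf_xml)
    key = root.find("rsa:RSAPublicKey", NS)
    if key is None:
        raise ValueError("no rsa:RSAPublicKey element found")
    identity = key.find("cert:identity", NS).get(f"{{{RDF_NS}}}resource")
    exponent = key.find("rsa:public_exponent", NS).get(f"{{{CERT_NS}}}decimal")
    modulus = key.find("rsa:modulus", NS).get(f"{{{CERT_NS}}}hex")
    return {
        "identity": identity,
        "exponent": int(exponent),
        "modulus": int(modulus, 16),
    }


key = read_public_key(FOAF_FRAGMENT)
print(key["identity"], key["exponent"])
```

This treats the file purely as XML, so it only works for this particular RDF/XML layout; an RDF library would be the more robust choice for real FOAF files.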

A GoodRelations Semantic Web Description of a Business

Saturday, April 11th, 2009

Tried out the newly released GoodRelations Annotator to create a Semantic Web description of a business.

The GoodRelations Annotator is an online form-based tool that creates an RDF/XML file “semanticweb.rdf” containing a description of the key aspects of the business. The description is based on concepts defined in the GoodRelations OWL ontology. In particular the description contains a BusinessEntity representing the business and one or more Offerings. Each Offering describes the intent to provide a Business Function for a certain Product or Service to a specified target audience.

The generated RDF/XML file can either be published directly on the company’s Web site or used as a skeleton for developing a more fine-grained description.

The link Publishing GoodRelations Data on the Web provides guidelines on publishing to the web.

In my case I created a description for my embryonic business 3kbo.

I’m interested in linking the generated semanticweb.rdf to other things, in particular linking the BusinessEntity with people and with other BusinessEntity instances.

Initially I added the URI of my foaf file to the BusinessEntity instance using rdfs:seeAlso, but after reading the definition of BusinessEntity, i.e. that it represents the legal agent making a particular offering and can be a legal body or a person, I changed it to owl:sameAs.

E.g.

<gr:BusinessEntity rdf:ID="BusinessEntity">
<owl:sameAs rdf:resource="http://www.3kbo.com/people/richard.hancock/foaf.rdf#i"/>
</gr:BusinessEntity>

This makes sense for my simple case, since as a sole trader I am the BusinessEntity. When viewed in Firefox using the Tabulator Extension owl:sameAs also provides an inferred link from my foaf file to my semanticweb.rdf as shown below.
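
The inferred link follows from owl:sameAs being symmetric: if A is the same as B, then B is the same as A. The Python below is a minimal sketch of that one inference step, not of how Tabulator works internally. The function name is my own, and the URI used for the BusinessEntity is an assumed location for semanticweb.rdf.

```python
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"


def add_symmetric_same_as(triples):
    """Return the triples plus the reverse of every owl:sameAs statement."""
    inferred = set(triples)
    for subject, predicate, obj in triples:
        if predicate == OWL_SAME_AS:
            inferred.add((obj, predicate, subject))
    return inferred


# Assumed URI for the BusinessEntity; the FOAF URI is my Web ID.
BUSINESS = "http://www.3kbo.com/semanticweb.rdf#BusinessEntity"
FOAF_ME = "http://www.3kbo.com/people/richard.hancock/foaf.rdf#i"

stated = {(BUSINESS, OWL_SAME_AS, FOAF_ME)}
inferred = add_symmetric_same_as(stated)

print((FOAF_ME, OWL_SAME_AS, BUSINESS) in inferred)  # True
```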

A part of the business description I don’t understand yet is how best to use the eClassOWL ontology to describe the Product or Service.

For example using the GoodRelations Annotator I selected “19 information, communication and media technology” as the Category and “1904 Software” as the Group.

This leads to http://www.ebusiness-unibw.org/ontologies/eclass/5.1.4/#C_AKJ317003-tax being used in the definition of the product or service, i.e.

<gr:typeOfGood>
<gr:ProductOrServicesSomeInstancesPlaceholder rdf:ID="ProductOrServicesSomeInstancesPlaceholder_1">
<rdf:type rdf:resource="&eco;#C_AKJ317003-tax"/>
</gr:ProductOrServicesSomeInstancesPlaceholder>
</gr:typeOfGood>

Because of the size of the eClassOWL ontology it takes a while to dereference this link. It would be good to be able to provide a more user-friendly reference at this point, one that gives a description of the product or service.

Beyond this simple example I am interested in semantic web descriptions of other, more complex relationships: between a BusinessEntity (when not a person) and the people involved with the business (e.g. directors, CEO etc.), and between one BusinessEntity and another.

Potentially GoodRelations and eClassOWL could be used as part of an Enterprise Architecture describing the who, what, how, when, where and why of a business.