Posts Tagged ‘Semantic Web’

Drupal 7 RDFa XMLLiteral content processing

Saturday, March 12th, 2011

Drupal 7 supports RDFa 1.0 as part of the core product. RDFa 1.0 is the current specification but RDFa 1.1 is to be released shortly.

RDFa 1.0 metadata can be parsed using the RDFa Distiller and Parser while the RDFa Distiller and Parser (Test Version for RDFa 1.1) can be used to extract RDFa 1.1.

Creating a simple Drupal 7 test blog post and parsing out the RDFa 1.0 metadata with the RDFa Distiller and Parser shows that Drupal 7 uses the SIOC (Semantically-Interlinked Online Communities) ontology to describe blog posts, and that it identifies the Drupal user as the creator of the post using the sioc:has_creator property.

<sioc:Post rdf:about="http://137breakerbay.3kbo.com/test">
  <rdf:type rdf:resource="http://rdfs.org/sioc/types#BlogPost"/>
  ...
  <sioc:has_creator>
    <sioc:UserAccount rdf:about="http://137breakerbay.3kbo.com/user/2">
      <foaf:name>Richard</foaf:name>
    </sioc:UserAccount>
  </sioc:has_creator>
  ...
  <content:encoded rdf:parseType="Literal"><p xml:lang="en" xmlns="http://www.w3.org/1999/xhtml">Test Blog</p>
  </content:encoded>
</sioc:Post>

sioc:UserAccount is a subclass of foaf:OnlineAccount.

What I would like to do is add additional RDFa metadata within the content of the blog post to associate me, the Drupal 7 user with a sioc:UserAccount, with me, the foaf:Person identified by my FOAF file.

Drupal 7 content is wrapped in an XHTML element carrying property="content:encoded" (shown below), and an RDFa parser treats this content as an XMLLiteral.

<div property="content:encoded">
...
</div>

The problem is that RDFa 1.0 parsers don’t extract metadata contained within the XMLLiteral.

This was raised a while back in the issue "XMLLiteral content isn't processed for RDFa attributes in RDFa 1.0 – should this change in RDFa 1.1?", with the result that RDFa 1.1 parsers should now also process the XMLLiteral content.

To make sure that RDFa parsers know that I want RDFa 1.1 processing, I need to update Drupal 7 to use the XHTML+RDFa Driver Module defined in the XHTML+RDFa 1.1 spec.

This turns out to be a simple update of one Drupal 7 file, site/modules/system/html.tpl.php.

Near the top of the file the version is changed to 1.1 (in two places) and the DTD is changed to "http://www.w3.org/MarkUp/DTD/xhtml-rdfa-2.dtd".

?><!DOCTYPE html PUBLIC "-//W3C//DTD XHTML+RDFa 1.1//EN"
  "http://www.w3.org/MarkUp/DTD/xhtml-rdfa-2.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="<?php print $language->language; ?>" version="XHTML+RDFa 1.1" dir="<?php print $language->dir; ?>"<?php print $rdf_namespaces; ?>>

With these changes made I can create another blog post containing the following RDFa metadata:

<div about="http://www.3kbo.com/people/richard.hancock/foaf.rdf#i" typeof="foaf:Person">
  <div rel="foaf:account" resource="http://137breakerbay.3kbo.com/user/2">
  ...
  </div>
</div>

knowing that an RDFa 1.1 parser will create the RDF triples below, which link the Drupal 7 user to me, the person identified in my FOAF file.

  <foaf:Person rdf:about="http://www.3kbo.com/people/richard.hancock/foaf.rdf#i">
    <foaf:account>
      <sioc:UserAccount rdf:about="http://137breakerbay.3kbo.com/user/2">
        <foaf:name>Richard</foaf:name>
      </sioc:UserAccount>
    </foaf:account>
  </foaf:Person>

The differences between the RDF extracted with an RDFa 1.0 parser and an RDFa 1.1 parser can be seen by running the post through the two distillers linked above.

Now that I know that the RDFa 1.1 metadata embedded in the content will be processed accordingly, I can move on to the task of building 137 Breaker Bay, a simple accommodation site. The plan is to use RDFa and ontologies such as GoodRelations to describe both the accommodation services available and the attractions and services of the surrounding area.
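
As a rough sketch of the kind of GoodRelations markup I have in mind (the fragment identifiers, names and the inline gr namespace mapping below are placeholders of my own, not something Drupal generates), the accommodation could be described along these lines:

<!-- Placeholder URIs and names; gr is assumed to map to http://purl.org/goodrelations/v1# -->
<div xmlns:gr="http://purl.org/goodrelations/v1#"
     about="#business" typeof="gr:BusinessEntity">
  <span property="gr:legalName">137 Breaker Bay</span>
  <div rel="gr:offers">
    <div about="#accommodation" typeof="gr:Offering">
      <span property="gr:name">Self-contained accommodation at Breaker Bay</span>
    </div>
  </div>
</div>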

Understanding the OpenCalais RDF Response

Saturday, September 26th, 2009

I’m using an XML version of an article published by Scoop in February 2000, Senior UN Officials Protest UN Sanctions On Iraq, to understand the OpenCalais RDF response as part of a larger project of linking extracted entities to existing Linked Data datasets.

OpenCalais uses natural language processing (NLP), machine learning and other methods to analyze content and return the entities it finds, such as cities, countries and people, with dereferenceable Linked Data style URIs. The entity types are defined in the OpenCalais RDF Schemas.

When I submit the content to the OpenCalais REST web service (using the default RDF response format) an RDF document is returned. Opened below in TopBraid Composer, it shows a portion of the input content and some of the entity types OpenCalais can detect. The numbers in brackets indicate how many instances of an entity type have been detected; for example cle.Person(13) indicates that thirteen people have been detected.

The TopBraid Composer Instances tab contains the URIs of the people detected. Opening the highlighted URI reveals that it is for a person named Saddam Hussein.

Entity Disambiguation

One of the challenges when analyzing content and extracting entities is entity disambiguation. Can the person named Saddam Hussein be uniquely identified? Usually context is needed in order to disambiguate similar entities. As described in the OpenCalais FAQ, if the "rdf:type rdf:resource" of a given entity contains /er/ the entity has been disambiguated by OpenCalais, while if it contains /em/ it has not.

In the example above cle.Person is <http://s.opencalais.com/1/type/em/e/Person>. There is no obvious link to an "rdf:type rdf:resource" containing /er/. It looks like OpenCalais has been able to determine that the text "Saddam Hussein" refers to a Person, but has not been able to determine specifically who that person is.

In contrast, Iraq (one of the three countries detected) is shown below with the Incoming Reference http://d.opencalais.com/er/geo/country/ralg-geo1/d3b1cee2-327c-fa35-7dab-f0289958c024.

Opening that URI with either an HTML browser (as http://d.opencalais.com/er/geo/country/ralg-geo1/d3b1cee2-327c-fa35-7dab-f0289958c024.html) or with an RDF browser (as http://d.opencalais.com/er/geo/country/ralg-geo1/d3b1cee2-327c-fa35-7dab-f0289958c024.rdf, shown in Tabulator below) shows that the country has been disambiguated with <rdf:type rdf:resource="http://s.opencalais.com/1/type/er/Geo/Country"/>.
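
Side by side, the two kinds of typing look roughly like this in the RDF response (the subject URI for Saddam Hussein is a placeholder for the /em/ hash URI OpenCalais actually returned; the Iraq URI is the one above):

<!-- Detected but not disambiguated: typed only with an /em/ URI (placeholder subject) -->
<rdf:Description rdf:about="http://d.opencalais.com/placeholder-for-the-saddam-hussein-hash">
  <rdf:type rdf:resource="http://s.opencalais.com/1/type/em/e/Person"/>
</rdf:Description>

<!-- Disambiguated: typed with an /er/ URI -->
<rdf:Description rdf:about="http://d.opencalais.com/er/geo/country/ralg-geo1/d3b1cee2-327c-fa35-7dab-f0289958c024">
  <rdf:type rdf:resource="http://s.opencalais.com/1/type/er/Geo/Country"/>
</rdf:Description>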

Linked Data

In the RDF response returned by OpenCalais neither Iraq nor "Saddam Hussein" was linked to other Linked Data datasets, although some OpenCalais entities are. Moscow, Russia, for example, is linked via owl:sameAs to URIs in other datasets.

Since I know that the context of the article is international news I can safely add some owl:sameAs links myself, such as DBpedia links for "Saddam Hussein" and Iraq (below).
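
In RDF/XML that amounts to adding statements like these (the Saddam Hussein subject URI is again a placeholder for the /em/ URI in the response; the Iraq URI and the DBpedia resource URIs are real):

<rdf:Description rdf:about="http://d.opencalais.com/placeholder-for-the-saddam-hussein-hash">
  <owl:sameAs rdf:resource="http://dbpedia.org/resource/Saddam_Hussein"/>
</rdf:Description>

<rdf:Description rdf:about="http://d.opencalais.com/er/geo/country/ralg-geo1/d3b1cee2-327c-fa35-7dab-f0289958c024">
  <owl:sameAs rdf:resource="http://dbpedia.org/resource/Iraq"/>
</rdf:Description>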

Entity Relevance

For both detected entities, "Saddam Hussein" and "Iraq", OpenCalais provides an entity relevance score (shown for each in the screen shots below). The relevance capability detects the importance of each unique entity and assigns a relevance score in the range 0–1 (1 being the most relevant and important). From the screen shots it's clear that "Iraq" has been ranked as more relevant.

Detection Information

The RDF response includes the following properties relating to each subject's detection:

  • c:docId: URI of the document the mention was detected in.
  • c:subject: URI of the unique entity.
  • c:detection: snippet of the input content where the metadata element was identified.
  • c:prefix: snippet of the input content that precedes the current instance.
  • c:exact: snippet of the input content in the matched portion of text.
  • c:suffix: snippet of the input content that follows the current instance.
  • c:offset: character offset relative to the input content after it has been converted into XML.
  • c:length: length of the instance.

The screen shot below for Saddam Hussein provides an example of how these properties work.
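
In the RDF itself the detection information looks roughly like the following (the subject URIs and the snippet, offset and length values here are illustrative placeholders, not the exact values returned for this article):

<rdf:Description rdf:about="http://d.opencalais.com/placeholder-for-a-detection-instance">
  <c:docId rdf:resource="http://d.opencalais.com/placeholder-for-the-submitted-document"/>
  <c:subject rdf:resource="http://d.opencalais.com/placeholder-for-the-saddam-hussein-hash"/>
  <c:detection>...text before the mention...Saddam Hussein...text after the mention...</c:detection>
  <c:prefix>...text before the mention...</c:prefix>
  <c:exact>Saddam Hussein</c:exact>
  <c:suffix>...text after the mention...</c:suffix>
  <c:offset>512</c:offset>
  <c:length>14</c:length>
</rdf:Description>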

Conclusions

OpenCalais is a very impressive tool. It takes a while, though, to fully understand the RDF response, especially in the areas of entity disambiguation and the linking of OpenCalais entities to other Linked Data datasets. Most likely there are some subtleties that I have missed or misunderstood, so all clarifications are welcome.

For entities extracted from international news sources and not linked to other Linked Data datasets it would be interesting to try some equivalence mining.

Australia's Government 2.0 Taskforce commissions Semantic Web Project

Saturday, September 5th, 2009

The Australian Government initiated the Government 2.0 Taskforce in June 2009.

The launch video features Lindsay Tanner, Minister for Finance and Deregulation, and taskforce chair Dr Nicholas Gruen in an enthusiastic presentation outlining two key themes the government is keen for the taskforce to pursue.

These are:

  • Transparency and Openness. Using technology “to maximise the extent to which government information, data, and material can be put out into the public domain that we can be as accountable as possible, as transparent as possible and that this data is available for use in the general community.”
  • Community Engagement. Improving “the ways in which we engage with people in the wider community; in consultation, in discussion, in dialogue, about regulation, about government decisions, about policy generally.”

Examples of early government innovation include:

On 1 September 2009 the taskforce announced that it was "Open for business", commissioning six projects and inviting interested parties (individuals or companies) to submit quotes by 9 September 2009.

Early leadership in Semantic Web

Of particular interest is the Early leadership in Semantic Web project. The project deliverable is to be a report which includes:

  • a guide for use by Australian Government agencies that will assist them with proper semantic tagging of datasets;
  • identified Australian Government datasets that could benefit from proper semantic tagging;
  • and a case study on the process of, and any issues arising from, applying proper semantic tagging to an identified agency dataset.

This, together with the fact that government departments such as the Australian Bureau of Statistics are moving to release data under a Creative Commons license, is another encouraging sign that an open web of linked data is evolving.

PricewaterhouseCoopers forecast the Semantic Web

Sunday, June 7th, 2009

The freely available PricewaterhouseCoopers Spring 2009 Technology Forecast explains the value of the Semantic Web and Linked Data in the context of Enterprise applications, presenting interviews with leaders in the field and outlining how CIOs and individual departments can introduce Semantic Web technologies into their organizations.

Forecasts include:

  • “During the next three to five years, we forecast a transformation of the enterprise data management function driven by explicit engagement with data semantics” and
  • “PricewaterhouseCoopers believes a Web of data will develop that fully augments the document Web of today”.

W3C standards providing the foundation for this Web of data include URIs, RDF, RDF Schema (RDFS), the Web Ontology Language (OWL) and the SPARQL Protocol and RDF Query Language (SPARQL).

"URIs are more specific in a Semantic Web context than URLs, often including a hash that points to a thing such as an individual musician, a song of hers, or the label she records for within a page, rather than just the page itself."

"RDF takes the data elements identified by URIs and makes statements about the relationship of one element to another."

Each statement is a triple, a subject-predicate-object combination.
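
For example, a single triple whose subject is a DBpedia resource, whose predicate is foaf:name and whose object is a literal name could be written in RDF/XML as follows (an illustrative example of my own, not one taken from the report):

<!-- subject: the resource named in rdf:about; predicate: foaf:name; object: the literal "The Beatles" -->
<rdf:Description rdf:about="http://dbpedia.org/resource/The_Beatles">
  <foaf:name>The Beatles</foaf:name>
</rdf:Description>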

Ontologies (based on RDFS and OWL) describe the characteristics of these RDF data elements and their relationships within specific domains, facilitating machine interpretability of the data content.

“In this universe of nouns and verbs, the verbs articulate the connections, or relationships, between nouns. Each noun then connects as a node in a networked structure, one that scales easily because of the simplicity and uniformity of its Web-like connections.”

The Web of data approach clearly benefits a company such as the British Broadcasting Corporation (BBC), which “links to URIs at DBpedia.org, a version of the structured information on Wikipedia, to enrich sites such as its music site (http://www.bbc.co.uk/music/)”. It also links to MusicBrainz for information about artists and recordings.

As described by Tom Scott of BBC Earth:

“The relationship between the BBC content, the DBpedia content, and MusicBrainz is no more than URIs. We just have links between these things, and we have an ontology that describes how this stuff maps together.”

Other reviews of the PricewaterhouseCoopers Spring 2009 Technology Forecast include:

Tom Scott has a presentation on Linking bbc.co.uk to the Linked Data cloud, and the article DBpedia Examples using Linked Data and Sparql provides a simple example of using SPARQL to query DBpedia.

Logging in with FOAF+SSL

Friday, April 17th, 2009

“FOAF+SSL is an authentication and authorization protocol that links a Web ID to a public key, thereby enabling a global, decentralized/distributed, and open yet secure social network.”

In my case my FOAF file http://www.3kbo.com/people/richard.hancock/foaf.rdf#i is my Web ID.

A site using FOAF+SSL is Shout Box. Once a user has logged in to Shout Box and left a comment, Shout Box displays the user's Web ID alongside their comment.


A user logging in to Shout Box identifies themselves with a certificate stored in their browser. If a user has more than one certificate installed they can choose from the list of certificates presented by the browser certificate manager (shown below for Firefox).

Selecting a certificate for a FOAF+SSL login is simpler and quicker than typing a user name and password.

The two obvious things a user needs in order to log in with FOAF+SSL are:

  • A Web ID (in my case my FOAF file).
  • An X509 certificate installed in their browser.

FOAF+SSL also requires:

  • A reference to the Web ID from the certificate. This is provided by setting the Web ID as the value for "X509v3 Subject Alternative Name".
  • The public key of the certificate published in the Web ID (FOAF file).

If the key published in the Web ID matches that contained in the certificate then the server can conclude that the person logging in is the owner of the Web ID (FOAF file).

I can check the details of the certificate I have been using and see the reference to my Web ID by first opening the Firefox Certificate Manager (by pasting chrome://pippki/content/certManager.xul into the browser location bar). The Certificate Manager lists all the installed certificates.

To see more information about this certificate I select it and then click "View …" to get a dialog box with two tabs, "General" and "Details". Selecting the "Details" tab and "Certificate Subject Alt Name" shows that my Web ID, http://www.3kbo.com/people/richard.hancock/foaf.rdf#i, is the value set for the "X509v3 Subject Alternative Name".

An easy way to create an X509 certificate with a reference to a Web ID is to follow the steps outlined in Henry Story's article creating a foaf+ssl cert in a few clicks. I used this process to create the other two certificates shown above.

I created my main X509 certificate by following the steps outlined by Henry in his earlier article foaf+ssl: a first implementation. This gives a good programmatic understanding of what’s happening.

(The code is under active development, so if you try it and have problems then check out revision 468 to get the code that matches the article, i.e. svn checkout https://sommer.dev.java.net/svn/sommer/trunk sommer-r468 --username guest -r 468)

Using this approach the main tasks for setting up a user with FOAF+SSL are:

  • Running GenerateKey to create an X509 certificate, setting an existing FOAF file as the Web ID.
  • Adding the RDF statements defining the public key to the FOAF file.
  • Adding the X509 certificate to the user's browser.

GenerateKey outputs the RDF statements defining the public key in N3 format. If your FOAF file is in RDF/XML format like mine then you need to convert them from N3 to RDF/XML.

Adding the following worked for me:

<!-- The rsa and cert namespaces are declared inline here; they can equally be
     declared on the root rdf:RDF element of the FOAF file. -->
<rsa:RSAPublicKey xmlns:rsa="http://www.w3.org/ns/auth/rsa#"
    xmlns:cert="http://www.w3.org/ns/auth/cert#">
  <!-- #i is the fragment identifier of the person described in the FOAF file -->
  <cert:identity rdf:resource="#i"/>
  <rsa:public_exponent cert:decimal="65537"/>
  <rsa:modulus cert:hex="d258d85da71a4f1199cae5e8e18a5ffa9127d9796526299b746de9fdcbc1364e074dc143d0ebbd3d3890d7e95b8b4931e3798a7a8f8dbd3441927b6601fb504ca2a919a803e31a6112fea227102dc1424946fb92f8f651f3da855ec43e496f8e0098b596f33af80e7b86d831d46948e040a656f3f00a67b724ccfb55fa4660d3" />
</rsa:RSAPublicKey>