Archive for the ‘Semantic Web’ Category

Developing a Semantic Web Strategy

Tuesday, August 10th, 2010

In the last chapter of his book “Pull: The Power of the Semantic Web to Transform Your Business”, David Siegel outlines some steps for developing a successful Semantic Web strategy for your business or organization.

One approach that worked for me recently was to organize a meeting titled “Developing a Semantic Web Strategy” and invite developers, architects, analysts and managers. This was in a government organization, and the managers were from the applications development area.

Circulating books like Semantic Web for the Working Ontologist, Semantic Web For Dummies, Programming the Semantic Web and Semantic Web Programming before the meeting helped people become familiar with concepts such as URIs as names for things, RDF, RDFS, OWL, SPARQL and RDFa.

To highlight how rapidly the Web of Data is evolving, and how much information is now being published as Linked Open Data, I stepped through Mark Greaves’ excellent presentation The Maturing Semantic Web: Lessons in Web-Scale Knowledge Representation.

During the meeting I took a “business strategy first, technology second” approach, taking the time to explore how what has worked for others might fit our organization.

Areas explored included:

Enterprise Modeling

I spent some time comparing RDF/OWL modeling with UML modeling, highlighting how URIs enable modeling across distributed information sources without the need to consolidate everything in a central repository, as is typically done with UML tools.

I also touched on OWL features such as:

Because the organization is a government department, I highlighted the Federal Enterprise Architecture Reference Model Ontology (FEA-RMO) and how such an ontology could be used to map a parliamentary initiative to the software that implements it.

Open Government

Given the current trend for governments to make datasets freely available, I presented the Linked Data approaches taken by http://data.gov and http://data.gov.uk as examples to follow in this area.

The business case for Linked Data in this scenario is that Linked Data is seen as the best available approach for publishing data in hugely diverse and distributed environments, in a gradual and sustainable way (see Why Linked Data for data.gov.uk? for details).

RDFa Based Integration

One example that struck a chord was RDFa and Linked Data in UK Government Websites, where job vacancy details from different sites can easily be combined because each site publishes its pages as HTML with RDFa added to annotate the job vacancy. Using RDFa allows the same page to be read as either HTML or RDF, so integration can be achieved with minimal changes to the original sites.
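The idea that one page serves both people and machines can be illustrated with a deliberately simplified extractor that pulls annotated values out of the HTML using only Python's standard library. This is a sketch under stated assumptions: the page fragment and the vacancy: property names are hypothetical, and a real RDFa processor also resolves prefixes, subjects (about/resource attributes) and nesting, all of which this ignores.

```python
from html.parser import HTMLParser

# Hypothetical vacancy page fragment; the vacancy: vocabulary is made up for illustration.
PAGE = """
<div typeof="vacancy:Vacancy">
  <h2 property="vacancy:title">Data Architect</h2>
  <span property="vacancy:salary" content="65000">$65,000</span>
  <p>Apply by Friday.</p>
</div>
"""


class PropertyCollector(HTMLParser):
    """Collect (property, value) pairs from elements carrying a property attribute.

    Uses the content attribute when present, otherwise the element's first
    non-blank text. Simplified: no prefix resolution, subjects or nesting.
    """

    def __init__(self):
        super().__init__()
        self.pairs = []
        self._pending = None  # property still waiting for its text value

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        prop = a.get("property")
        if prop is None:
            return
        if a.get("content") is not None:
            self.pairs.append((prop, a["content"]))
        else:
            self._pending = prop

    def handle_data(self, data):
        if self._pending and data.strip():
            self.pairs.append((self._pending, data.strip()))
            self._pending = None


def extract(html):
    parser = PropertyCollector()
    parser.feed(html)
    parser.close()
    return parser.pairs


print(extract(PAGE))
```

A browser renders the same markup as an ordinary page, while an aggregator can run an extractor over each site and merge the resulting pairs.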

Search Engine Optimisation (SEO)

For anyone advertising products and services online, the example to follow is BestBuy.com, which describes its stores and products using the GoodRelations ontology and embeds these descriptions in its web pages using RDFa, increasing search engine traffic by 30%.

Enterprise Web of Data

Within our software development process, from project inception to production release and subsequent maintenance releases, information is copied and duplicated in a number of different places. Silos abound, in the form of Word documents, spreadsheets and the sticky notes that are part of the “Agile” process. There is some good information on our wiki pages, but it is unstructured and not machine readable.

The information that forms our internal processes fails David Siegel’s Semantic Web Acid Test:

  • It’s not semantic and
  • It’s not on the web.

Introducing a semantic wiki, such as Semantic MediaWiki, to hold project information and link it to other data sources was raised as a candidate for a Semantic Web proof of concept.

Outcomes

Just scheduling the meeting was a successful outcome in itself, since it started discussion about the role Semantic Web technologies could play in our organization. For a number of people, including the Applications Development manager, this is new technology and they need time to absorb it, but the end result was agreement that it was technology that couldn’t be ignored.

To gain hands-on experience, two internal prototypes were agreed, both of practical value to the organization.

The first is a small application that will show the full set of runtime dependencies for a given software component, as well as the other components affected when that component is changed. It will be based on a simple ontology that defines dependencies between components using owl:TransitiveProperty, and will use a reasoner (e.g. Pellet) to infer the full set of dependencies for a component.
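What the reasoner infers from a transitive dependsOn property is the transitive closure of the direct dependency graph. The sketch below computes the same thing in plain Python, purely to show what the prototype is expected to produce; it is not the prototype itself, and the component names and the DIRECT_DEPS map are hypothetical. Walking the inverted graph gives the "affected when changed" view.

```python
from collections import defaultdict

# Hypothetical direct runtime dependencies: component -> components it uses.
# In the prototype these would be assertions of a property declared
# owl:TransitiveProperty, and the reasoner would infer the rest.
DIRECT_DEPS = {
    "web-ui": {"order-service"},
    "order-service": {"customer-service", "db-access"},
    "customer-service": {"db-access"},
    "db-access": set(),
}


def transitive_closure(start, edges):
    """Every node reachable from start by following edges.

    start itself is included only if a cycle leads back to it.
    """
    seen = set()
    stack = list(edges.get(start, ()))
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(edges.get(node, ()))
    return seen


def reverse(edges):
    """Invert the graph: component -> components that depend on it directly."""
    rev = defaultdict(set)
    for src, targets in edges.items():
        for target in targets:
            rev[target].add(src)
    return dict(rev)


def all_dependencies(component):
    return transitive_closure(component, DIRECT_DEPS)


def affected_by_change(component):
    return transitive_closure(component, reverse(DIRECT_DEPS))


print(sorted(all_dependencies("web-ui")))
print(sorted(affected_by_change("db-access")))
```

The advantage of the ontology-plus-reasoner approach over code like this is that only the direct dependencies need to be recorded; the inference rule lives in the ontology rather than in application logic.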

The second prototype will trial Semantic MediaWiki for project management (potentially using the Teamwork Ontology). The longer-term aim is to customize Semantic MediaWiki to include artifacts created during the software development process, addressing some of the silo problems in our current internal enterprise web of data.

Once practical knowledge has been gained from the internal prototypes, a meeting will be scheduled with the Enterprise Architecture team to canvass support for a wider vision for the use of Linked Data and Semantic Web technologies, potentially leading to their use on the public web sites and active publishing to the Web of Data.

Understanding the OpenCalais RDF Response

Saturday, September 26th, 2009

I’m using an XML version of an article published by Scoop in February 2000, Senior UN Officials Protest UN Sanctions On Iraq, to understand the OpenCalais RDF response as part of a larger project of linking extracted entities to existing Linked Data datasets.

OpenCalais uses natural language processing (NLP), machine learning and other methods to analyze content and return the entities it finds, such as cities, countries and people, identified by dereferenceable Linked Data-style URIs. The entity types are defined in the OpenCalais RDF Schemas.

When I submit the content to the OpenCalais REST web service (using the default RDF response format), an RDF document is returned. Opening it in TopBraid Composer (below) shows a portion of the input content and some of the entity types OpenCalais can detect. The numbers in brackets indicate how many instances of each entity type have been detected; for example, cle.Person(13) indicates that thirteen people have been detected.
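The per-type counts TopBraid Composer displays amount to grouping the rdf:type statements in the response. A minimal sketch with Python's standard XML parser, assuming the response uses rdf:Description elements with rdf:type children; the trimmed response below is hypothetical (made-up subject URIs, only type statements kept), and typed-node abbreviations of RDF/XML are not handled.

```python
import xml.etree.ElementTree as ET
from collections import Counter

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# Hypothetical, heavily trimmed response: subject URIs are made up and only
# rdf:type statements are kept. A real OpenCalais response contains much more.
SAMPLE_RESPONSE = f"""<rdf:RDF xmlns:rdf="{RDF_NS}">
  <rdf:Description rdf:about="http://example.org/entity/1">
    <rdf:type rdf:resource="http://s.opencalais.com/1/type/em/e/Person"/>
  </rdf:Description>
  <rdf:Description rdf:about="http://example.org/entity/2">
    <rdf:type rdf:resource="http://s.opencalais.com/1/type/em/e/Person"/>
  </rdf:Description>
  <rdf:Description rdf:about="http://example.org/entity/3">
    <rdf:type rdf:resource="http://s.opencalais.com/1/type/er/Geo/Country"/>
  </rdf:Description>
</rdf:RDF>"""


def count_entity_types(rdf_xml):
    """Count rdf:Description nodes per rdf:type, keyed by the type URI's last path segment."""
    root = ET.fromstring(rdf_xml)
    counts = Counter()
    for desc in root.iter(f"{{{RDF_NS}}}Description"):
        for type_el in desc.findall(f"{{{RDF_NS}}}type"):
            type_uri = type_el.get(f"{{{RDF_NS}}}resource")
            if type_uri:
                counts[type_uri.rsplit("/", 1)[-1]] += 1
    return counts


print(count_entity_types(SAMPLE_RESPONSE))
```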

The TopBraid Composer Instances tab lists the URIs of the people detected. Opening the highlighted URI reveals that it is for a person named Saddam Hussein.

Entity Disambiguation

One of the challenges when analyzing content and extracting entities is entity disambiguation: can the person named Saddam Hussein be uniquely identified? Usually context is needed in order to disambiguate similar entities. As described in the OpenCalais FAQ, if the “rdf:type rdf:resource” of a given entity contains /er/, the entity has been disambiguated by OpenCalais; if it contains /em/, it has not.
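The FAQ rule is easy to apply mechanically. A small sketch, assuming the rule is checked against whole path segments of the rdf:type URI; the function name is hypothetical, and the two type URIs used below are the ones discussed in this post.

```python
from urllib.parse import urlparse


def is_disambiguated(type_uri):
    """Apply the OpenCalais FAQ rule to an entity's rdf:type URI.

    A path segment 'er' means the entity was disambiguated (resolved);
    'em' means it was detected but not resolved.
    """
    segments = urlparse(type_uri).path.split("/")
    if "er" in segments:
        return True
    if "em" in segments:
        return False
    raise ValueError(f"no /er/ or /em/ segment in {type_uri}")


print(is_disambiguated("http://s.opencalais.com/1/type/em/e/Person"))     # False
print(is_disambiguated("http://s.opencalais.com/1/type/er/Geo/Country"))  # True
```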

In the example above, cle.Person is <http://s.opencalais.com/1/type/em/e/Person>. There is no obvious link to an “rdf:type rdf:resource” containing /er/. It looks like OpenCalais has been able to determine that the text “Saddam Hussein” refers to a person, but not specifically who that person is.

In contrast, Iraq (one of three countries detected) is shown below with the Incoming Reference http://d.opencalais.com/er/geo/country/ralg-geo1/d3b1cee2-327c-fa35-7dab-f0289958c024.

Opening the URI http://d.opencalais.com/er/geo/country/ralg-geo1/d3b1cee2-327c-fa35-7dab-f0289958c024 with either an HTML browser, as http://d.opencalais.com/er/geo/country/ralg-geo1/d3b1cee2-327c-fa35-7dab-f0289958c024.html, or an RDF browser, as http://d.opencalais.com/er/geo/country/ralg-geo1/d3b1cee2-327c-fa35-7dab-f0289958c024.rdf (in Tabulator below), shows that the country has been disambiguated with <rdf:type rdf:resource="http://s.opencalais.com/1/type/er/Geo/Country"/>.

Linked Data

In the RDF response returned by OpenCalais, neither Iraq nor “Saddam Hussein” was linked to other Linked Data datasets, although some OpenCalais entities are. For example, Moscow, Russia is linked via owl:sameAs to

Since I know the context of the article is international news, I can safely add some owl:sameAs links, such as the following DBpedia links for “Saddam Hussein” (below) and Iraq.
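Such links are just RDF statements, so they can be generated as N-Triples and loaded alongside the OpenCalais response. A minimal sketch: the Iraq URI is the OpenCalais URI discussed earlier in this post and the DBpedia resource URIs follow DBpedia's naming scheme, but the Saddam Hussein OpenCalais URI is a hypothetical placeholder to be replaced with the one from your own response.

```python
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"


def same_as_ntriple(local_uri, external_uri):
    """One owl:sameAs statement as an N-Triples line."""
    return f"<{local_uri}> <{OWL_SAME_AS}> <{external_uri}> ."


links = {
    # OpenCalais URI for Iraq, from the response discussed in this post.
    "http://d.opencalais.com/er/geo/country/ralg-geo1/d3b1cee2-327c-fa35-7dab-f0289958c024":
        "http://dbpedia.org/resource/Iraq",
    # Hypothetical placeholder: substitute the Saddam Hussein URI from your own response.
    "http://d.opencalais.com/EXAMPLE-PERSON-URI":
        "http://dbpedia.org/resource/Saddam_Hussein",
}

for local, external in links.items():
    print(same_as_ntriple(local, external))
```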

Entity Relevance

For both detected entities, “Saddam Hussein” and “Iraq”, OpenCalais provides an entity relevance score (shown for each in the screen shots below). The relevance capability detects the importance of each unique entity and assigns a relevance score in the range 0-1 (1 being the most relevant and important). From the screen shots it’s clear that “Iraq” has been ranked as more relevant.
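Once the relevance values have been read out of the response, ranking entities is a simple sort. A minimal sketch, assuming the scores have already been extracted into a dictionary; the function name is hypothetical, and the numbers are placeholders that only preserve the ordering described above, not the values OpenCalais actually returned.

```python
def rank_by_relevance(scores):
    """Return entity names ordered from most to least relevant.

    scores maps entity name -> relevance score, expected in the range 0-1.
    """
    for name, score in scores.items():
        if not 0.0 <= score <= 1.0:
            raise ValueError(f"relevance for {name!r} outside 0-1: {score}")
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical placeholder scores, for illustration only.
example_scores = {"Saddam Hussein": 0.25, "Iraq": 0.75}
print(rank_by_relevance(example_scores))
```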

Detection Information

The RDF response includes the following properties relating to each entity’s detection:

  • c:docId: URI of the document in which this mention was detected.
  • c:subject: URI of the unique entity.
  • c:detection: snippet of the input content where the metadata element was identified.
  • c:prefix: snippet of the input content that precedes the current instance.
  • c:exact: snippet of the input content in the matched portion of text.
  • c:suffix: snippet of the input content that follows the current instance.
  • c:offset: character offset of the instance, relative to the input content after it has been converted into XML.
  • c:length: length of the instance.
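The relationship between c:offset, c:length and the three snippets can be sketched with plain string slicing. This assumes the text passed in is the input content after conversion to XML (which is what c:offset is relative to); the fixed window size is an assumption, since OpenCalais chooses its own snippet lengths, and the sample sentence is hypothetical rather than taken from the Scoop article.

```python
def detection_fields(text, offset, length, window=10):
    """Reconstruct c:prefix, c:exact and c:suffix from c:offset and c:length.

    text must be the converted (XML) input content; window is an assumed
    snippet size, not the one OpenCalais uses.
    """
    end = offset + length
    return {
        "prefix": text[max(0, offset - window):offset],
        "exact": text[offset:end],
        "suffix": text[end:end + window],
    }


# Hypothetical sentence, not from the Scoop article.
sample = "Officials said Saddam Hussein had refused."
offset = sample.index("Saddam Hussein")
print(detection_fields(sample, offset, len("Saddam Hussein")))
```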

The screen shot below for Saddam Hussein provides an example of how these properties work.

Conclusions

OpenCalais is a very impressive tool. It takes a while, though, to fully understand the RDF response, especially in the areas of entity disambiguation and the linking of OpenCalais entities to other Linked Data datasets. There are most likely some subtleties that I have missed or misunderstood, so all clarifications are welcome.

For entities extracted from international news sources that are not linked to other Linked Data datasets, it would be interesting to try some equivalence mining.

Australia’s Government 2.0 Taskforce commissions Semantic Web Project

Saturday, September 5th, 2009

The Australian Government initiated the Government 2.0 Taskforce in June 2009.

The launch video features Lindsay Tanner, Minister for Finance and Deregulation, and the taskforce chair, Dr Nicholas Gruen, in an enthusiastic presentation outlining two key themes the government is keen for the taskforce to pursue.

These are:

  • Transparency and Openness. Using technology “to maximise the extent to which government information, data, and material can be put out into the public domain that we can be as accountable as possible, as transparent as possible and that this data is available for use in the general community.”
  • Community Engagement. Improving “the ways in which we engage with people in the wider community; in consultation, in discussion, in dialogue, about regulation, about government decisions, about policy generally.”

Examples of early government innovation include:

On 1 September 2009 the taskforce announced that it was Open for business, commissioning six projects and inviting interested parties (individuals or companies) to submit quotes by 9 September 2009.

Early leadership in Semantic Web

Of particular interest is the Early leadership in Semantic Web project. The project deliverable is to be a report which includes:

  • a guide for use by Australian Government agencies that will assist them with proper semantic tagging of datasets;
  • identified Australian Government datasets that could benefit from proper semantic tagging;
  • and a case study on the process of applying proper semantic tagging to an identified agency dataset, and any issues arising from it.

This project, together with moves by government departments such as the Australian Bureau of Statistics to release data under a Creative Commons license, is an encouraging sign that an open web of linked data is evolving.

PricewaterhouseCoopers forecasts the Semantic Web

Sunday, June 7th, 2009

The freely available PricewaterhouseCoopers Spring 2009 Technology Forecast explains the value of the Semantic Web and Linked Data in the context of Enterprise applications, presenting interviews with leaders in the field and outlining how CIOs and individual departments can introduce Semantic Web technologies into their organizations.

Forecasts include:

  • “During the next three to five years, we forecast a transformation of the enterprise data management function driven by explicit engagement with data semantics” and
  • “PricewaterhouseCoopers believes a Web of data will develop that fully augments the document Web of today”.

W3C standards providing the foundation for this Web of data include URIs, RDF, RDF Schema (RDFS), the Web Ontology Language (OWL) and the SPARQL Protocol and RDF Query Language (SPARQL).

“URIs are more specific in a Semantic Web context than URLs, often including a hash that points to a thing such as an individual musician, a song of hers, or the label she records for within a page, rather than just the page itself.”
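The split between "the page" and "the thing within the page" is visible in any hash URI. As a small illustration, the Web ID used in the FOAF+SSL post in this archive can be taken apart with Python's urllib.parse.urldefrag: the part before the hash is the document that is fetched over HTTP, while the full URI names the person described in it.

```python
from urllib.parse import urldefrag

# A hash URI naming a person (the Web ID from the FOAF+SSL post in this archive).
web_id = "http://www.3kbo.com/people/richard.hancock/foaf.rdf#i"

# The part before '#' identifies the document actually retrieved over HTTP;
# the full URI, fragment included, identifies the thing described in it.
document_url, fragment = urldefrag(web_id)
print(document_url)  # http://www.3kbo.com/people/richard.hancock/foaf.rdf
print(fragment)      # i
```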

“RDF takes the data elements identified by URIs and makes statements about the relationship of one element to another.”

Each statement is a triple, a subject-predicate-object combination.
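To make the subject-predicate-object idea concrete, a graph can be modeled as a set of triples and queried by matching on positions. This is a toy sketch, not an RDF library: all URIs below are hypothetical example.org names in the spirit of the musician/song/label example, and plain strings stand in for both URIs and literals.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str
    predicate: str
    object: str


# Hypothetical URIs in the spirit of the musician / song / label example.
EX = "http://example.org/music#"

graph = {
    Triple(EX + "singer", EX + "recorded", EX + "song1"),
    Triple(EX + "singer", EX + "recordsFor", EX + "label1"),
    Triple(EX + "song1", EX + "title", "An Example Song"),
}


def objects(graph, subject, predicate):
    """All objects o such that (subject, predicate, o) is in the graph, sorted."""
    return sorted(t.object for t in graph if t.subject == subject and t.predicate == predicate)


print(objects(graph, EX + "singer", EX + "recorded"))
```

Because every statement has the same three-part shape, merging data from two sources is just a set union of their triples, which is the uniformity the next quote refers to.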

Ontologies (based on RDFS and OWL) describe the characteristics of these RDF data elements and their relationships within specific domains, facilitating machine interpretability of the data content.

“In this universe of nouns and verbs, the verbs articulate the connections, or relationships, between nouns. Each noun then connects as a node in a networked structure, one that scales easily because of the simplicity and uniformity of its Web-like connections.”

The Web of data approach clearly benefits a company such as the British Broadcasting Corporation (BBC), which “links to URIs at DBpedia.org, a version of the structured information on Wikipedia, to enrich sites such as its music site (http://www.bbc.co.uk/music/)”. It also links to MusicBrainz for information about artists and recordings.

As described by Tom Scott of BBC Earth:

“The relationship between the BBC content, the DBpedia content, and MusicBrainz is no more than URIs. We just have links between these things, and we have an ontology that describes how this stuff maps together.”

Other reviews of the PricewaterhouseCoopers Spring 2009 Technology Forecast include:

Tom Scott has a presentation on Linking bbc.co.uk to the Linked Data cloud, and the article DBpedia Examples using Linked Data and Sparql provides a simple example of using SPARQL to query DBpedia.
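For readers who want to try querying DBpedia from code, the SPARQL Protocol sends the query as the query parameter of an HTTP GET. A minimal sketch using only the standard library; it assumes the public endpoint at https://dbpedia.org/sparql is available and returns SPARQL JSON results when asked via the Accept header, and the query itself (the English abstract for Iraq via dbo:abstract) is just an illustration.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Public DBpedia endpoint; its availability is an assumption.
DBPEDIA_ENDPOINT = "https://dbpedia.org/sparql"

QUERY = """PREFIX dbo: <http://dbpedia.org/ontology/>
SELECT ?abstract WHERE {
  <http://dbpedia.org/resource/Iraq> dbo:abstract ?abstract .
  FILTER (lang(?abstract) = "en")
}"""


def sparql_get_url(endpoint, query):
    """Build a SPARQL Protocol GET URL: the query goes in the 'query' parameter."""
    return endpoint + "?" + urlencode({"query": query})


def run_query(endpoint, query, timeout=30):
    """Send the query and decode a SPARQL JSON results document (performs network I/O)."""
    request = Request(sparql_get_url(endpoint, query),
                      headers={"Accept": "application/sparql-results+json"})
    with urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

Calling run_query(DBPEDIA_ENDPOINT, QUERY)["results"]["bindings"] would return one binding per matching abstract, network permitting.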

Logging in with FOAF+SSL

Friday, April 17th, 2009

“FOAF+SSL is an authentication and authorization protocol that links a Web ID to a public key, thereby enabling a global, decentralized/distributed, and open yet secure social network.”

In my case, my Web ID is http://www.3kbo.com/people/richard.hancock/foaf.rdf#i, a URI defined in my FOAF file.

A site that uses FOAF+SSL is Shout Box. Once a user has logged in to Shout Box and left a comment, Shout Box displays the user’s Web ID alongside their comment.


A user logging in to Shout Box identifies themselves with a certificate stored in their browser. If a user has more than one certificate installed, they can choose from the list of certificates presented by the browser’s certificate manager (shown below for Firefox).

Selecting a certificate for a FOAF+SSL login is simpler and quicker than typing a user name and password.

The two obvious things a user needs in order to log in with FOAF+SSL are:

FOAF+SSL also requires:

  • A reference to the Web ID from the certificate. This is provided by setting the Web ID as the value for “X509v3 Subject Alternative Name”.
  • The public key of the certificate published in the Web ID (FOAF file).

If the key published in the Web ID matches the one contained in the certificate, the server can conclude that the person logging in is the owner of the Web ID (FOAF file), since the TLS handshake has already shown that the browser holds the matching private key.
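The final comparison step can be sketched as follows. This assumes the modulus and exponent have already been read from the client certificate presented during the TLS handshake (that extraction is not shown), and that the FOAF file publishes them as cert:hex and cert:decimal literals, as in the RDF/XML at the end of this post. The function names are hypothetical, and the toy key values are far shorter than a real RSA modulus.

```python
def normalize_hex(hex_string):
    """Drop whitespace and ':' separators that sometimes appear in published hex keys."""
    return "".join(ch for ch in hex_string if ch not in " :\t\r\n")


def keys_match(cert_modulus, cert_exponent, foaf_modulus_hex, foaf_exponent_decimal):
    """True if the RSA key from the client certificate equals the key in the FOAF file.

    cert_modulus, cert_exponent: integers already read from the X.509 certificate.
    foaf_modulus_hex, foaf_exponent_decimal: the cert:hex and cert:decimal
    literals published in the Web ID document.
    """
    return (cert_modulus == int(normalize_hex(foaf_modulus_hex), 16)
            and cert_exponent == int(foaf_exponent_decimal))


# Toy values for illustration only; a real modulus is 1024 bits or more.
print(keys_match(0xC0FFEE, 65537, "00:c0:ff:ee", "65537"))  # True
print(keys_match(0xC0FFEF, 65537, "00:c0:ff:ee", "65537"))  # False
```

Comparing integers rather than strings avoids false mismatches caused by leading zeros, letter case or separator characters in the published hex.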

I can check the details of the certificate I have been using, and see the reference to my Web ID, by first opening the Firefox Certificate Manager (by pasting chrome://pippki/content/certManager.xul into the browser location bar). The Certificate Manager lists all the installed certificates.

To see more information about this certificate I select it, then click “View …” to get a dialog box with two tabs, “General” and “Details”. Selecting the “Details” tab and “Certificate Subject Alt Name” shows that my Web ID, http://www.3kbo.com/people/richard.hancock/foaf.rdf#i, is the value set for the “X509v3 Subject Alternative Name”.

An easy way to create an X509 certificate with a reference to a Web ID is to follow the steps outlined in Henry Story’s article creating a foaf+ssl cert in a few clicks. I used this process to create the other two certificates shown above.

I created my main X509 certificate by following the steps outlined by Henry in his earlier article foaf+ssl: a first implementation. This gives a good programmatic understanding of what’s happening.

(The code is under active development, so if you try it and have problems, check out revision 468 to get the code that matches the article, i.e. svn checkout https://sommer.dev.java.net/svn/sommer/trunk sommer-r468 --username guest -r 468)

Using this approach the main tasks for setting up a user with FOAF+SSL are:

  • Running GenerateKey to create an X509 certificate, setting an existing FOAF file as the Web ID.
  • Adding the RDF statements defining the public key to the FOAF file.
  • Adding the X509 certificate to the user’s browser.

GenerateKey generates the RDF statements defining the public key in N3 format. If your FOAF file is in RDF/XML format, like mine, you need to convert the N3 to RDF/XML.

Adding the following worked for me:

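<!-- Assumes the rsa:, cert: and rdf: prefixes are declared on the enclosing rdf:RDF element; "#i" is the Web ID fragment. -->
<rsa:RSAPublicKey>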
<cert:identity rdf:resource="#i"/>
<rsa:public_exponent cert:decimal="65537"/>
<rsa:modulus cert:hex="d258d85da71a4f1199cae5e8e18a5ffa9127d9796526299b746de9fdcbc1364e074dc143d0ebbd3d3890d7e95b8b4931e3798a7a8f8dbd3441927b6601fb504ca2a919a803e31a6112fea227102dc1424946fb92f8f651f3da855ec43e496f8e0098b596f33af80e7b86d831d46948e040a656f3f00a67b724ccfb55fa4660d3" />
</rsa:RSAPublicKey>
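Instead of converting GenerateKey's N3 output by hand, the same element layout could be produced directly from the key's modulus and exponent. A minimal sketch, assuming both values are available as integers; the function name is hypothetical, the output mirrors the snippet above rather than any official serializer, and the rsa:, cert: and rdf: prefixes still have to be declared in the enclosing FOAF document.

```python
from xml.sax.saxutils import quoteattr


def public_key_rdfxml(modulus, exponent, identity="#i"):
    """Render an RSA public key in the same RDF/XML layout as the snippet above."""
    return "\n".join([
        "<rsa:RSAPublicKey>",
        f"<cert:identity rdf:resource={quoteattr(identity)}/>",
        f'<rsa:public_exponent cert:decimal="{exponent}"/>',
        f'<rsa:modulus cert:hex="{modulus:x}" />',  # lowercase hex, no leading zeros
        "</rsa:RSAPublicKey>",
    ])


# Toy modulus for illustration; a real key is far longer.
print(public_key_rdfxml(0xD258D85D, 65537))
```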