Archive for the ‘Linked Data’ Category

Developing a Semantic Web Strategy

Tuesday, August 10th, 2010

In the last chapter of his book “Pull: The Power of the Semantic Web to Transform Your Business” David Siegel outlines some steps for developing a successful Semantic Web strategy for your business or organization.

One approach that worked for me recently was to organize a meeting titled “Developing a Semantic Web Strategy” and invite developers, architects, analysts and managers. This was in the context of a government organization, and the managers were from the applications development area.

Sharing out books like Semantic Web for the Working Ontologist, Semantic Web For Dummies, Programming the Semantic Web and Semantic Web Programming prior to the meeting helped people get familiar with concepts like URIs as names for things, RDF, RDFS, OWL, SPARQL and RDFa.

To highlight how rapidly the Web of Data is evolving and the amount of information now being published as Linked Open Data, I stepped through Mark Greaves’s excellent presentation The Maturing Semantic Web: Lessons in Web-Scale Knowledge Representation.

During the meeting I took a business strategy first, technology second approach, taking the time to explore how an approach that has worked for someone else might fit with our organization.

Areas explored included:

Enterprise Modeling

I spent some time comparing RDF / OWL modeling with UML modeling, highlighting how URIs enable modeling across distributed information sources without the need to consolidate everything in a central repository, as you do with UML tools.

Also touched on OWL features such as:

Because it is a government department I highlighted the Federal Enterprise Architecture Reference Model Ontology (FEA-RMO) and how such an ontology could be used to map a parliamentary initiative to the software providing its implementation.

Open Government

Given the current trend for governments to make datasets freely available I presented the Linked Data approaches taken by http://data.gov and http://data.gov.uk as examples to follow in this area.

The business case for Linked Data in this scenario is that Linked Data is seen as the best available approach for publishing data in hugely diverse and distributed environments, in a gradual and sustainable way (see Why Linked Data for data.gov.uk? for details).

RDFa Based Integration

One example that struck a chord was RDFa and Linked Data in UK Government Websites, where job vacancy details from different sites can easily be combined because each web site publishes its web pages as HTML with RDFa added to annotate the job vacancy. Using RDFa allows the same page to be read as either HTML or RDF. The end result is that integration can be achieved with minimal changes to the original sites.
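The integration idea can be sketched in a few lines of Python. This is only an illustration, not the approach used by the UK sites: it assumes the RDFa has already been extracted into (subject, predicate, object) triples, and the site URLs, vocabulary URIs and job titles are all hypothetical. Because each vacancy is named by its own URI, combining sites is just a set union.

```python
# Hypothetical vocabulary and data, standing in for triples extracted from RDFa.
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
VACANCY = "http://example.org/vocab#Vacancy"  # assumed class URI
TITLE = "http://example.org/vocab#title"      # assumed property URI

site_a = {
    ("http://site-a.example/jobs/1", RDF_TYPE, VACANCY),
    ("http://site-a.example/jobs/1", TITLE, "Policy Analyst"),
}
site_b = {
    ("http://site-b.example/vacancies/7", RDF_TYPE, VACANCY),
    ("http://site-b.example/vacancies/7", TITLE, "Data Architect"),
}


def merge_graphs(*graphs):
    """Combine triple sets from several sites into one graph (set union)."""
    merged = set()
    for graph in graphs:
        merged |= graph
    return merged


def vacancy_titles(graph):
    """Return the sorted titles of every resource typed as a vacancy."""
    vacancies = {s for s, p, o in graph if p == RDF_TYPE and o == VACANCY}
    return sorted(o for s, p, o in graph if p == TITLE and s in vacancies)


merged = merge_graphs(site_a, site_b)
print(vacancy_titles(merged))
```

Neither site needs to know about the other; the shared vocabulary is what makes the merged graph queryable.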

Search Engine Optimisation (SEO)

For anyone advertising products and services online, the business strategy to follow is the example set by BestBuy.com, which describes its stores and products using the GoodRelations ontology and embeds these descriptions into its web pages using RDFa, increasing search engine traffic by 30%.

Enterprise Web of Data

Within our software development process, from project inception to production release and subsequent maintenance release, information is being copied and duplicated in a number of different places. Silos abound, in the form of Word documents, spreadsheets and the sticky notes that are part of the “Agile” process. There is some good information on our wiki pages but it is unstructured and not machine readable.

The information that forms our internal processes fails David Siegel’s Semantic Web Acid Test:

  • It’s not semantic and
  • It’s not on the web.

Introducing a Semantic Wiki such as Semantic MediaWiki, to hold project information and link this information to other data sources, was raised as a candidate for a Semantic Web proof of concept.

Outcomes

Just scheduling the meeting was in itself a successful outcome since it started discussion around the role Semantic Web technologies could play in our organization. For a number of people, including the Applications Development manager, this is new technology and they need time to absorb it but the end result was agreement that it was technology that couldn’t be ignored.

In order to gain some practical experience, two internal prototypes were agreed to, both with practical value for the organization.

The first is a small application that will show the full set of runtime dependencies for a given software component as well as the other components affected when the specified component is changed. The application will be based on a simple ontology that defines dependencies between components using the owl:TransitiveProperty and uses a reasoner (e.g. Pellet) to infer the full set of dependencies for a component.
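As a rough sketch of what the reasoner contributes here, the Python below computes the same kind of closure by hand. It is not Pellet and uses no OWL library; it only mimics the effect of declaring a "depends on" property as an owl:TransitiveProperty. The component names are hypothetical.

```python
from collections import defaultdict


def transitive_closure(direct_edges):
    """Map each component to every component it depends on, directly or indirectly."""
    graph = defaultdict(set)
    for source, target in direct_edges:
        graph[source].add(target)

    closure = {}
    for start in list(graph):
        seen = set()
        stack = list(graph[start])
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            # .get avoids creating empty entries for leaf components
            stack.extend(graph.get(node, ()))
        closure[start] = seen
    return closure


def affected_by(closure, component):
    """Components that depend, directly or indirectly, on `component`."""
    return {c for c, deps in closure.items() if component in deps}


# Hypothetical direct dependencies, for illustration only.
direct = [
    ("web-ui", "order-service"),
    ("order-service", "db-driver"),
    ("report-job", "db-driver"),
]
closure = transitive_closure(direct)
print(sorted(closure["web-ui"]))
print(sorted(affected_by(closure, "db-driver")))
```

The first query answers "what does this component need at runtime?" and the second answers "what is affected if this component changes?", which are the two views the prototype is meant to provide.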

The second prototype will trial Semantic MediaWiki for project management (potentially using the Teamwork Ontology). The longer term view is to customize Semantic MediaWiki to include artifacts created as part of the software development process, addressing some of the silo problems found in our current internal enterprise web of data.

Once practical knowledge has been gained from the internal prototypes, a meeting will be scheduled with the Enterprise Architecture team to canvass the establishment of a wider vision for the use of Linked Data and Semantic Web technologies, potentially leading to their use on the public web sites, actively publishing to the Web of Data.

Understanding the OpenCalais RDF Response

Saturday, September 26th, 2009

I’m using an XML version of an article published by Scoop in February 2000, Senior UN Officials Protest UN Sanctions On Iraq, to understand the OpenCalais RDF response as part of a larger project of linking extracted entities to existing Linked Data datasets.

OpenCalais uses natural language processing (NLP), machine learning and other methods to analyze content and return the entities it finds, such as the cities, countries and people with dereferenceable Linked Data style URIs. The entity types are defined in the OpenCalais RDF Schemas.

When I submit the content to the OpenCalais REST web service (using the default RDF response format) an RDF document is returned. Opened in TopBraid Composer (below), a portion of the input content and some of the entity types OpenCalais can detect are shown. The numbers in brackets indicate how many instances of an entity type have been detected; for example, cle.Person(13) indicates that thirteen people have been detected.

The TopBraid Composer Instances tab contains the URIs of the people detected. Opening the highlighted URI reveals that it is for a person named Saddam Hussein.

Entity Disambiguation

One of the challenges when analyzing content and extracting entities is entity disambiguation: can the person named Saddam Hussein be uniquely identified? Usually the context is needed in order to disambiguate similar entities. As described in the OpenCalais FAQ, if the “rdf:type rdf:resource” of a given entity contains /er/ the entity has been disambiguated by OpenCalais, while if it contains /em/ it has not.

In the example above cle.Person is <http://s.opencalais.com/1/type/em/e/Person>. There is no obvious link to an “rdf:type rdf:resource” containing /er/. It looks like OpenCalais has been able to determine that the text “Saddam Hussein” equates to a Person, but has not been able to determine specifically who that person is.

In contrast, Iraq (one of three countries detected) is shown below with the Incoming Reference http://d.opencalais.com/er/geo/country/ralg-geo1/d3b1cee2-327c-fa35-7dab-f0289958c024.

Opening the URI http://d.opencalais.com/er/geo/country/ralg-geo1/d3b1cee2-327c-fa35-7dab-f0289958c024 with either an HTML browser as http://d.opencalais.com/er/geo/country/ralg-geo1/d3b1cee2-327c-fa35-7dab-f0289958c024.html or with an RDF browser as http://d.opencalais.com/er/geo/country/ralg-geo1/d3b1cee2-327c-fa35-7dab-f0289958c024.rdf (in Tabulator below) shows that the country has been disambiguated with <rdf:type rdf:resource="http://s.opencalais.com/1/type/er/Geo/Country"/>.
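The /er/ versus /em/ rule from the FAQ is easy to apply mechanically. The sketch below is my own helper, not part of any OpenCalais client library; it simply looks for the path segment in the rdf:type URI and uses the two type URIs quoted in this post.

```python
def disambiguation_status(type_uri):
    """Classify an OpenCalais rdf:type URI by its /er/ or /em/ path segment."""
    if "/er/" in type_uri:
        return "disambiguated"
    if "/em/" in type_uri:
        return "not disambiguated"
    return "unknown"


person_type = "http://s.opencalais.com/1/type/em/e/Person"
country_type = "http://s.opencalais.com/1/type/er/Geo/Country"

print(disambiguation_status(person_type))
print(disambiguation_status(country_type))
```

The "unknown" branch is an assumption on my part, covering any type URI that contains neither segment.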

Linked Data

In the RDF response returned by OpenCalais neither Iraq nor “Saddam Hussein” was linked to other Linked Data datasets. Some OpenCalais entities are. For example Moscow, Russia is linked via owl:sameAs to

Since I know that the context of the article is international news I can safely add some owl:sameAs links to DBpedia, such as the following for “Saddam Hussein” (below) and Iraq.
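For Iraq, such a link could be written out as a single N-Triples statement. The sketch below uses the OpenCalais URI quoted earlier in this post; the DBpedia URI follows DBpedia's usual resource naming and is my assumption about the right target, and the helper function is hypothetical.

```python
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"


def same_as_triple(subject_uri, object_uri):
    """Format one owl:sameAs statement as an N-Triples line."""
    return f"<{subject_uri}> <{OWL_SAME_AS}> <{object_uri}> ."


calais_iraq = (
    "http://d.opencalais.com/er/geo/country/ralg-geo1/"
    "d3b1cee2-327c-fa35-7dab-f0289958c024"
)
dbpedia_iraq = "http://dbpedia.org/resource/Iraq"  # assumed DBpedia target

print(same_as_triple(calais_iraq, dbpedia_iraq))
```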

Entity Relevance

For both detected entities, “Saddam Hussein” and “Iraq”, OpenCalais provides an entity relevance score (shown for each respectively in the screen shots below). The relevance capability detects the importance of each unique entity and assigns a relevance score in the range 0-1 (1 being the most relevant and important). From the screen shots it’s clear that “Iraq” has been ranked more relevant.

Detection Information

The RDF response includes the following properties relating to the subject’s detection:

  • c:docId: URI of the document this mention was detected in.
  • c:subject: URI of the unique entity.
  • c:detection: snippet of the input content where the metadata element was identified
  • c:prefix: snippet of the input content that precedes the current instance
  • c:exact: snippet of the input content in the matched portion of text
  • c:suffix: snippet of the input content that follows the current instance
  • c:offset: the character offset relative to the input content after it has been converted into XML
  • c:length: length of the instance.

The screen shot below for Saddam Hussein provides an example of how these properties work.
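To make the relationship between these properties concrete, here is a small sketch that derives prefix, exact, suffix, offset and length for a mention in a piece of text. It is only my reading of the property descriptions above, not OpenCalais code; the sample sentence and the 20-character context window are assumptions.

```python
def describe_mention(content, exact, window=20):
    """Build detection-style fields for the first occurrence of `exact`."""
    offset = content.find(exact)
    if offset < 0:
        raise ValueError(f"{exact!r} not found in content")
    length = len(exact)
    end = offset + length
    return {
        "prefix": content[max(0, offset - window):offset],  # text before the match
        "exact": exact,                                      # the matched text
        "suffix": content[end:end + window],                 # text after the match
        "offset": offset,                                    # character position
        "length": length,                                    # size of the match
    }


# Hypothetical input sentence, for illustration only.
content = "Senior officials protested the sanctions imposed on Iraq last year."
mention = describe_mention(content, "Iraq")
for name, value in mention.items():
    print(f"{name}: {value!r}")
```

Slicing the content from offset to offset + length always gives back the exact text, which is a handy sanity check when reading a real response.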

Conclusions

OpenCalais is a very impressive tool. It takes a while, though, to fully understand the RDF response, especially in the areas of entity disambiguation and the linking of OpenCalais entities to other Linked Data datasets. Most likely there are some subtleties that I have missed or misunderstood, so all clarifications are welcome.

For entities extracted from international news sources and not linked to other Linked Data datasets it would be interesting to try some equivalence mining.

Australia’s Government 2.0 Taskforce commissions Semantic Web Project

Saturday, September 5th, 2009

The Australian Government initiated the Government 2.0 Taskforce in June 2009.

The launch video features Lindsay Tanner, Minister for Finance and Deregulation and chair Dr Nicholas Gruen in an enthusiastic presentation, outlining two key themes the government is keen for the taskforce to pursue.

These are:

  • Transparency and Openness. Using technology “to maximise the extent to which government information, data, and material can be put out into the public domain that we can be as accountable as possible, as transparent as possible and that this data is available for use in the general community.”
  • Community Engagement. Improving “the ways in which we engage with people in the wider community; in consultation, in discussion, in dialogue, about regulation, about government decisions, about policy generally.”

Examples of early government innovation include:

On 1 September 2009 the taskforce announced that it was Open for business, commissioning six projects and inviting interested parties (individuals or companies) to submit quotes to be received by 9 September 2009.

Early leadership in Semantic Web

Of particular interest is the Early leadership in Semantic Web project. The project deliverable is to be a report which includes:

  • a guide for use by Australian Government agencies that will assist them with proper semantic tagging of datasets;
  • identified Australian Government datasets that could benefit from proper semantic tagging;
  • and a case study on the process and any issues arising from applying proper semantic tagging to an identified agency dataset.

Both this and the fact that government departments such as the Australian Bureau of Statistics are moving to release data under a Creative Commons license are further encouraging signs that an open web of linked data is in the process of evolving.

A GoodRelations Semantic Web Description of a Business

Saturday, April 11th, 2009

Tried out the newly released GoodRelations Annotator to create a Semantic Web description of a business.

The GoodRelations Annotator is an online form-based tool that creates an RDF/XML file “semanticweb.rdf” containing a description of the key aspects of the business. The description is based on concepts defined in the GoodRelations OWL ontology. In particular the description contains a BusinessEntity representing the business and one or more Offerings. Each Offering describes the intent to provide a Business Function for a certain Product or Service to a specified target audience.

The generated RDF/XML file can either be published directly on the company’s Web site or used as a skeleton for developing a more fine-grained description.

The link Publishing GoodRelations Data on the Web provides guidelines on publishing to the web.

In my case I created a description for my embryonic business 3kbo.

I’m interested in linking the generated semanticweb.rdf to other things, in particular linking the BusinessEntity with people and with other BusinessEntitys.

Initially I added the URI of my foaf file to the BusinessEntity instance using rdfs:seeAlso, but after reading the definition of BusinessEntity, i.e. that it represents the legal agent making a particular offering and can be a legal body or a person, I changed it to owl:sameAs.

E.g.

<gr:BusinessEntity rdf:ID="BusinessEntity">
  <owl:sameAs rdf:resource="http://www.3kbo.com/people/richard.hancock/foaf.rdf#i"/>
</gr:BusinessEntity>

This makes sense for my simple case, since as a sole trader I am the BusinessEntity. When viewed in Firefox using the Tabulator Extension owl:sameAs also provides an inferred link from my foaf file to my semanticweb.rdf as shown below.

[Image: foaf-infers-goodrelations]

A part of the business description I don’t understand yet is how best to use the eClassOWL ontology to describe the Product or Service.

For example using the GoodRelations Annotator I selected “19 information, communication and media technology” as the Category and “1904 Software” as the Group.

[Image: eClassProductCategory]

This leads to http://www.ebusiness-unibw.org/ontologies/eclass/5.1.4/#C_AKJ317003-tax being used in the definition of the product or service, i.e.

<gr:typeOfGood>
  <gr:ProductOrServicesSomeInstancesPlaceholder rdf:ID="ProductOrServicesSomeInstancesPlaceholder_1">
    <rdf:type rdf:resource="&eco;#C_AKJ317003-tax"/>
  </gr:ProductOrServicesSomeInstancesPlaceholder>
</gr:typeOfGood>

Because of the size of the eClassOWL ontology it takes a while to dereference this link. It would be good to be able to provide a more user-friendly reference at this point, one that provides a description of the product or service.

Beyond this simple example I am interested in semantic web descriptions of other more complex relationships between a BusinessEntity (when not a person) and the people involved with the business (e.g. directors, CEO etc …) and between other BusinessEntitys.

Potentially GoodRelations and eClassOWL could be used as part of an Enterprise Architecture describing the who, what, how, when, where and why of a business.

Why Migrate to the Semantic Web?

Saturday, November 8th, 2008

Why Migrate to the Semantic Web? has just been published at Devx.com.

It pretty much summarizes my reasons for migrating the CDMS application to the Semantic Web.

What it doesn’t describe in detail is that for building compliance at a specific locality it is the local legislation that takes precedence. This means that Linked Data from sources such as DBpedia is great for describing concepts, but at a local level you need to refer to Linked Data derived from local legislation to explicitly clarify the criteria that form the basis of compliance.