Posts Tagged ‘RDF’

Linked Data and the SOA Software Development Process

Thursday, November 17th, 2011

We have quite a rigorous SOA software development process; however, the full value of the collected information is not being realized because the artifacts are stored in disconnected information silos. So far, attempts to introduce tools that could improve the situation (e.g. zAgile Teamwork and Semantic MediaWiki) have been unsuccessful, possibly because the value of a Linked Data approach is not yet fully appreciated.

To provide an example Linked Data view of the SOA services and their associated artifacts, I created a prototype consisting of Sesame running on a Tomcat server, with Pubby providing the Linked Data view via the Sesame SPARQL endpoint. TopBraid was connected directly to the Sesame native store (configured via the Sesame Workbench) to create a subset of services sufficient to demonstrate the value of publishing information as Linked Data. In particular, the prototype showed how easy it became to navigate from the requirements for a SOA service through to the details of its implementation.

The prototype also highlighted that auto-generation of the RDF graph (the data providing the Linked Data view) from the actual source artifacts would be preferable to manual entry, especially if this could be transparently integrated with the current software development process. This has become the focus of the next step: automated knowledge extraction from the source artifacts.

Artifacts

Key artifact types of our process include:

  • Requirements documents (Microsoft Word)
  • UML analysis models defining the business entities
  • Service Contracts defining the services and service operations
  • WSDLs and XML Schemas
  • JAX-WS (Java) service implementations
  • Oracle WebLogic deployment and configuration files
  • Audit and log files
  • Defects (JIRA) and source code revisions (Subversion)

A Graph of Concepts and Instances

There is a rich graph of relationships linking the things described in the artifacts listed above. For example, the business entities defined in the UML analysis model are the subject of the services and service operations defined in the Service Contracts. The services and service operations are mapped to the WSDLs, which utilize the XML Schemas that provide an XML view of the business entities. The JAX-WS implementations are linked to the WSDLs and XML Schemas and deployed to the Oracle WebLogic Application Server, where the configuration files list the external dependencies. The log files and defects link back to specific parts of the code base (Subversion revisions) within the context of specific service operations. The people associated with the different artifacts can often be determined from artifact metadata.

RDF, OWL and Linked Data are a natural fit for modelling and viewing this graph since there is a mix of concepts plus a lot of instances, many of which already have an HTTP representation. The graph also contains a number of transitive relationships (for example, a WSDL may import an XML Schema which in turn imports another XML Schema, etc.), promoting the use of owl:TransitiveProperty to help obtain a full picture of all the dependencies a component may have.
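As a rough illustration, the query below walks an import chain to any depth and returns every schema a given WSDL ultimately depends on. It is only a sketch: the ex: namespace, the ex:imports property and the WSDL URI are placeholders rather than our real vocabulary, and it uses a SPARQL 1.1 property path, although declaring the property as an owl:TransitiveProperty and querying through a reasoner gives the same result.

# Sketch: find everything a given WSDL depends on, directly or indirectly,
# by following ex:imports links to any depth.
# ex: and the WSDL URI are hypothetical placeholders.
PREFIX ex: <http://example.org/soa/vocab#>

SELECT DISTINCT ?dependency
WHERE {
  <http://example.org/soa/wsdl/CustomerService.wsdl> ex:imports+ ?dependency .
}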

Knowledge Extraction

Another advantage of the RDF, OWL and Linked Data approach is the use of unique URIs for identifying concepts and instances. This allows information contained in one artifact, e.g. a WSDL, to be extracted as RDF triples which can later be combined with the RDF triples extracted from the JAX-WS annotations of the Java source code. The combined RDF triples tell us more about the WSDL and its Java implementation than could be derived from either artifact alone.
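The sketch below shows the idea: because both extractions use the same URI for the WSDL, a single query can relate the operations declared in the WSDL to the Java classes that implement them. The property names (ex:hasOperation, ex:implementsWsdl) and the URIs are hypothetical placeholders for whatever vocabulary the extraction eventually produces.

# Sketch: join triples extracted from the WSDL with triples extracted from
# the JAX-WS annotations, keyed on the shared WSDL URI.
PREFIX ex: <http://example.org/soa/vocab#>

SELECT ?operation ?javaClass
WHERE {
  # extracted from the WSDL itself
  <http://example.org/soa/wsdl/CustomerService.wsdl> ex:hasOperation ?operation .
  # extracted from the JAX-WS annotations in the Java source
  ?javaClass ex:implementsWsdl <http://example.org/soa/wsdl/CustomerService.wsdl> .
}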

We have made some progress with knowledge extraction but this is still definitely a work in progress. Sites such as ConverterToRdf, RDFizers and the Virtuoso Sponger provide tools and information on generating RDF from different artifact types. Part of the current experimentation is around finding tools that can be transparently layered over the top of the current software development process. Finding the best way to extract the full set of desired RDF triples from Microsoft Word documents is also proving problematic since some natural language processing is required.

Tools currently being evaluated include:

The Benefits of Linked Data

The prototype showed the benefits of Linked Data for navigating from the requirements for a SOA service through to details of its implementation. Looking at all the information that could be extracted leads on to a broader view of the benefits Linked Data would bring to the SOA software development process.

One specific use being planned is the creation of a Service Registry application providing the following functionality:

  • Linking the services to the implementations running in a given environment, e.g. dev, test and production. This includes linking the specific versions of the requirement, design or implementation artifacts and detailing the runtime dependencies of each service implementation.
  • Listing the consumers of each service and providing summary statistics on the performance, e.g. daily usage figures derived from audit logs.
  • Providing a list of who to contact when a service is not available (see the query sketch after this list). This includes notifying consumers of a service outage and also contacting providers if a service is being affected by an external component being offline, e.g. a database or an external web service.
  • Searching for services by different criteria, e.g. business entity.
  • Tracking the evolution of services and being able to assist with refactoring, e.g. answering questions such as “Are there older versions of the XML Schemas that can be deprecated?”
  • Simplifying the running of a specific SoapUI test case for a service operation in a given environment.
  • Providing the equivalent of a class lookup that includes all project classes plus all required infrastructure classes and returns information such as the jar file the class is contained in, along with JIRA and Subversion information.
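To make the registry idea more concrete, the query sketch below addresses the “who to contact” case when a shared database goes offline. Every class and property name in it is a placeholder, since the registry vocabulary has not been defined yet.

# Sketch: which services depend on a given database, who consumes them,
# and how to reach the contact person. All names are hypothetical.
PREFIX ex: <http://example.org/soa/vocab#>

SELECT ?service ?consumer ?contactEmail
WHERE {
  ?service  ex:dependsOn  <http://example.org/infrastructure/CustomerDatabase> ;
            ex:consumedBy ?consumer .
  ?consumer ex:contact    ?contactPerson .
  ?contactPerson ex:email  ?contactEmail .
}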

Drupal7 RDFa XMLLiteral content processing

Saturday, March 12th, 2011

Drupal 7 supports RDFa 1.0 as part of the core product. RDFa 1.0 is the current specification but RDFa 1.1 is to be released shortly.

RDFa 1.0 metadata can be parsed using the RDFa Distiller and Parser while the RDFa Distiller and Parser (Test Version for RDFa 1.1) can be used to extract RDFa 1.1.

Creating a simple Drupal 7 test blog and parsing out the RDFa 1.0 metadata with the RDFa Distiller and Parser shows that Drupal 7 is using the SIOC (Semantically-Interlinked Online Communities) ontology to describe blog posts and identifies the Drupal user as the creator of the post using the sioc:has_creator property.

<sioc:Post rdf:about="http://137breakerbay.3kbo.com/test">
  <rdf:type rdf:resource="http://rdfs.org/sioc/types#BlogPost"/>
  ...
  <sioc:has_creator>
    <sioc:UserAccount rdf:about="http://137breakerbay.3kbo.com/user/2">
      <foaf:name>Richard</foaf:name>
    </sioc:UserAccount>
  </sioc:has_creator>
  ...
  <content:encoded rdf:parseType="Literal"><p xml:lang="en" xmlns="http://www.w3.org/1999/xhtml">Test Blog</p>
  </content:encoded>
</sioc:Post>

sioc:UserAccount is a subclass of foaf:OnlineAccount.

What I would like to do is add additional RDFa metadata within the content of the blog post to associate me, the Drupal 7 user with a sioc:UserAccount, with me, the foaf:Person identified by my FOAF file.

Drupal 7 content is wrapped in an XHTML element carrying the attribute property="content:encoded" (shown below), and an RDFa parser treats this content as an XMLLiteral.

<div property="content:encoded">
...
</div>

The problem is that RDFa 1.0 parsers don’t extract metadata contained within the XMLLiteral.

This was raised a while back in the issue “XMLLiteral content isn’t processed for RDFa attributes in RDFa 1.0 – should this change in RDFa 1.1?”, with the result that RDFa 1.1 parsers should now also process the XMLLiteral content.

To make sure that the RDFa parsers know that I want to use RDFa 1.1 processing I need to update Drupal 7 to use the  XHTML+RDFa Driver Module defined in the XHTML+RDFa 1.1 spec.

This turns out to be a simple update of one Drupal 7 file, site/modules/system/html.tpl.php.

Near the top of the file the version is changed to 1.1 (in two places) and the DTD is changed to “http://www.w3.org/MarkUp/DTD/xhtml-rdfa-2.dtd”.

?><!DOCTYPE html PUBLIC "-//W3C//DTD XHTML+RDFa 1.1//EN"
  "http://www.w3.org/MarkUp/DTD/xhtml-rdfa-2.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="<?php print $language->language; ?>" version="XHTML+RDFa 1.1" dir="<?php print $language->dir; ?>"<?php print $rdf_namespaces; ?>>

With these changes made I can create another blog post containing the following RDFa metadata,

<div about="http://www.3kbo.com/people/richard.hancock/foaf.rdf#i" typeof="foaf:Person">
<div rel="foaf:account" resource="http://137breakerbay.3kbo.com/user/2">
...
</div>
</div>

knowing that an RDFa 1.1 parser will create the RDF triples below, which link the Drupal 7 user to me, the person identified in my FOAF file.

  <foaf:Person rdf:about="http://www.3kbo.com/people/richard.hancock/foaf.rdf#i">
    <foaf:account>
      <sioc:UserAccount rdf:about="http://137breakerbay.3kbo.com/user/2">
        <foaf:name>Richard</foaf:name>
      </sioc:UserAccount>
    </foaf:account>
  </foaf:Person>

The differences between the RDF extracted with an RDFa 1.0 parser and an RDFa 1.1 parser can be seen using the two links below.

Now that I know that the RDFa 1.1 metadata embedded in the content will be processed accordingly, I can move on to the task of building 137 Breaker Bay, a simple accommodation site where the plan is to use RDFa and ontologies such as GoodRelations to describe both the accommodation services available and the attractions and services of the surrounding area.

Understanding the OpenCalais RDF Response

Saturday, September 26th, 2009

I’m using an XML version of an article published by Scoop in February 2000, Senior UN Officials Protest UN Sanctions On Iraq, to understand the OpenCalais RDF response as part of a larger project of linking extracted entities to existing Linked Data datasets.

OpenCalais uses natural language processing (NLP), machine learning and other methods to analyze content and return the entities it finds, such as the cities, countries and people with dereferenceable Linked Data style URIs. The entity types are defined in the OpenCalais RDF Schemas.

When I submit the content to the OpenCalais REST web service (using the default RDF response format) an RDF document is returned. Opening it in TopBraid Composer (below) shows a portion of the input content and some of the entity types OpenCalais can detect. The numbers in brackets indicate how many instances of an entity type have been detected; for example, cle.Person(13) indicates that thirteen people have been detected.

The TopBraid Composer Instances tab contains the URIs of the people  detected. Opening the highlighted URI reveals that it is for a person named Saddam Hussein.
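The same list can be pulled out of the response with a query, once the RDF document is loaded into a local store. The Person type URI below comes from the response itself; the c:name property (in the http://s.opencalais.com/1/pred/ namespace) is my assumption about where OpenCalais puts the entity label, so treat the query as illustrative.

# Sketch: list the person entities detected in the OpenCalais RDF response.
PREFIX c: <http://s.opencalais.com/1/pred/>

SELECT ?person ?name
WHERE {
  ?person a <http://s.opencalais.com/1/type/em/e/Person> .
  OPTIONAL { ?person c:name ?name }
}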

Entity Disambiguation

One of the challenges when analyzing content and extracting entities is entity disambiguation. Can the person named Saddam Hussein be uniquely identified? Usually the context is needed in order to disambiguate similar entities. As described in the OpenCalais FAQ, if the rdf:type resource of a given entity contains /er/ the entity has been disambiguated by OpenCalais, while if it contains /em/ it has not.

In the example above cle.Person is <http://s.opencalais.com/1/type/em/e/Person>. There is no obvious link to an rdf:type resource containing /er/. It looks like OpenCalais has been able to determine that the text “Saddam Hussein” equates to a Person, but has not been able to determine specifically who that person is.
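A quick way to check this across the whole response is to filter on the type URIs directly, as in the sketch below; swapping the pattern to /em/ lists the entities that were detected but not disambiguated.

# Sketch: list the entities whose type URI contains /er/, i.e. those
# OpenCalais has disambiguated.
SELECT DISTINCT ?entity ?type
WHERE {
  ?entity a ?type .
  FILTER regex(str(?type), "/er/")
}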

In contrast, Iraq (one of three countries detected) is shown below with the Incoming Reference http://d.opencalais.com/er/geo/country/ralg-geo1/d3b1cee2-327c-fa35-7dab-f0289958c024.

Opening the URI http://d.opencalais.com/er/geo/country/ralg-geo1/d3b1cee2-327c-fa35-7dab-f0289958c024 with either an HTML browser as http://d.opencalais.com/er/geo/country/ralg-geo1/d3b1cee2-327c-fa35-7dab-f0289958c024.html or with an RDF browser as http://d.opencalais.com/er/geo/country/ralg-geo1/d3b1cee2-327c-fa35-7dab-f0289958c024.rdf (in Tabulator below) shows that the country has been disambiguated with <rdf:type rdf:resource="http://s.opencalais.com/1/type/er/Geo/Country"/>.

Linked Data

In the RDF response returned by OpenCalais, neither Iraq nor “Saddam Hussein” was linked to other Linked Data datasets. Some OpenCalais entities are. For example, Moscow, Russia is linked via owl:sameAs to resources in other Linked Data datasets.

Since I know that the context of the article is international news, I can safely add some owl:sameAs links, such as the following DBpedia links for “Saddam Hussein” (below) and Iraq.
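For instance, the Iraq link can be asserted against a local copy of the response along the following lines (a sketch that assumes the store accepts SPARQL Update; the equivalent statement for “Saddam Hussein” would pair its em URI from the response with http://dbpedia.org/resource/Saddam_Hussein).

# Sketch: assert the DBpedia link for the disambiguated Iraq entity.
PREFIX owl: <http://www.w3.org/2002/07/owl#>

INSERT DATA {
  <http://d.opencalais.com/er/geo/country/ralg-geo1/d3b1cee2-327c-fa35-7dab-f0289958c024>
      owl:sameAs <http://dbpedia.org/resource/Iraq> .
}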

Entity Relevance

For both of the detected entities, “Saddam Hussein” and “Iraq”, OpenCalais provides an entity relevance score (shown for each in the screen shots below). The relevance capability detects the importance of each unique entity and assigns a relevance score in the range 0-1 (1 being the most relevant and important). From the screen shots it is clear that “Iraq” has been ranked as more relevant.

Detection Information

The RDF response includes the following properties relating to the subject's detection:

  • c:docId: URI of the document this mention was detected in
  • c:subject: URI of the unique entity
  • c:detection: snippet of the input content where the metadata element was identified
  • c:prefix: snippet of the input content that precedes the current instance
  • c:exact: snippet of the input content in the matched portion of text
  • c:suffix: snippet of the input content that follows the current instance
  • c:offset: the character offset relative to the input content after it has been converted into XML
  • c:length: length of the instance

The screen shot below for Saddam Hussein provides an example of how these properties work.
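The same details can be read back with a query over the mention resources, sketched below. The property names are the ones listed above; the c: prefix is assumed to be http://s.opencalais.com/1/pred/.

# Sketch: detection details for every mention whose matched text is
# "Saddam Hussein".
PREFIX c: <http://s.opencalais.com/1/pred/>

SELECT ?mention ?prefix ?suffix ?offset ?length
WHERE {
  ?mention c:exact  "Saddam Hussein" ;
           c:prefix ?prefix ;
           c:suffix ?suffix ;
           c:offset ?offset ;
           c:length ?length .
}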

Conclusions

OpenCalais is a very impressive tool. It takes a while though to fully understand the RDF response, especially in the areas of entity disambiguation and the linking of OpenCalais entities to other Linked Data datasets. Most likely there are some subtleties that I have missed or misunderstood, so all clarifications are welcome.

For entities extracted from international news sources and not linked to other Linked Data datasets it would be interesting to try some equivalence mining.

DBpedia Examples using Linked Data and Sparql

Monday, August 11th, 2008

Using Wikipedia, the largest online encyclopedia, users can browse and perform full-text searches, but programmatic access to the knowledge-base is limited.

The DBpedia project extracts structured information from Wikipedia opening it up to programmatic access using Semantic Web technologies such as Linked Data and SPARQL. This means that the linking and reasoning abilities of RDF and OWL can be utilized and queries for specific information can be made using SPARQL.

Simplistically, the mapping from the Wikipedia HTML-based web pages to the DBpedia RDF-based resources can be thought of as replacing “http://en.wikipedia.org/wiki/” with “http://dbpedia.org/resource/”, but in reality there are some additional subtleties, which are described in the article From Wikipedia URI-s to DBpedia URI.

The Wikipedia entry for “Civil Engineering” (http://en.wikipedia.org/wiki/Civil_Engineering) is used as an example to show how specific data can be retrieved from its DBpedia equivalent (http://dbpedia.org/resource/Civil_engineering).

When both the Wikipedia entry (http://en.wikipedia.org/wiki/Civil_Engineering) and its DBpedia equivalent (http://dbpedia.org/resource/Civil_engineering) are opened in a standard web browser they display similar information; however, the DBpedia equivalent has been redirected to http://dbpedia.org/page/Civil_engineering.

This redirect can be viewed in Firefox using the Tamper Data Firefox Extension as shown in the image below.

Loading the DBpedia Resource

The initial status of 303 is the HTTP response code “303 See Other”. The server replied with the HTTP response code 303 in order to direct the browser to the URI http://dbpedia.org/page/Civil_engineering, which is an HTML page the browser can display. The original URI http://dbpedia.org/resource/Civil_engineering is an RDF resource that would not display as well in the HTML browser.

DBpedia implements an HTTP mechanism called content negotiation in order to provide clients such as web browsers with the information they request in a form they can display. The tutorial How to publish Linked Data on the Web describes this and other Linked Data techniques that are used by applications such as DBpedia.

In order to access the RDF resource directly a web client needs to tell the server to send it RDF data. A client can do this by sending the HTTP Request Header Accept: application/rdf+xml as part of its initial request. (The HTML browser had sent an Accept: text/html HTTP header indicating that it was requesting an HTML page.)

The Firefox Addon RESTTest can be used to set Accept: application/rdf+xml in the HTTP Request Header and directly request http://dbpedia.org/resource/Civil_engineering as shown in the image below.

In this case the request to http://dbpedia.org/resource/Civil_engineering succeeded, as shown by the “Response Status 200”, and an RDF document was received, as shown in the “Response Text”.

In both the RDF fragment shown in the image above and in the HTML page http://dbpedia.org/page/Civil_engineering the multiple language support is visible. The SPARQL queries below show how to extract specific information for a particular language.

SPARQL

DBpedia provides a public SPARQL endpoint at http://dbpedia.org/sparql which enables users to query the RDF datasource with SPARQL queries such as the following.

SELECT ?abstract
WHERE {
{ <http://dbpedia.org/resource/Civil_engineering> <http://dbpedia.org/ontology/abstract> ?abstract }
}

The query returns all the abstracts for Civil Engineering, in each of the available languages.

The next query refines the abstracts returned to just the language specified, in this case ‘en’ (English).

SELECT ?abstract
WHERE {
{ <http://dbpedia.org/resource/Civil_engineering> <http://dbpedia.org/ontology/abstract> ?abstract .
FILTER langMatches( lang(?abstract), 'en') }
}

The SNORQL query explorer provides a simpler interface to the DBpedia SPARQL endpoint. The image below shows both the query and the result returned.

Other SPARQL endpoints such as http://demo.openlinksw.com/sparql/ (shown below) can query DBpedia by specifying the FROM NAMED clause to describe the RDF dataset. E.g.

SELECT ?abstract
FROM NAMED <http://dbpedia.org>
WHERE {
{ <http://dbpedia.org/resource/Civil_engineering> <http://dbpedia.org/ontology/abstract> ?abstract.
FILTER langMatches( lang(?abstract), 'en') }
}

Other Related DBpedia Articles

RDF as self-describing Data uses DBpedia and its SPARQL support to show how RDF is essentially ‘self-describing’ – there is no need to know about traditional metadata (schemas) before exploring a data set.

Linking to DBpedia with TopBraid outlines the benefit of DBpedia in terms of providing relatively stable URIs for all relevant real-world concepts, thus making it a natural place to connect specific domain models with each other using the OWL built-in property owl:sameAs (this property indicates that two URI references actually refer to the same thing). TopBraid Composer provides support to link domain models with DBpedia.

Querying DBpedia provides examples of using SPARQL to query DBpedia.

Adding Semantic Markup to Your Rails Application with DBpedia and ActiveRDF and Get Semantic with DBPedia and ActiveRDF describe using ActiveRDF to integrate DBpedia resources into web-based applications. ActiveRDF is a library for accessing RDF data from Ruby and Ruby on Rails programs and can perform SPARQL queries.

Constructing an Ontology – Common Inspection and Test Plans

Sunday, June 15th, 2008

ABE Services has developed a web-based application, the Compliance Data Management Service (CDMS), for checking work performed on site as part of projects undertaken by the building, construction and related industries.

The diagram below gives an overview of how CDMS works.

The three main parts of CDMS are:

  • Designing Projects by identifying the tasks to be performed and allocating those tasks to the people who will perform them.
  • Using a mobile phone on site to check that completed tasks comply with industry standards and best practice.
  • Monitoring and Managing the project via the progress and status of the completed tasks, the tasks outstanding and the non-compliant tasks.

Inspection and Test Plans (ITPs) are central to the three main parts of CDMS.

An Inspection and Test Plan (ITP) identifies the inspection, testing (verification) and acceptance requirements of a particular type of task, e.g. brickwork. Conceptually, an ITP could be thought of as a refined checklist, identifying the usual steps to be followed when undertaking a particular type of task. A specific task may also have additional requirements of its own.

An Inspection and Test Plan is composed of one or more:

  • Verification Point(s), which identify the parts of the task to verify. Each Verification Point is composed of one or more:
  • Criteria (singular Criterion), which define the specific requirements by which a verification point can be deemed compliant. This could include referencing specific industry standards which the task is required to comply with. (A rough sketch of how this composition could be modelled follows this list.)
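The sketch below declares the three concepts and the two composition properties as RDF triples, expressed as SPARQL INSERT DATA for loading into a store. All of the names are provisional placeholders; the published ontology may well use different terms.

# Sketch: one way the ITP / Verification Point / Criterion composition
# could be modelled. Class and property names are hypothetical.
PREFIX owl:  <http://www.w3.org/2002/07/owl#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX itp:  <http://www.abeservices.com.au/schema/2008/05/InspectionTestPlans.rdf#>

INSERT DATA {
  itp:InspectionTestPlan   a owl:Class .
  itp:VerificationPoint    a owl:Class .
  itp:Criterion            a owl:Class .

  itp:hasVerificationPoint a owl:ObjectProperty ;
      rdfs:domain itp:InspectionTestPlan ;
      rdfs:range  itp:VerificationPoint .

  itp:hasCriterion         a owl:ObjectProperty ;
      rdfs:domain itp:VerificationPoint ;
      rdfs:range  itp:Criterion .
}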

A set of commonly recurring Inspection and Test Plans (ITPs) has been identified and published on the ABE Services web site. To help promote a higher standard of work in the building and construction industries, ABE Services decided to make these common ITPs freely and publicly available. Currently, though, the details of these common ITPs are hidden within the CDMS application. To place them in the public domain, an OWL ontology, named the Common Inspection and Test Plans ontology, is being constructed based on this initial set of commonly recurring ITPs.

The definition of ontology in this context is along the lines of “an ontology is a specification of a conceptualization”. In this case the specification is of what constitutes a generic Inspection and Test Plan and the more specialized Inspection and Test Plans listed in the set of common ITPs. Other related definitions of ontology include the more philosophical “the study of the nature of being” and the more detailed Wikipedia definitions Ontology and Ontology (Information Science).

The Common Inspection and Test Plans ontology is being implemented using the Web Ontology Language (OWL). OWL is an ontology language that can formally describe the meaning of terminology used in Semantic Web documents (see Why OWL?). In turn, the Semantic Web is about two things. It is about common formats for the integration and combination of data drawn from diverse sources, whereas the original Web mainly concentrated on the interchange of documents. It is also about language for recording how the data relates to real world objects. That allows a person, or a machine, to start off in one database and then move through an unending set of databases which are connected not by wires but by being about the same thing.

In the world of building and construction projects, and activities related to Real Estate, the buying and selling of houses, house and building maintenance, and potentially in the selection of contractors to perform building work, the Common Inspection and Test Plans ontology will form one of these databases, connected by the concept of how to perform a specific type of building, construction or related task.

In the first instance the Common Inspection and Test Plans ontology has been published in a very simple draft form, listing only the Inspection and Test Plans and not the more detailed Verification Points and Criteria.

This draft version is available at the URI: http://www.abeservices.com.au/schema/2008/05/InspectionTestPlans.rdf.

A good way to view it is to install the Tabulator RDF browser plugin for Firefox, as outlined in Description of a Project.

It is intended that the Common Inspection and Test Plans ontology will evolve as a result of community participation. As well as general feedback, it is hoped to also make available a web-based vocabulary editor which would allow for greater collaboration (potential options include Neologism and OntoWiki).

The next steps are to

  • add Verification Point(s) and Criteria to the currently identified Inspection and Test Plans
  • identify additional concepts related to Inspection and Test Plans, e.g. Hold Points
  •  add some worked examples

In a future article I’ll also outline how to query the Common Inspection and Test Plans ontology for specific information using the RDF query language SPARQL. A SPARQL endpoint is currently available at http://abeserver.isa.net.au:2020/, providing a SPARQL query form.
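As a small preview, a query to list the plans themselves might look something like the sketch below; the itp: namespace and the itp:InspectionTestPlan class name are placeholders until the draft vocabulary settles.

# Sketch: list each Inspection and Test Plan in the draft ontology with its label.
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX itp:  <http://www.abeservices.com.au/schema/2008/05/InspectionTestPlans.rdf#>

SELECT ?plan ?label
WHERE {
  ?plan a itp:InspectionTestPlan .
  OPTIONAL { ?plan rdfs:label ?label }
}
ORDER BY ?label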