Archive for the ‘RDF’ Category

Using Linked Data to provide a different perspective on Software Architecture

Saturday, December 17th, 2011

As outlined in my previous post Linked Data and the SOA Software Development Process, I am interested in using Linked Data to provide a more detailed view of SOA services.

A couple of scenarios during the past week highlighted the value of the approach, and also showed that it would benefit from extending the scope to include more information about the consumers of the SOA services and about the external data sources (in particular databases) used by the SOA services.

Both scenarios involved setting up environments for the development and testing of new functionality involving a number of different systems, with each system needing to be deployed at a specific version level.

The first scenario related to the software versions. The UML diagrams presented to describe the architecture were at too high a level to show the actual dependencies, but to add the level of detail needed would have made the diagrams too busy.

Although not yet complete, the work already done to provide a Linked Data perspective of the SOA services enabled a more fine-grained view of the actual dependencies. Knowing the specific lower-level dependencies resulted in more flexibility with the actual deployment. In particular, work could start on developing the new functionality for one component since it was not going to be affected by proposed changes in another component. On the original UML diagram both components were shown as requiring changes; the Linked Data perspective provided enough additional detail to see that the changes could happen in parallel.

The second scenario related to finding the owners of external data sources so that we could determine whether they were available for use in a given test environment. Adding this ownership information to our Linked Data repository would speed up this part of the process in the future.

Linked Data and the SOA Software Development Process

Thursday, November 17th, 2011

We have quite a rigorous SOA software development process; however, the full value of the collected information is not being realized because the artifacts are stored in disconnected information silos. So far, attempts to introduce tools that could improve the situation (e.g. zAgile Teamwork and Semantic MediaWiki) have been unsuccessful, possibly because the value of a Linked Data approach is not yet fully appreciated.

To provide an example Linked Data view of the SOA services and their associated artifacts, I created a prototype consisting of Sesame running on a Tomcat server, with Pubby providing the Linked Data view via the Sesame SPARQL endpoint. TopBraid was connected directly to the Sesame native store (configured via the Sesame Workbench) to create a subset of services sufficient to demonstrate the value of publishing information as Linked Data. In particular, the prototype showed how easy it became to navigate from the requirements for a SOA service through to the details of its implementation.

The prototype also highlighted that auto-generation of the RDF graph (the data providing the Linked Data view) from the actual source artifacts would be preferable to manual entry, especially if this could be transparently integrated with the current software development process. This has become the focus of the next step: automated knowledge extraction from the source artifacts.


Key artifact types of our process include:

A Graph of Concepts and Instances

There is a rich graph of relationships linking the things described in the artifacts listed above. For example, the business entities defined in the UML analysis model are the subject of the services and service operations defined in the Service Contracts. The services and service operations are mapped to the WSDLs, which utilize the XML Schemas that provide an XML view of the business entities. The JAX-WS implementations are linked to the WSDLs and XML Schemas and deployed to the Oracle WebLogic Application Server, where the configuration files list the external dependencies. The log files and defects link back to specific parts of the code base (Subversion revisions) within the context of specific service operations. The people associated with the different artifacts can often be determined from artifact metadata.

RDF, OWL and Linked Data are a natural fit for modelling and viewing this graph since there is a mix of concepts plus a lot of instances, many of which already have an HTTP representation. The graph also contains a number of transitive relationships (for example, a WSDL may import an XML Schema which in turn imports another XML Schema, and so on), promoting the use of owl:TransitiveProperty to help obtain a full picture of all the dependencies a component may have.
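To illustrate what declaring a property transitive buys us, the closure a reasoner would compute over an imports chain can be sketched in plain Python. All the URIs below are hypothetical placeholders, not our real schema locations.

```python
# Sketch of the effect of owl:TransitiveProperty on an imports chain.
# URIs are invented placeholders for illustration only.
EX = "http://example.org/schema/"

# Direct imports only: a WSDL imports Schema A, which imports B, which imports C.
triples = {
    (EX + "ServiceWsdl", EX + "imports", EX + "SchemaA"),
    (EX + "SchemaA", EX + "imports", EX + "SchemaB"),
    (EX + "SchemaB", EX + "imports", EX + "SchemaC"),
}

def transitive_closure(triples, predicate):
    """Keep adding (a, p, c) whenever (a, p, b) and (b, p, c) are present."""
    closed = set(triples)
    while True:
        derived = {(a, p, c)
                   for (a, p, b) in closed if p == predicate
                   for (b2, p2, c) in closed if p2 == predicate and b2 == b}
        if derived <= closed:
            return closed
        closed |= derived

closed = transitive_closure(triples, EX + "imports")
# The full dependency picture for the WSDL now includes all three schemas,
# which is exactly the "all dependencies" view we want for deployments.
deps = sorted(o for (s, p, o) in closed if s == EX + "ServiceWsdl")
```

In practice a triple store with OWL reasoning (or a SPARQL 1.1 property path such as `ex:imports+`) does this expansion for us; the point is that only the direct imports ever need to be asserted.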

Knowledge Extraction

Another advantage of the RDF, OWL and Linked Data approach is the use of unique URIs for identifying concepts and instances. This allows the information contained in one artifact, e.g. a WSDL, to be extracted as RDF triples which can later be combined with the RDF triples extracted from the JAX-WS annotations of the Java source code. The combined RDF triples tell us more about the WSDL and its Java implementation than could be derived from either artifact alone.
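The mechanics of that combination are worth spelling out: because both extraction passes emit the same URI for the WSDL, the triples merge by plain set union, with no join keys or mapping tables. A minimal sketch, with all artifact names invented for illustration:

```python
# Triples extracted from two different artifacts combine by simple set union,
# because both artifacts use the same URI for the WSDL.
# All names below are invented placeholders.
WSDL = "http://example.org/services/CustomerService?wsdl"

from_wsdl = {  # extracted from the WSDL document itself
    (WSDL, "rdf:type", "ex:Wsdl"),
    (WSDL, "ex:definesOperation", "ex:getCustomer"),
}
from_java = {  # extracted from the JAX-WS annotations in the Java source
    (WSDL, "ex:implementedBy", "ex:CustomerServiceImpl"),
    ("ex:CustomerServiceImpl", "rdf:type", "ex:JavaClass"),
}

combined = from_wsdl | from_java  # no joins or key mapping needed

# Everything now known about the WSDL, drawn from both sources:
about_wsdl = {(p, o) for (s, p, o) in combined if s == WSDL}
```

This is the same merge a triple store performs when both extraction results are loaded into one graph.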

We have made some progress with knowledge extraction, but this is still very much a work in progress. Sites such as ConverterToRdf, RDFizers and the Virtuoso Sponger provide tools and information on generating RDF from different artifact types. Part of the current experimentation is around finding tools that can be transparently layered over the top of the current software development process. Finding the best way to extract the full set of desired RDF triples from Microsoft Word documents is also proving problematic, since some natural language processing is required.
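For structured artifacts the extraction itself can be straightforward. As a minimal illustration (not one of the tools above), the Python standard library's XML parser can pull import statements out of a WSDL; the WSDL snippet and the `ex:imports` predicate are invented for the sketch:

```python
# Minimal sketch of extracting RDF triples from a WSDL using only the
# standard library. The WSDL snippet and predicate are illustrative only.
import xml.etree.ElementTree as ET

WSDL_NS = "http://schemas.xmlsoap.org/wsdl/"
wsdl = """<definitions xmlns="http://schemas.xmlsoap.org/wsdl/"
                       targetNamespace="http://example.org/customer">
  <import namespace="http://example.org/types"
          location="http://example.org/types.xsd"/>
</definitions>"""

root = ET.fromstring(wsdl)
subject = root.get("targetNamespace")

# One triple per <import>, linking the service namespace to the schema it uses.
triples = [(subject, "ex:imports", imp.get("location"))
           for imp in root.findall(f"{{{WSDL_NS}}}import")]
```

A real extractor would also walk the port types and operations, but the pattern is the same: parse the artifact, emit triples whose subject is a stable URI.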

Tools currently being evaluated include:

The Benefits of Linked Data

The prototype showed the benefits of Linked Data for navigating from the requirements for a SOA service through to the details of its implementation. Looking at all the information that could be extracted leads to a broader view of the benefits Linked Data would bring to the SOA software development process.

One specific use being planned is the creation of a Service Registry application providing the following functionality:

  • Linking the services to the implementations running in a given environment, e.g. dev, test and production. This includes linking to the specific versions of the requirement, design and implementation artifacts and detailing the runtime dependencies of each service implementation.
  • Listing the consumers of each service and providing summary statistics on performance, e.g. daily usage figures derived from the audit logs.
  • Providing a list of who to contact when a service is not available. This includes notifying consumers of a service outage and also contacting providers if a service is being affected by an external component being offline, e.g. a database or an external web service.
  • Searching for services by different criteria, e.g. business entity.
  • Tracking the evolution of services and assisting with refactoring, e.g. answering questions such as “Are there older versions of the XML Schemas that can be deprecated?”
  • Simplifying the running of a specific soapUI test case for a service operation in a given environment.
  • Providing the equivalent of a class lookup that covers all project classes plus all required infrastructure classes and returns information such as the jar file containing the class, along with the associated JIRA and Subversion information.
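The "who to contact" item above reduces to a two-step lookup over the registry's triples: find the consumers of a service, then each consumer's contact. A sketch over an in-memory triple set, with all service names and addresses invented:

```python
# Sketch of the outage-contact lookup from the planned Service Registry.
# All service names and contact addresses are invented for illustration.
triples = {
    ("ex:BillingService", "ex:consumedBy", "ex:PortalApp"),
    ("ex:BillingService", "ex:consumedBy", "ex:ReportingBatch"),
    ("ex:PortalApp", "ex:contact", "mailto:portal-team@example.org"),
    ("ex:ReportingBatch", "ex:contact", "mailto:batch-team@example.org"),
}

def contacts_for_outage(service):
    """Find every consumer of the service, then each consumer's contact."""
    consumers = {o for (s, p, o) in triples
                 if s == service and p == "ex:consumedBy"}
    return sorted(o for (s, p, o) in triples
                  if s in consumers and p == "ex:contact")

outage_list = contacts_for_outage("ex:BillingService")
```

Against a real registry this would be a short SPARQL query over the same two predicates rather than Python set comprehensions.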

Using the Neon Toolkit and ANNIE to demonstrate extracting RDF from Natural Language

Sunday, July 10th, 2011

The Neon Toolkit is an open source ontology engineering environment providing an extensive set of plug-ins for various ontology engineering activities.

One such plugin is the GATE web services plugin which adds Natural Language Entity Recognition functionality from the GATE (General Architecture for Text Engineering) framework.

The GATE web services plugin can be quickly added to the Neon Toolkit by

  • opening the Help | Install New Software … menu option
  • selecting “NeOn Toolkit Update Site v2.4” from the Work with drop-down combo box,
  • and selecting GATE Web Services.

The GATE web services plugin includes ANNIE (Ontology Generation Based on Named Entity Recognition), which can be used to demonstrate basic Named Entity Recognition and ontology generation. The main GATE site provides more details on how ANNIE, a Nearly-New Information Extraction System, works.

After the GATE web services plugin has been installed, GATE Services appears as an additional top-level menu option. Selecting GATE Services | Call Multi-document Web Service opens the Call GATE web service dialog box, which provides the option to select ANNIE as the service to call.

Selecting ANNIE and Next invokes an additional dialog box where the Input directory: containing the documents to be processed and the Output ontology: can be specified.

Once the Input directory and the Output ontology have been specified and the Finish button selected, ANNIE reads the input and generates a basic ontology from the concepts, instances and relations found in the text.

When the text below is provided as input ANNIE generates the following RDF output.

Input Text

Nick lives in Toronto and studies at Concordia University. Toronto is six hours from Montreal. Toronto is a nice place to live.

RDF Output

<?xml version="1.0" encoding="UTF-8"?>
<!-- All statement -->

<rdf:Description rdf:about="">
	<rdf:type rdf:resource=""/>
	<rdfs:label xml:lang="en">nick</rdfs:label>
	<rdfs:label xml:lang="en">Nick</rdfs:label>
</rdf:Description>

<rdf:Description rdf:about="">
	<rdf:type rdf:resource=""/>
	<rdfs:label xml:lang="en">montreal</rdfs:label>
	<rdfs:label xml:lang="en">Montreal</rdfs:label>
</rdf:Description>

<rdf:Description rdf:about="">
	<rdf:type rdf:resource=""/>
	<rdfs:label xml:lang="en">Location</rdfs:label>
</rdf:Description>

<rdf:Description rdf:about="">
	<rdf:type rdf:resource=""/>
	<rdfs:label xml:lang="en">concordia university</rdfs:label>
	<rdfs:label xml:lang="en">Concordia University</rdfs:label>
	<rdfs:label xml:lang="en">Concordia_University</rdfs:label>
</rdf:Description>

<rdf:Description rdf:about="">
	<rdf:type rdf:resource=""/>
	<rdfs:label xml:lang="en">Organization</rdfs:label>
</rdf:Description>

<rdf:Description rdf:about="">
	<rdf:type rdf:resource=""/>
	<rdfs:label xml:lang="en">Person</rdfs:label>
</rdf:Description>

<rdf:Description rdf:about="">
	<rdf:type rdf:resource=""/>
	<rdfs:label xml:lang="en">toronto</rdfs:label>
	<rdfs:label xml:lang="en">Toronto</rdfs:label>
</rdf:Description>

The entities recognized are:

  • Nick as a Person
  • Montreal and Toronto as Locations
  • Concordia University as an Organization.

While relatively simplistic, the overall example (the input text, the generated RDF output and the quick setup process for the Neon Toolkit and the GATE web services plugin) helped to demonstrate the potential of Named Entity Recognition and ontology generation.

The input text actually comes from a demo of the OwlExporter, which provides similar functionality for GATE itself. Longer term, GATE is likely to be part of a Natural Language Processing solution for a government department, where the sensitivity of the private data precludes the use of an external web service. Hopefully there will also be time later on to write up the results of using GATE and the OwlExporter with the same input text.

(For this article  Neon Toolkit version 2.4.2 was used.)

Drupal7 RDFa XMLLiteral content processing

Saturday, March 12th, 2011

Drupal 7 supports RDFa 1.0 as part of the core product. RDFa 1.0 is the current specification but RDFa 1.1 is to be released shortly.

RDFa 1.0 metadata can be parsed using the RDFa Distiller and Parser while the RDFa Distiller and Parser (Test Version for RDFa 1.1) can be used to extract RDFa 1.1.

Creating a simple Drupal 7 test blog and parsing out the RDFa 1.0 metadata with the RDFa Distiller and Parser shows that Drupal 7 is using the SIOC (Semantically-Interlinked Online Communities) ontology to describe blog posts and identifies the Drupal user as the creator of the post using the sioc:has_creator property.

<sioc:Post rdf:about="">
  <rdf:type rdf:resource=""/>
  <sioc:has_creator>
    <sioc:UserAccount rdf:about=""/>
  </sioc:has_creator>
  <content:encoded rdf:parseType="Literal"><p xml:lang="en" xmlns="">Test Blog</p></content:encoded>
</sioc:Post>

The sioc:UserAccount class is a subclass of foaf:OnlineAccount.

What I would like to do is add additional RDFa metadata within the content of the blog to link me, the Drupal 7 user with a sioc:UserAccount, to me, the foaf:Person identified by my FOAF file.

Drupal 7 content is wrapped by XHTML elements containing the property=”content:encoded” (shown below) and an RDFa parser treats this content as an XMLLiteral.

<div property="content:encoded">

The problem is that RDFa 1.0 parsers don’t extract metadata contained within the XMLLiteral.

This was raised a while back in the issue “XMLLiteral content isn’t processed for RDFa attributes in RDFa 1.0 – should this change in RDFa 1.1?”, with the result that in RDFa 1.1, parsers should now also process the XMLLiteral content.

To make sure that the RDFa parsers know that I want to use RDFa 1.1 processing I need to update Drupal 7 to use the  XHTML+RDFa Driver Module defined in the XHTML+RDFa 1.1 spec.

This turns out to be a simple update of one Drupal 7 file, site/modules/system/html.tpl.php.

Near the top of the file the version is changed to 1.1 (in two places) and the dtd changed to  “”.

?><!DOCTYPE html PUBLIC "-//W3C//DTD XHTML+RDFa 1.1//EN"
<html xmlns="" xml:lang="<?php print $language->language; ?>" version="XHTML+RDFa 1.1" dir="<?php print $language->dir; ?>"<?php print $rdf_namespaces; ?>>

With these changes made I can create another blog containing the following RDFa metadata

<div about="" typeof="foaf:Person">
<div rel="foaf:account" resource="">

knowing that an RDFa 1.1 parser will create the RDF triples below, which link the Drupal 7 user to me, the person identified in my FOAF file.

  <foaf:Person rdf:about="">
    <foaf:account>
      <sioc:UserAccount rdf:about=""/>
    </foaf:account>
  </foaf:Person>

The differences between the RDF extracted with an RDFa 1.0 parser and an RDFa 1.1 parser can be seen using the two links below.

Now that I know that the RDFa 1.1 metadata embedded in the content will be processed accordingly, I can move on to the task of building 137 Breaker Bay, a simple accommodation site where the plan is to use RDFa and ontologies such as GoodRelations to describe both the accommodation services available and the attractions and services of the surrounding area.

A Simple HTML5 RDFa Example

Wednesday, November 10th, 2010

As part of learning HTML5 and RDFa I put together a Simple HTML5 RDFa Example, using a photo Irene took of Minoan Figurines during a trip to Crete as the main content.

A Simple HTML5 RDFa Example

Identifying Things

Using RDFa I wanted to generate RDF statements about:

Each of these five things requires a URI. The example automatically has one ( ) while Irene and I are identified by our FOAF files, Irene and Richard.

The remaining two URIs are created by adding the HTML bookmarks “crete” and “minoan-figurines” to the example, generating the URIs:

HTML5 Doctype and RDFa Version

To support both HTML5 and RDFa I added the following HTML5 doctype and RDFa version declaration.

<!DOCTYPE html>
<html version="HTML+RDFa 1.1" lang="en">

The new HTML5 elements header, hgroup, nav, section, article and footer are used in the example, primarily to construct a document structure that will be developed further in the future.

Viewing the RDF

The link RDF extracted by pyRDFa uses the RDFa Distiller and Parser to extract the RDF statements. If Tabulator is installed, clicking the link provides the following view of the generated RDF.

A Simple HTML5 RDFa Extraction

About the Example

The statements about A Simple HTML5 RDFa Example are all made in the document metadata, using the properties dc:date, dc:created, dc:creator, dc:title and dc:subject from the Dublin Core Metadata Element Set. The metadata in the head of the document refers to the example itself.

<head profile="">
<meta property="dc:date dc:created" content="2010-11-11T13:00:00" />
<meta rel="dc:creator" href="" />
<meta rel="dc:subject" href="" />
<meta rel="dc:subject" href="" />
<meta rel="dc:subject" href="" />
<meta rel="dc:subject" href="" />
<meta rel="dc:subject" href="" />
<meta rel="dc:subject" href="" />
<title property="dc:title">A Simple HTML5 RDFa Example</title>

About Crete and the Minoan Figurines

In the code below the RDFa about attribute, specified as about=”#crete” and about=”#minoan-figurines”, sets the current subject for the article on Crete and the photo of the Minoan Figurines respectively. The appropriate creator and subject are also assigned to each subject.

<a name="crete" />
<div about="#crete" rel="dc:creator" href="">
<h2 about="#crete" rel="dc:subject" href="" property="dc:title">Crete 2010</h2>
</div>
<a name="minoan-figurines" />
<div class="imgbox" about="#minoan-figurines"><img src="images/minoan-figurines.jpg" alt="figurines" />
<div><span property="dc:title" rel="dc:subject" href="">Minoan Figurines, Crete</span>
photo by <span rel="dc:creator" href="">Irene</span>.</div>
</div>

About Richard and Irene

For Richard and Irene, the typeof attribute is set to foaf:Person, the about attribute specifies the appropriate FOAF file, and foaf:knows is used to specify that Richard knows Irene.

<div class="socialnet" about="" typeof="foaf:Person" property="foaf:name" content="Richard Hancock">
<p><span property="foaf:firstname">Richard</span> knows</p>
<ul rel="foaf:knows">
<li typeof="foaf:Person" about="">
	<a property="foaf:name" rel="foaf:homepage" href="">Irene</a></li>
</ul>
</div>

Combining Information

One of the benefits of using RDF is that it is easy to combine information. A small example of how easily RDF statements from different sources can be combined is provided using Tabulator. If the link RDF extracted by pyRDFa is opened in Tabulator, followed by the link to Irene’s FOAF file, then the “is creator of” statement is included in the second Tabulator view, even though it is not present in the original FOAF file.

A Simple HTML5 RDFa Example Irene

Because Irene is uniquely identified, Tabulator can safely combine the information from the two data sources.

SPARQL Query for Content By Author

RDF extracted from the example can be queried using SPARQL. The following query, identifying the content authors, can be pasted into the query form. The FROM keyword specifies that the query will use the RDF extracted from the example, yielding the results below.

PREFIX dc: <>
PREFIX foaf: <>
select ?Content ?Author
where { ?s dc:creator ?o .
        ?s dc:title ?Content .
        ?o foaf:name ?Author . }
order by ?Content

SPARQL Query Results for Content By Author

| Content                          | Author               |
| "A Simple HTML5 RDFa Example"@en | "Richard Hancock"@en |
| "Crete 2010"@en                  | "Richard Hancock"@en |
| "Minoan Figurines, Crete"@en     | "Irene"@en           |

Embedded SPARQL Query

The example contains the SPARQL Query for Content by Author embedded as the link

This URL encodes the SPARQL Query that is sent to the SPARQL end point.

When the link is selected the SPARQL Query is run against the RDF extracted from the example and returned directly to the browser.
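The encoding step can be sketched with the Python standard library: the query text is percent-encoded and passed as the standard `query` parameter of the SPARQL protocol. The endpoint URL below is a placeholder, and the query is the Content by Author query with its prefixes expanded to the full Dublin Core and FOAF URIs.

```python
# Sketch of building an embedded SPARQL query link with the standard library.
# The endpoint URL is a placeholder, not the example's actual endpoint.
from urllib.parse import urlencode

endpoint = "http://example.org/sparql"
query = """select ?Content ?Author
where { ?s <http://purl.org/dc/elements/1.1/creator> ?o .
        ?s <http://purl.org/dc/elements/1.1/title> ?Content .
        ?o <http://xmlns.com/foaf/0.1/name> ?Author . }
order by ?Content"""

# urlencode percent-encodes the query, so spaces, braces and question
# marks are all safe to place inside an href attribute.
link = endpoint + "?" + urlencode({"query": query})
```

Following such a link simply issues the query as an HTTP GET against the endpoint, which is why the results come straight back to the browser.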

What Next

Both HTML5 and RDFa are addictive. For HTML5 there are lots of new features to explore, and for RDFa there is more metadata to connect up.