Archive for the ‘Ontology’ Category

Using the Neon Toolkit and ANNIE to demonstrate extracting RDF from Natural Language

Sunday, July 10th, 2011

The Neon Toolkit is an open source ontology engineering environment providing an extensive set of plug-ins for various ontology engineering activities.

One such plugin is the GATE web services plugin which adds Natural Language Entity Recognition functionality from the GATE (General Architecture for Text Engineering) framework.

The GATE web services plugin can be quickly added to the Neon Toolkit by

  • opening the Help | Install New Software… menu option
  • selecting “NeOn Toolkit Update Site v2.4 –” from the Work with drop-down combo box
  • and selecting GATE Web Services as shown below.

The GATE web services plugin includes ANNIE (Ontology Generation Based on Named Entity Recognition), which can be used to demonstrate basic Named Entity Recognition and ontology generation. The main GATE site provides more details on how ANNIE (a Nearly-New Information Extraction System) works.

After the GATE web services plugin has been installed, GATE Services appears as an additional top-level menu option. Selecting GATE Services | Call Multi-document Web Service opens the Call GATE web service dialog box below, which provides the option to select ANNIE as the service to call.

Selecting ANNIE and then Next opens an additional dialog box where the Input directory: containing the documents to be processed and the Output ontology: can be specified.

Once the Input directory: and Output ontology: have been specified and the Finish button selected, ANNIE reads the input and generates a basic ontology from the concepts, instances and relations found in the text.

When the text below is provided as input ANNIE generates the following RDF output.

Input Text

Nick lives in Toronto and studies at Concordia University. Toronto is six hours from Montreal. Toronto is a nice place to live.

RDF Output

<?xml version="1.0" encoding="UTF-8"?>
<!-- All statement -->

<rdf:Description rdf:about="">
	<rdf:type rdf:resource=""/>
	<rdfs:label xml:lang="en">nick</rdfs:label>
	<rdfs:label xml:lang="en">Nick</rdfs:label>
</rdf:Description>

<rdf:Description rdf:about="">
	<rdf:type rdf:resource=""/>
	<rdfs:label xml:lang="en">montreal</rdfs:label>
	<rdfs:label xml:lang="en">Montreal</rdfs:label>
</rdf:Description>

<rdf:Description rdf:about="">
	<rdf:type rdf:resource=""/>
	<rdfs:label xml:lang="en">Location</rdfs:label>
</rdf:Description>

<rdf:Description rdf:about="">
	<rdf:type rdf:resource=""/>
	<rdfs:label xml:lang="en">concordia university</rdfs:label>
	<rdfs:label xml:lang="en">Concordia University</rdfs:label>
	<rdfs:label xml:lang="en">Concordia_University</rdfs:label>
</rdf:Description>

<rdf:Description rdf:about="">
	<rdf:type rdf:resource=""/>
	<rdfs:label xml:lang="en">Organization</rdfs:label>
</rdf:Description>

<rdf:Description rdf:about="">
	<rdf:type rdf:resource=""/>
	<rdfs:label xml:lang="en">Person</rdfs:label>
</rdf:Description>

<rdf:Description rdf:about="">
	<rdf:type rdf:resource=""/>
	<rdfs:label xml:lang="en">toronto</rdfs:label>
	<rdfs:label xml:lang="en">Toronto</rdfs:label>
</rdf:Description>

The entities recognized are:

  • Nick as a Person
  • Montreal and Toronto as Locations
  • Concordia University as an Organization.
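As a quick check of output like this, the labels can be pulled out with a few lines of stdlib Python. The fragment below is a well-formed sample of mine in the same shape; the real ANNIE output declares its namespaces and uses generated URIs.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"

# A small well-formed fragment shaped like the ANNIE output above.
doc = f"""<rdf:RDF xmlns:rdf="{RDF}" xmlns:rdfs="{RDFS}">
  <rdf:Description rdf:about="#Toronto">
    <rdfs:label xml:lang="en">toronto</rdfs:label>
    <rdfs:label xml:lang="en">Toronto</rdfs:label>
  </rdf:Description>
</rdf:RDF>"""

root = ET.fromstring(doc)
# Collect every rdfs:label value, in document order.
labels = [e.text for e in root.iter(f"{{{RDFS}}}label")]
print(labels)  # ['toronto', 'Toronto']
```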

While relatively simplistic, the overall example, comprising the input text, the generated RDF output and the quick setup process for the Neon Toolkit and the GATE web services plugin, demonstrates the potential of Named Entity Recognition and ontology generation.

The input text actually comes from a demo of the OwlExporter, which provides similar functionality for GATE itself. Longer term, GATE is likely to be part of a Natural Language Processing solution for a government department, where the sensitivity of the private data precludes the use of an external web service. Hopefully there will also be time later on to write up the results of using GATE and the OwlExporter with the same input text.

(For this article, Neon Toolkit version 2.4.2 was used.)

Developing a Semantic Web Strategy

Tuesday, August 10th, 2010

In the last chapter of his book “Pull: The Power of the Semantic Web to Transform Your Business” David Siegel outlines some steps for developing a successful Semantic Web strategy for your business or organization.

One approach that worked for me recently was to organize a meeting titled “Developing a Semantic Web Strategy” and invite along developers, architects, analysts and managers. This was in the context of a government organization, and the managers were from the applications development area.

Sharing books like Semantic Web for the Working Ontologist, Semantic Web For Dummies, Programming the Semantic Web and Semantic Web Programming prior to the meeting helped people become familiar with concepts like URIs as names for things, RDF, RDFS, OWL, SPARQL and RDFa.

To highlight how rapidly the Web of Data is evolving and the amount of information now being published as Linked Open Data, I stepped through Mark Greaves’ excellent presentation The Maturing Semantic Web: Lessons in Web-Scale Knowledge Representation.

During the meeting I took a business-strategy-first, technology-second approach, taking the time to explore how an approach that has worked for someone else might fit our organization.

Areas explored included:

Enterprise Modeling

I spent some time comparing RDF/OWL modeling with UML modeling, highlighting how URIs enable modeling across distributed information sources without the need to consolidate everything in a central repository, as UML tools require.
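The idea can be sketched in Turtle; the URIs below are hypothetical. Because two independently published files use the same URI for the same thing, merging their graphs needs no consolidation step:

```turtle
@prefix org: <http://example.org/org#> .           # hypothetical namespace
@prefix dc:  <http://purl.org/dc/elements/1.1/> .

# Statements from a file published by one team ...
org:Payments a org:System .

# ... and from a separate file published elsewhere. A merge of the two
# graphs lines up automatically on the shared URI org:Payments.
org:Payments dc:description "Handles invoice payments." .
```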

I also touched on a number of OWL features.

Because it is a government department I highlighted the Federal Enterprise Architecture Reference Model Ontology (FEA-RMO) and how such an ontology could be used to map a parliamentary initiative to the software providing its implementation.

Open Government

Given the current trend for governments to make datasets freely available I presented the Linked Data approaches taken by and as examples to follow in this area.

The business case for Linked Data in this scenario is that Linked Data is seen as the best available approach for publishing data in hugely diverse and distributed environments, in a gradual and sustainable way (see Why Linked Data for details).

RDFa Based Integration

One example that struck a chord was RDFa and Linked Data in UK Government Websites, where job vacancy details from different sites can easily be combined because each site publishes its pages as HTML with RDFa added to annotate the job vacancy. Using RDFa allows the same page to be read as either HTML or RDF. The end result is that integration can be achieved with minimal changes to the original sites.
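The pattern can be sketched with a minimal page fragment; the vocabulary namespace and property names below are illustrative, not the ones the UK government sites actually use:

```html
<!-- A vacancy page that stays ordinary HTML but also yields RDF via RDFa -->
<div xmlns:job="http://example.org/vocab/jobs#"
     about="http://example.org/vacancies/1234" typeof="job:Vacancy">
  <h2 property="job:title">Software Developer</h2>
  <p>Salary: <span property="job:salary">45,000</span></p>
</div>
```

An RDFa-aware crawler extracts triples such as `<http://example.org/vacancies/1234> job:title "Software Developer"`, while a browser simply renders the HTML.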

Search Engine Optimisation (SEO)

For anyone advertising products and services online the business strategy to follow is the example set by which describes its stores and products using the Good Relations ontology and embeds these descriptions into its web pages using RDFa, increasing search engine traffic by 30%.

Enterprise Web of Data

Within our software development process, from project inception to production release and subsequent maintenance releases, information is copied and duplicated in a number of different places. Silos abound, in the form of Word documents, spreadsheets and the sticky notes that are part of the “Agile” process. There is some good information on our wiki pages but it is unstructured and not machine readable.

The information that forms our internal processes fails David Siegel’s Semantic Web Acid Test:

  • It’s not semantic and
  • It’s not on the web.

Introducing a semantic wiki such as Semantic MediaWiki to hold project information and link it to other data sources was raised as a candidate for a Semantic Web proof of concept.


Just scheduling the meeting was in itself a successful outcome, since it started discussion around the role Semantic Web technologies could play in our organization. For a number of people, including the Applications Development manager, this is new technology and they need time to absorb it, but the end result was agreement that it is technology that can’t be ignored.

In order to gain some practical experience, two internal prototypes were agreed to, both with practical value for the organization.

The first is a small application that will show the full set of runtime dependencies for a given software component, as well as the other components affected when the specified component is changed. The application will be based on a simple ontology that defines dependencies between components using owl:TransitiveProperty, and uses a reasoner (e.g. Pellet) to infer the full set of dependencies for a component.
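What the reasoner contributes here can be sketched in plain Python as a transitive closure. The component names below are made up, and the real prototype would read the ontology rather than a dict:

```python
# Direct dependencies between components (hypothetical names).
# In the ontology, dependsOn would be declared an owl:TransitiveProperty.
depends_on = {
    "billing": {"auth"},
    "auth": {"database"},
    "database": {"storage"},
}

def all_dependencies(component):
    """Full (transitive) dependency set, as a reasoner such as Pellet
    would infer it from the owl:TransitiveProperty axiom."""
    seen, stack = set(), [component]
    while stack:
        for dep in depends_on.get(stack.pop(), ()):
            if dep not in seen:
                seen.add(dep)
                stack.append(dep)
    return seen

print(sorted(all_dependencies("billing")))  # ['auth', 'database', 'storage']
```

Running the same closure over the inverted dict answers the second question: which components are affected when a given component changes.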

The second prototype will trial Semantic MediaWiki for project management (potentially using the Teamwork Ontology). The longer-term view is to customize Semantic MediaWiki to include artifacts created as part of the software development process, addressing some of the silo problems found in our current internal enterprise web of data.

Once practical knowledge has been gained from the internal prototypes, a meeting will be scheduled with the Enterprise Architecture team to canvass the establishment of a wider vision for the use of Linked Data and Semantic Web technologies, potentially leading to their use on the public web sites and active publishing to the Web of Data.

A GoodRelations Semantic Web Description of a Business

Saturday, April 11th, 2009

Tried out the newly released GoodRelations Annotator to create a Semantic Web description of a business.

The GoodRelations Annotator is an online form-based tool that creates an RDF/XML file “semanticweb.rdf” containing a description of the key aspects of the business. The description is based on concepts defined in the GoodRelations OWL ontology. In particular the description contains a BusinessEntity representing the business and one or more Offerings. Each Offering describes the intent to provide a Business Function for a certain Product or Service to a specified target audience.
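In outline, the generated semanticweb.rdf contains something of this shape. The values below are placeholders of mine, not the Annotator’s exact output:

```xml
<!-- Sketch of a GoodRelations business description (placeholder values) -->
<gr:BusinessEntity rdf:about="#BusinessEntity">
  <gr:legalName>Example Sole Trader</gr:legalName>
  <gr:offers>
    <gr:Offering rdf:about="#Offering_1">
      <!-- gr:Sell is one of the GoodRelations business functions -->
      <gr:hasBusinessFunction rdf:resource="http://purl.org/goodrelations/v1#Sell"/>
    </gr:Offering>
  </gr:offers>
</gr:BusinessEntity>
```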

The generated RDF/XML file can either be published directly on the company’s Web site or used as a skeleton for developing a more fine-grained description.

The link Publishing GoodRelations Data on the Web provides guidelines on publishing to the web.

In my case I created a description for my embryonic business 3kbo.

I’m interested in linking the generated semanticweb.rdf to other things, in particular linking the BusinessEntity with people and with other BusinessEntity instances.

Initially I added the URI of my FOAF file to the BusinessEntity instance using rdfs:seeAlso, but after reading the definition of BusinessEntity, i.e. that it represents the legal agent making a particular offering and can be a legal body or a person, I changed it to owl:sameAs.


<gr:BusinessEntity rdf:ID="BusinessEntity">



This makes sense for my simple case, since as a sole trader I am the BusinessEntity. When viewed in Firefox using the Tabulator Extension, owl:sameAs also provides an inferred link from my FOAF file to my semanticweb.rdf as shown below.


A part of the business description I don’t yet understand is how best to use the eClassOWL ontology to describe the Product or Service.

For example using the GoodRelations Annotator I selected “19 information, communication and media technology” as the Category and “1904 Software” as the Group.


This leads to being used in the definition of the product or service, i.e.

<gr:ProductOrServicesSomeInstancesPlaceholder rdf:ID="ProductOrServicesSomeInstancesPlaceholder_1">
	<rdf:type rdf:resource="&eco;#C_AKJ317003-tax"/>
</gr:ProductOrServicesSomeInstancesPlaceholder>


Because of the size of the eClassOWL ontology it takes a while to dereference this link. It would be good to be able to provide a more user-friendly reference at this point that gives a description of the product or service.

Beyond this simple example I am interested in Semantic Web descriptions of other, more complex relationships: between a BusinessEntity (when not a person) and the people involved with the business (e.g. directors, CEO, etc.), and between BusinessEntity instances.

Potentially GoodRelations and eClassOWL could be used as part of an Enterprise Architecture describing the who, what, how, when, where and why of a business.

Publishing Inspection and Test Plans as Linked Data

Saturday, February 14th, 2009

Stored in relational database tables within CDMS is a subset of information that could usefully be shared with the general building, civil engineering and construction industry to help promote a higher, uniform standard of quality.

This subset is described in more detail in the article Constructing an Ontology – Common Inspection and Test Plans.

The central concept is an Inspection and Test Plan (ITP) that identifies the points in a construction project when work of a specific type will be inspected and verified that it meets acceptance criteria.

An elegant way to share the Inspection and Test Plans is to publish them on the Semantic Web as an OWL ontology. The preferred way to do this is to publish the Inspection and Test Plans as Linked Data, following the patterns outlined in the tutorial on How to publish Linked Data on the Web.

Once published as Linked Data, the Inspection and Test Plans can be used by other building applications.

The process for doing this is to create an owl:ObjectProperty such as the appliesInspectionTestPlan property below and add it to the OWL definition of a building project.

building:appliesInspectionTestPlan
	a owl:ObjectProperty ;
	rdfs:domain building:BuildingProject ;
	rdfs:label "applies inspection test plan"^^xsd:string .

The example building project contains appliesInspectionTestPlan and applies it to the Breaker Bay building project as shown in the image below.

Inspection Test Plan Applied
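The relationship depicted above can be written in Turtle along the following lines; the instance names and namespace here are hypothetical:

```turtle
@prefix building: <http://example.org/building#> .   # hypothetical namespace

building:BreakerBay
    a building:BuildingProject ;
    building:appliesInspectionTestPlan building:ConcretePlacementITP .
```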

In the example the Inspection and Test Plan referenced is contained in a data extract published at

The D2R Server provides an easy way to publish the actual relational database tables containing the Common Inspection and Test Plans to the Semantic Web using the patterns outlined in the tutorial on How to publish Linked Data on the Web.

The Common Inspection and Test Plans are held in a MySQL database which was created by running the command:

mysql> create database common_itps character set utf8 ;

The D2R Server is installed by following the Quick Start instructions and adding the MySQL driver JAR file to the D2R Server installation /lib directory.

The mapping file was created by running the command

./generate-mapping -o mapping_common_itps.n3 -u db-user -p db-password jdbc:mysql://localhost/common_itps

and customized by removing some unnecessary database columns from the mapping file.

The following was also added to the mapping file to explicitly define the server host and port.

@prefix d2r: <> .

<> a d2r:Server;
rdfs:label "D2R Server";
d2r:baseURI <>;
d2r:port 2020;
d2r:documentMetadata [
rdfs:comment "The Common Inspection and Test Plans are currently published as an experimental version. The data needs to be rationalized and the model confirmed before progressing to a draft status.";
] .

Once the mapping file has been generated and customized, the server is started with the command:

nohup ./d2r-server mapping_common_itps.n3 &

An experimental version of the “Common Inspection and Test Plans” can now be browsed online, ideally using Firefox with the Tabulator extension installed.

Individual Inspection Test Plans can be opened directly in the browser using URLs such as

Via content negotiation this request is redirected to the browser-friendly page

The actual RDF can be viewed directly at

SPARQL queries can be run using the AJAX-based SPARQL explorer
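As an illustration, a query of the following shape would list the plans by label; the itp: namespace and class name are hypothetical, since the actual vocabulary URIs are not shown here:

```sparql
# List Inspection and Test Plans (hypothetical vocabulary)
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX itp:  <http://example.org/itp#>

SELECT ?plan ?label
WHERE {
  ?plan a itp:InspectionTestPlan ;
        rdfs:label ?label .
}
ORDER BY ?label
```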

Common Ontologies for Semantic Web Domain Modeling

Saturday, October 11th, 2008

Below is a list of the ontologies I am currently using for semantic web domain modeling.

Core Ontologies

  • RDF Vocabulary The RDF Schema for the RDF vocabulary defined in the RDF namespace xmlns:rdf=""
  • RDFS The RDF Schema vocabulary xmlns:rdfs=""
  • XML Schema xmlns:xsd=""
  • OWL Web Ontology Language xmlns:owl=""
  • Dublin Core xmlns:dc=""
  • Dublin Core Metadata Terms xmlns:dct=""

Common Ontologies

Additional Ontologies