Using the Neon Toolkit and ANNIE to demonstrate extracting RDF from Natural Language

The Neon Toolkit is an open-source ontology engineering environment that provides an extensive set of plugins for various ontology engineering activities.

One such plugin is the GATE web services plugin, which adds natural language entity recognition functionality from the GATE (General Architecture for Text Engineering) framework.

The GATE web services plugin can be quickly added to the Neon Toolkit by

  • opening the Help | Install New Software … menu option
  • selecting “NeOn Toolkit Update Site v2.4 – http://neon-toolkit.org/plugins/2.4” from the Work With drop down combo box.
  • and selecting GATE Web Services as shown below.

The GATE web services plugin includes ANNIE (Ontology Generation Based on Named Entity Recognition), which can be used to demonstrate basic Named Entity Recognition and ontology generation. The main GATE site provides more details on how ANNIE (A Nearly-New Information Extraction System) works.

After the GATE web services plugin has been installed, GATE Services appears as an additional top-level menu option. Selecting GATE Services | Call Multi-document Web Service opens the Call GATE web service dialog box below, which provides the option to select ANNIE as the service to call.

Selecting ANNIE and then Next opens a further dialog box in which the Input directory: (containing the documents to be processed) and the Output ontology: can be specified.

Once the Input directory: and the Output ontology: have been specified and the Finish button selected, ANNIE reads the input and generates a basic ontology according to the concepts, instances and relations found in the text.

When the text below is provided as input, ANNIE generates the following RDF output.

Input Text

Nick lives in Toronto and studies at Concordia University. Toronto is six hours from Montreal. Toronto is a nice place to live.
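
Since the dialog box asks for an input directory rather than a single file, the sample text first has to be saved as a document inside a folder. A minimal sketch of that preparation step in Python is shown here; the folder name annie_input and the file name sample.txt are purely illustrative assumptions, not names required by the plugin.

```python
import tempfile
from pathlib import Path

# The sample input text used in this article.
SAMPLE_TEXT = (
    "Nick lives in Toronto and studies at Concordia University. "
    "Toronto is six hours from Montreal. "
    "Toronto is a nice place to live."
)


def prepare_input_dir(base_dir, name="annie_input"):
    """Create a folder holding one text document and return its path.

    The folder name and the file name "sample.txt" are hypothetical;
    any directory containing the documents to process should do.
    """
    input_dir = Path(base_dir) / name
    input_dir.mkdir(parents=True, exist_ok=True)
    (input_dir / "sample.txt").write_text(SAMPLE_TEXT, encoding="utf-8")
    return input_dir


# Demo: build the folder under a temporary directory and list its contents.
with tempfile.TemporaryDirectory() as tmp:
    created = prepare_input_dir(tmp)
    print([p.name for p in created.iterdir()])
```

The resulting folder is what would be entered in the Input directory: field.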

RDF Output

<?xml version="1.0" encoding="UTF-8"?>
<rdf:RDF
	xmlns:protons="http://proton.semanticweb.org/2005/04/protons#"
	xmlns:protonu="http://proton.semanticweb.org/2005/04/protonu#"
	xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
	xmlns:owl="http://www.w3.org/2002/07/owl#"
	xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
	xmlns:protonkm="http://proton.semanticweb.org/2005/04/protonkm#"
	xmlns:protont="http://proton.semanticweb.org/2005/04/protont#">
<!-- All statement -->

<rdf:Description rdf:about="http://gate.ac.uk/owlim#Nick">
	<rdf:type rdf:resource="http://gate.ac.uk/owlim#Person"/>
	<rdfs:label xml:lang="en">nick</rdfs:label>
	<rdfs:label xml:lang="en">Nick</rdfs:label>
	<rdfs:label>Nick</rdfs:label>
</rdf:Description>

<rdf:Description rdf:about="http://gate.ac.uk/owlim#Montreal">
	<rdf:type rdf:resource="http://gate.ac.uk/owlim#Location"/>
	<rdfs:label xml:lang="en">montreal</rdfs:label>
	<rdfs:label xml:lang="en">Montreal</rdfs:label>
	<rdfs:label>Montreal</rdfs:label>
</rdf:Description>

<rdf:Description rdf:about="http://gate.ac.uk/owlim#Location">
	<rdf:type rdf:resource="http://www.w3.org/2002/07/owl#Class"/>
	<rdfs:label xml:lang="en">Location</rdfs:label>
	<rdfs:label>Location</rdfs:label>
</rdf:Description>

<rdf:Description rdf:about="http://gate.ac.uk/owlim#Concordia_University">
	<rdf:type rdf:resource="http://gate.ac.uk/owlim#Organization"/>
	<rdfs:label xml:lang="en">concordia university</rdfs:label>
	<rdfs:label xml:lang="en">Concordia University</rdfs:label>
	<rdfs:label xml:lang="en">Concordia_University</rdfs:label>
	<rdfs:label>Concordia_University</rdfs:label>
</rdf:Description>

<rdf:Description rdf:about="http://gate.ac.uk/owlim#Organization">
	<rdf:type rdf:resource="http://www.w3.org/2002/07/owl#Class"/>
	<rdfs:label xml:lang="en">Organization</rdfs:label>
	<rdfs:label>Organization</rdfs:label>
</rdf:Description>

<rdf:Description rdf:about="http://gate.ac.uk/owlim#Person">
	<rdf:type rdf:resource="http://www.w3.org/2002/07/owl#Class"/>
	<rdfs:label xml:lang="en">Person</rdfs:label>
	<rdfs:label>Person</rdfs:label>
</rdf:Description>

<rdf:Description rdf:about="http://gate.ac.uk/owlim#Toronto">
	<rdf:type rdf:resource="http://gate.ac.uk/owlim#Location"/>
	<rdfs:label xml:lang="en">toronto</rdfs:label>
	<rdfs:label xml:lang="en">Toronto</rdfs:label>
	<rdfs:label>Toronto</rdfs:label>
</rdf:Description>

</rdf:RDF>

The entities recognized are:

  • Nick as a Person
  • Montreal and Toronto as Locations
  • Concordia University as an Organization.
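
The same list can be recovered from the RDF output programmatically. The sketch below is only an illustration using Python's standard xml.etree.ElementTree module, not anything provided by the Neon Toolkit or GATE. It works on a trimmed copy of the output above (one label per entity, PROTON namespaces omitted), and the function name entities_by_type is made up for this example.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
OWL_CLASS = "http://www.w3.org/2002/07/owl#Class"

# Trimmed copy of the RDF output above: four individuals and one class.
SAMPLE_RDF = """<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#">
<rdf:Description rdf:about="http://gate.ac.uk/owlim#Nick">
    <rdf:type rdf:resource="http://gate.ac.uk/owlim#Person"/>
    <rdfs:label xml:lang="en">Nick</rdfs:label>
</rdf:Description>
<rdf:Description rdf:about="http://gate.ac.uk/owlim#Montreal">
    <rdf:type rdf:resource="http://gate.ac.uk/owlim#Location"/>
    <rdfs:label xml:lang="en">Montreal</rdfs:label>
</rdf:Description>
<rdf:Description rdf:about="http://gate.ac.uk/owlim#Location">
    <rdf:type rdf:resource="http://www.w3.org/2002/07/owl#Class"/>
    <rdfs:label xml:lang="en">Location</rdfs:label>
</rdf:Description>
<rdf:Description rdf:about="http://gate.ac.uk/owlim#Concordia_University">
    <rdf:type rdf:resource="http://gate.ac.uk/owlim#Organization"/>
    <rdfs:label xml:lang="en">Concordia University</rdfs:label>
</rdf:Description>
<rdf:Description rdf:about="http://gate.ac.uk/owlim#Toronto">
    <rdf:type rdf:resource="http://gate.ac.uk/owlim#Location"/>
    <rdfs:label xml:lang="en">Toronto</rdfs:label>
</rdf:Description>
</rdf:RDF>"""


def entities_by_type(rdf_xml):
    """Group individuals by the local name of their rdf:type.

    Descriptions typed as owl:Class are the generated classes
    themselves, so they are skipped.
    """
    root = ET.fromstring(rdf_xml)
    grouped = defaultdict(list)
    for desc in root.findall(f"{{{RDF_NS}}}Description"):
        about = desc.get(f"{{{RDF_NS}}}about")
        for type_el in desc.findall(f"{{{RDF_NS}}}type"):
            type_uri = type_el.get(f"{{{RDF_NS}}}resource")
            if type_uri == OWL_CLASS:
                continue
            # Keep only the fragment after '#', e.g. "Person" or "Toronto".
            grouped[type_uri.rsplit("#", 1)[-1]].append(about.rsplit("#", 1)[-1])
    return {cls: sorted(names) for cls, names in grouped.items()}


print(entities_by_type(SAMPLE_RDF))
```

This only handles the flat rdf:Description form seen in this output; for anything more general, a dedicated RDF library would be a safer choice than plain XML parsing.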

While relatively simplistic, the overall example (the input text, the generated RDF output, and the quick setup process for the Neon Toolkit and the GATE web services plugin) helped to demonstrate the potential of Named Entity Recognition and ontology generation.

The input text actually comes from a demo of the OwlExporter, which provides similar functionality for GATE itself. Longer term, GATE is likely to be part of a Natural Language Processing solution for a government department, where the sensitivity of the private data would preclude the use of an external web service. Hopefully there will also be time later on to write up the results of using GATE and the OwlExporter with the same input text.

(For this article, Neon Toolkit version 2.4.2 was used.)

One Response to “Using the Neon Toolkit and ANNIE to demonstrate extracting RDF from Natural Language”

  1. Ninus says:

    All the tools that we develop at the Semantic Software Lab are open source. If you have any questions or suggestions regarding any of our tools, we would love to hear from you at “http://www.semanticsoftware.info/forums/tools-resources-forum/durm-corpus-wiki-tools”.

    P.S. The example also exports coreferences (entities that reappear in different parts of the text), in this case “Toronto”.