Archive for the ‘REST’ Category

Using Groovy to Upload RDF files to the Talis Platform

Saturday, March 13th, 2010

The Talis Platform provides free stores for developers to host RDF data online. Each store has its own SPARQL endpoint for querying the RDF data.

Options for uploading individual RDF files into a store include:

A nice-to-have option would be the ability to upload all the RDF files found in a directory directly into a store using a simple command like TalisStore.load.

Groovy, with its flexible scripting, is a good candidate for this type of work. Code like the following makes it easy to traverse directories and list the RDF files

  • in the current directory:
    new File(".").eachFileMatch(~/.*\.rdf/) { println it }
  • or in a specific directory:
    new File("/data/rdf").eachFileMatch(~/.*\.rdf/) { println it }
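For readers working outside Groovy, here is a minimal sketch of the same directory scan in plain Java (which Groovy code can call directly), assuming Java 11 or later. The class and method names (RdfFileLister, listRdfFiles) are hypothetical. Like eachFileMatch, it only looks at the immediate directory and does not descend into subdirectories.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical Java analogue of new File(dir).eachFileMatch(~/.*\.rdf/) { ... }
class RdfFileLister {
    // Returns the *.rdf entries directly inside dir (no recursion), sorted by name.
    static List<Path> listRdfFiles(Path dir) throws IOException {
        List<Path> result = new ArrayList<>();
        // The glob "*.rdf" must match the whole file name, like the Groovy regex.
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir, "*.rdf")) {
            for (Path p : stream) {
                result.add(p);
            }
        }
        Collections.sort(result);
        return result;
    }

    public static void main(String[] args) throws IOException {
        // Default to the current directory, as in the first Groovy one-liner.
        Path dir = Path.of(args.length > 0 ? args[0] : ".");
        for (Path p : listRdfFiles(dir)) {
            System.out.println(p);
        }
    }
}
```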
    

Once Groovy is installed, the above lines of code can be run directly in both the Groovy Shell (groovysh) and the Groovy Console (groovyConsole). For example, when run in the Groovy Shell (groovysh):

$ groovysh
Groovy Shell (1.6.4, JVM: 1.6.0_15)
Type 'help' or '\h' for help.
-------------------------------------------------------------------------------------
groovy:000> new File(".").eachFileMatch(~/.*\.rdf/) { println it }
./WO0002.rdf
./WO0003.rdf
./WO0004.rdf
./WO0005.rdf

The Groovy RESTClient simplifies REST operations like POSTing (uploading) files to a web site. It is an extension of HTTPBuilder which in turn is a wrapper of Apache’s HttpClient. The main addition required for the RESTClient to upload RDF/XML files to a Talis store is an “application/rdf+xml” encoder. This is easy to create following the example provided in the article Groovy RESTClient and Putting Zip Files.

The result is the encodeRDF method shown below.

import groovyx.net.http.RESTClient
import org.apache.http.entity.FileEntity

class TalisStoreLoader {
    // Placeholder credentials; replace with the values for your store
    static final String TALIS_USERNAME = "myusername"
    static final String TALIS_PASSWORD = "mypassword"

    def talis

    TalisStoreLoader() {
        talis = new RESTClient( "http://api.talis.com/" )
        talis.auth.basic TALIS_USERNAME, TALIS_PASSWORD
        // Register encodeRDF as the encoder for the application/rdf+xml content type
        talis.encoder.'application/rdf+xml' = this.&encodeRDF
    }

    // Wrap a File in an HttpEntity tagged with the RDF/XML content type
    def encodeRDF( Object data ) throws UnsupportedEncodingException {
        if ( data instanceof File ) {
            return new FileEntity( (File) data, "application/rdf+xml" )
        } else {
            throw new IllegalArgumentException(
                "Don't know how to encode ${data.class.name} as application/rdf+xml" )
        }
    }
}

The line talis.encoder.'application/rdf+xml' = this.&encodeRDF registers the encodeRDF method with the RESTClient instance.
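As a rough Java counterpart to encodeRDF, the sketch below turns a File into a request body for the JDK's java.net.http client (Java 11+) and rejects anything else, just as the Groovy encoder does. RdfEncoder is a hypothetical name. Unlike FileEntity, a BodyPublisher carries no content type, so the Content-Type header has to be set on the request itself.

```java
import java.io.File;
import java.io.FileNotFoundException;
import java.net.http.HttpRequest;

// Hypothetical Java analogue of the Groovy encodeRDF method.
class RdfEncoder {
    static final String RDF_XML = "application/rdf+xml";

    // Streams the file as the request body; anything that is not a File is rejected.
    static HttpRequest.BodyPublisher encodeRDF(Object data) throws FileNotFoundException {
        if (data instanceof File) {
            return HttpRequest.BodyPublishers.ofFile(((File) data).toPath());
        }
        throw new IllegalArgumentException(
                "Don't know how to encode " + data.getClass().getName() + " as " + RDF_XML);
    }
}
```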

With the RDF encoder in place, a file can be uploaded to a store's metabox as follows.

def res = talis.post( path: metaboxPath, body: file, requestContentType: "application/rdf+xml" )
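The post call above leaves metaboxPath undefined. A minimal sketch of building the equivalent request in Java (11+) follows. It assumes the metabox lives at http://api.talis.com/stores/<store>/meta, which is my recollection of the Talis Platform URL layout rather than something stated in this article, and the class name MetaboxUpload is hypothetical. Authentication is deliberately left out.

```java
import java.io.FileNotFoundException;
import java.net.URI;
import java.net.http.HttpRequest;
import java.nio.file.Path;

// Hypothetical sketch of the metabox upload request (no authentication).
class MetaboxUpload {
    static final URI API_BASE = URI.create("http://api.talis.com/");

    // ASSUMPTION: the metabox URL layout is /stores/<store>/meta on the Talis API host.
    static URI metaboxUri(String store) {
        return API_BASE.resolve("stores/" + store + "/meta");
    }

    // POSTs the RDF/XML file as the request body, mirroring
    // talis.post( path: metaboxPath, body: file, requestContentType: "application/rdf+xml" )
    static HttpRequest buildUpload(String store, Path rdfFile) throws FileNotFoundException {
        return HttpRequest.newBuilder(metaboxUri(store))
                .header("Content-Type", "application/rdf+xml")
                .POST(HttpRequest.BodyPublishers.ofFile(rdfFile))
                .build();
    }
}
```

Sending it would be something like HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.discarding()).statusCode(); the runs in Appendix A report status 204 for successful uploads.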

This functionality is encapsulated in the class com._3kbo.talis.TalisStoreLoader, which is part of a Maven project available for download as a zip file. It includes the script TalisStore.groovy, which is a simplified wrapper around com._3kbo.talis.TalisStoreLoader.

The jar file created by the project, talis-store-0.2.jar, can be downloaded separately.

The RESTClient is not bundled with the standard Groovy install. Trying to access it from the shell or console without explicitly installing it results in errors like the following:

groovy:000> import groovyx.net.http.RESTClient
ERROR org.codehaus.groovy.tools.shell.CommandException:
Invalid import definition: 'import groovyx.net.http.RESTClient';
reason: startup failed, script1266050039289.groovy:
1: unable to resolve class groovyx.net.http.RESTClient
 @ line 1, column 1. 1 error at java_lang_Runnable$run.call (Unknown Source)

Installing the RESTClient requires downloading HTTPBuilder (http-builder-xxx-all.zip) and adding it and its dependencies to the ${user.home}/.groovy/lib directory. Also add talis-store-0.2.jar to this directory. The ${user.home}/.groovy/lib directory may need to be created manually, but the Groovy install should have created a file named “$GROOVY_HOME/conf/groovy-starter.conf” containing the line

load ${user.home}/.groovy/lib/*

which enables loading of the additional jar files required by the RESTClient and by com._3kbo.talis.TalisStoreLoader, i.e.:

  • http-builder-0.5.0-RC2.jar
  • httpclient-4.0.jar
  • httpcore-4.0.1.jar
  • json-lib-2.3-jdk15.jar
  • xml-resolver-1.2.jar
  • commons-collections-3.2.1.jar
  • commons-logging-1.1.1.jar
  • talis-store-0.2.jar
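A small, hypothetical Java helper (GroovyLibCheck) can confirm that the jars above are actually in place before starting groovysh. It only checks for files with exactly these names, so it assumes the versions listed here; a different HTTPBuilder release would ship differently named jars.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical check that the jars listed in the article are present in a lib directory.
class GroovyLibCheck {
    // Names copied from the list above; other versions will have other names.
    static final List<String> REQUIRED = List.of(
            "http-builder-0.5.0-RC2.jar", "httpclient-4.0.jar", "httpcore-4.0.1.jar",
            "json-lib-2.3-jdk15.jar", "xml-resolver-1.2.jar", "commons-collections-3.2.1.jar",
            "commons-logging-1.1.1.jar", "talis-store-0.2.jar");

    // Returns the required jar names that are not regular files in libDir.
    static List<String> missingJars(Path libDir) {
        List<String> missing = new ArrayList<>();
        for (String jar : REQUIRED) {
            if (!Files.isRegularFile(libDir.resolve(jar))) {
                missing.add(jar);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // Same location that groovy-starter.conf loads: ${user.home}/.groovy/lib
        Path lib = Path.of(System.getProperty("user.home"), ".groovy", "lib");
        List<String> missing = missingJars(lib);
        System.out.println(missing.isEmpty() ? "All jars present in " + lib : "Missing: " + missing);
    }
}
```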

Using the Groovy Shell to Upload

With the RESTClient and the talis-store-0.2.jar installed the Groovy Shell (groovysh) makes it easy to run the TalisStore.groovy script and upload either individual RDF files or all the RDF files in a directory to a Talis store.

The four options for running the TalisStore.groovy script are:

  1. TalisStore.load "mystore","user","password","file_or_directory"
  2. TalisStore.load "mystore","user","password"
  3. TalisStore.load "file_or_directory"
  4. TalisStore.load()

The first and second options both explicitly set the store, user and password. The first option also nominates either a specific RDF file to upload or a directory to scan and upload all the RDF files found. The second option uploads all the RDF files found in the current directory, i.e. the directory in which the Groovy Shell (groovysh) was invoked.

The third and fourth options read the store, user and password from the configuration file TalisConfig.groovy, which must be updated for a specific store and available on the classpath (see Appendix B below).

With the configuration file TalisConfig.groovy in place, uploading a specific RDF file or a directory simplifies to TalisStore.load "file_or_directory".

Uploading the RDF files in the current directory is just TalisStore.load(), as shown in the example “Loading all RDF files from the current directory” in Appendix A below.
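The four calling conventions boil down to a small piece of argument handling. The sketch below shows one way to express it in Java; LoadArgs and resolve are hypothetical names, and the Map stands in for the values read from TalisConfig.groovy (keys store, user and password, as in the config file in Appendix B). It is not the actual TalisStore.groovy code.

```java
import java.util.Map;

// Hypothetical model of the four TalisStore.load calling conventions.
class LoadArgs {
    final String store;
    final String user;
    final String password;
    final String path; // file or directory to upload

    LoadArgs(String store, String user, String password, String path) {
        this.store = store;
        this.user = user;
        this.password = password;
        this.path = path;
    }

    // config holds the values read from TalisConfig.groovy (keys: store, user, password).
    static LoadArgs resolve(String[] args, Map<String, String> config) {
        switch (args.length) {
            case 4: // store, user, password, file_or_directory
                return new LoadArgs(args[0], args[1], args[2], args[3]);
            case 3: // store, user, password; upload the current directory
                return new LoadArgs(args[0], args[1], args[2], ".");
            case 1: // file_or_directory; credentials from the config file
                return fromConfig(config, args[0]);
            case 0: // everything from the config file, current directory
                return fromConfig(config, ".");
            default:
                throw new IllegalArgumentException(
                        "Expected 0, 1, 3 or 4 arguments but got " + args.length);
        }
    }

    private static LoadArgs fromConfig(Map<String, String> config, String path) {
        return new LoadArgs(required(config, "store"), required(config, "user"),
                required(config, "password"), path);
    }

    // Fails early with a clear message instead of passing nulls along.
    private static String required(Map<String, String> config, String key) {
        String value = config.get(key);
        if (value == null) {
            throw new IllegalStateException("TalisConfig.groovy has no value for " + key);
        }
        return value;
    }
}
```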

Using the Script to Upload

Adding the line #!/usr/bin/env groovy to the top of the TalisStore.groovy script and making the script executable allows it to be run independently of the Groovy Shell (groovysh). For example, ./TalisStore.groovy /sioc/forum/WO0902.rdf loads that specific RDF file, using the configuration file to set the store, user and password.

See the TalisStore.groovy javadoc for more details on running as an executable script.

Summary

There is a bit of configuration to set everything up, but once it is in place the combination of Groovy, the RESTClient and the TalisStore loader code described here makes it easy to load RDF files into the Talis Platform.

My preference is to run the Groovy Shell (groovysh) and use simple commands like TalisStore.load().

Possible extensions for the future include commands like TalisStore.sparql.select etc…

Appendix A: Examples

Loading a specific file

$ groovysh
Groovy Shell (1.7.1, JVM: 1.6.0_15)
Type 'help' or '\h' for help.
-------------------------------------------------------------------------------
groovy:000> TalisStore.load "mystore","user","password","/sioc/WO0401.rdf"
Using store: mystore user password
Loading a file or directory: /sioc/WO0401.rdf
Loading /sioc/WO0401.rdf
Loaded 1565688 bytes in 58518 milliseconds. (Status: 204)

Loading all RDF files from the current directory

$ cd /scoop/forum/
$ ls -l
-rw-r--r--  1  3847192  2 Jan 12:11 WO0903.rdf
-rw-r--r--  1  2485605  2 Jan 12:11 WO0904.rdf
-rw-r--r--  1  2321233  2 Jan 12:12 WO0905.rdf
-rw-r--r--  1  2551787  2 Jan 12:12 WO0906.rdf
$ groovysh
Groovy Shell (1.7.1, JVM: 1.6.0_17)
Type 'help' or '\h' for help.
--------------------------------------------
groovy:000> TalisStore.load()
Classpath:
...
Loading RDF files from directory /scoop/forum/.
2010-03-14 11:32:31.477: Loading /scoop/forum/./WO0903.rdf
2010-03-14 11:33:49.289: Loaded 3847192 bytes in 77808 milliseconds. (Status: 204)
2010-03-14 11:33:49.304: Loading /scoop/forum/./WO0904.rdf
2010-03-14 11:34:38.288: Loaded 2485605 bytes in 48984 milliseconds. (Status: 204)
2010-03-14 11:34:38.289: Loading /scoop/forum/./WO0905.rdf
2010-03-14 11:35:25.429: Loaded 2321233 bytes in 47140 milliseconds. (Status: 204)
2010-03-14 11:35:25.43: Loading /scoop/forum/./WO0906.rdf
2010-03-14 11:36:15.952: Loaded 2551787 bytes in 50523 milliseconds. (Status: 204)
Loaded 4 files in 224488 milliseconds.
===> 4
groovy:000>

Appendix B: Adding the Groovy Configuration File to the Classpath

The structure of the config file is:

// TalisConfig.groovy
talis {
    user = "myusername"
    password = "mypassword"
    store = "mystore"
}

Once the values have been updated for a specific store, the steps for adding the file to the classpath and verifying that it is being read correctly are as follows:

  • Create a directory to hold property files (e.g. ${user.home}/.groovy/conf/).
  • Add a matching line to “$GROOVY_HOME/conf/groovy-starter.conf” to add the directory to the classpath, e.g. load ${user.home}/.groovy/conf/./
  • Place the Groovy configuration file TalisConfig.groovy in the directory (i.e. ${user.home}/.groovy/conf/).

ConfigSlurper is used to read the configuration file. The shell input below shows how to:

  • Check what is on the classpath using loader.URLs.each{ println it }
  • Get the config file using url = loader.getResource("TalisConfig.groovy")
  • Read the config file using def config = new ConfigSlurper().parse(url)

groovy:000> import groovyx.net.http.RESTClient
===> [import groovyx.net.http.RESTClient]
groovy:000> talis = new RESTClient( "http://api.talis.com/" )
===> groovyx.net.http.RESTClient@1798928
groovy:000> loader = talis.class.classLoader.rootLoader
===> org.codehaus.groovy.tools.RootLoader@4d20a47e
groovy:000> loader.URLs.each{ println it }
file:/Users/richardhancock/./
file:/Users/richardhancock/groovy-1.6.4/lib/ant-1.7.1.jar
file:/Users/richardhancock/groovy-1.6.4/lib/ant-junit-1.7.1.jar
file:/Users/richardhancock/groovy-1.6.4/lib/ant-launcher-1.7.1.jar
file:/Users/richardhancock/groovy-1.6.4/lib/antlr-2.7.7.jar
file:/Users/richardhancock/groovy-1.6.4/lib/asm-2.2.3.jar
file:/Users/richardhancock/groovy-1.6.4/lib/asm-analysis-2.2.3.jar
file:/Users/richardhancock/groovy-1.6.4/lib/asm-tree-2.2.3.jar
file:/Users/richardhancock/groovy-1.6.4/lib/asm-util-2.2.3.jar
file:/Users/richardhancock/groovy-1.6.4/lib/bsf-2.4.0.jar
file:/Users/richardhancock/groovy-1.6.4/lib/commons-cli-1.2.jar
file:/Users/richardhancock/groovy-1.6.4/lib/commons-logging-1.1.jar
file:/Users/richardhancock/groovy-1.6.4/lib/groovy-1.6.4.jar
file:/Users/richardhancock/groovy-1.6.4/lib/ivy-2.1.0-rc2.jar
file:/Users/richardhancock/groovy-1.6.4/lib/jline-0.9.94.jar
file:/Users/richardhancock/groovy-1.6.4/lib/jsp-api-2.0.jar
file:/Users/richardhancock/groovy-1.6.4/lib/junit-3.8.2.jar
file:/Users/richardhancock/groovy-1.6.4/lib/servlet-api-2.4.jar
file:/Users/richardhancock/groovy-1.6.4/lib/xstream-1.3.1.jar
file:/Users/richardhancock/.groovy/lib/http-builder-0.5.0-RC2.jar
file:/Users/richardhancock/.groovy/lib/httpclient-4.0.jar
file:/Users/richardhancock/.groovy/lib/httpcore-4.0.1.jar
file:/Users/richardhancock/.groovy/lib/json-lib-2.3-jdk15.jar
file:/Users/richardhancock/.groovy/lib/xml-resolver-1.2.jar
file:/Users/richardhancock/.groovy/conf/./
===> [Ljava.net.URL;@3ebc312f
groovy:000> url = loader.getResource("TalisConfig.groovy")
===> file:/Users/richardhancock/.groovy/conf/TalisConfig.groovy
groovy:000> def config = new ConfigSlurper().parse(url)
===> {talis={username=myusername, password=mypassword, store=mystore}}
groovy:000>

Appendix C: Using Maven to run the Groovy Script

The TalisStore script can also be run via Maven. This approach uses the jar file dependencies defined in the Maven project and does not require the standard Groovy install. If a valid “TalisConfig.groovy” configuration file is available on the classpath, the parameters for “store”, “username” and “password” are not required. By default the pom.xml file excludes the dummy configuration file, but once it has been updated with real values it can be included by changing the exclude(s) to include(s). The TalisStore script can be run with command lines such as the following, which invoke the TalisStore main method (optionally with parameters).

mvn exec:java -Dexec.mainClass=TalisStore

mvn exec:java -Dexec.mainClass=TalisStore -Dexec.args="/sioc/forum/2007"

Appendix D: Authentication

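The call talis.auth.basic TALIS_USERNAME, TALIS_PASSWORD is a bit of an anomaly, since the Talis Platform uses HTTP Digest Authentication. However, the “basic” method of groovyx.net.http.AuthConfig, which RESTClient uses, works for “digest” authentication as well: it registers the user name and password with the underlying Apache HttpClient, which then answers whichever challenge the server sends.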
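The JDK's own HttpURLConnection separates credentials from the authentication scheme in a similar way: it asks a java.net.Authenticator for credentials when it receives a 401 challenge and, as far as I know, handles both the Basic and Digest schemes itself. A minimal sketch follows, assuming credentials should only ever be given to one host; TalisAuthenticator is a hypothetical name.

```java
import java.net.Authenticator;
import java.net.PasswordAuthentication;

// Hypothetical Authenticator that only answers challenges from one host.
// HttpURLConnection calls getPasswordAuthentication() when a server replies 401;
// the scheme (Basic, Digest, ...) is handled by the JDK, not here.
class TalisAuthenticator extends Authenticator {
    private final String host;
    private final String user;
    private final char[] password;

    TalisAuthenticator(String host, String user, String password) {
        this.host = host;
        this.user = user;
        this.password = password.toCharArray();
    }

    @Override
    protected PasswordAuthentication getPasswordAuthentication() {
        // Do not hand the credentials to any other server.
        if (!host.equalsIgnoreCase(getRequestingHost())) {
            return null;
        }
        return new PasswordAuthentication(user, password.clone());
    }
}
```

It would be installed once with Authenticator.setDefault(new TalisAuthenticator("api.talis.com", "myusername", "mypassword")), after which HttpURLConnection requests to that host can use it.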

DBpedia Examples using Linked Data and Sparql

Monday, August 11th, 2008

Using Wikipedia, the largest online encyclopedia, users can browse and perform full-text searches, but programmatic access to the knowledge-base is limited.

The DBpedia project extracts structured information from Wikipedia, opening it up to programmatic access using Semantic Web technologies such as Linked Data and SPARQL. This means that the linking and reasoning abilities of RDF and OWL can be used, and queries for specific information can be made using SPARQL.

Simplistically the mapping from the Wikipedia HTML based web pages to the DBpedia RDF based resources can be thought of as replacing “http://en.wikipedia.org/wiki/” with “http://dbpedia.org/resource/” but in reality there are some additional subtleties which are described in the article From Wikipedia URI-s to DBpedia URI.
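The naive prefix swap is easy to write down, and a short Java sketch also shows where it falls short: it keeps the title exactly as written, so Civil_Engineering does not become the canonical Civil_engineering used in the example below. DbpediaUris is a hypothetical name, and only English Wikipedia URLs are handled.

```java
// Hypothetical helper for the simplistic Wikipedia -> DBpedia mapping.
class DbpediaUris {
    static final String WIKI_PREFIX = "http://en.wikipedia.org/wiki/";
    static final String DBPEDIA_PREFIX = "http://dbpedia.org/resource/";

    // Swaps the prefix and keeps the article title unchanged. It does NOT follow
    // Wikipedia redirects or case normalisation (Civil_Engineering vs Civil_engineering).
    static String toDbpediaResource(String wikipediaUrl) {
        if (!wikipediaUrl.startsWith(WIKI_PREFIX)) {
            throw new IllegalArgumentException("Not an English Wikipedia article URL: " + wikipediaUrl);
        }
        return DBPEDIA_PREFIX + wikipediaUrl.substring(WIKI_PREFIX.length());
    }
}
```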

The Wikipedia entry for “Civil Engineering” (http://en.wikipedia.org/wiki/Civil_Engineering) is used as an example to show how specific data can be retrieved from its DBpedia equivalent (http://dbpedia.org/resource/Civil_engineering).

When both the Wikipedia entry (http://en.wikipedia.org/wiki/Civil_Engineering) and its DBpedia equivalent (http://dbpedia.org/resource/Civil_engineering) are opened in a standard web browser they display similar information; however, the request for the DBpedia equivalent has been redirected to http://dbpedia.org/page/Civil_engineering.

This redirect can be viewed in Firefox using the Tamper Data Firefox Extension as shown in the image below.

Loading the DBpedia Resource

The initial status of 303 is the HTTP response code “303 See Other”. The server replied with the HTTP response code 303 in order to direct the browser to the URI http://dbpedia.org/page/Civil_engineering, which is an HTML page the browser can display. The original URI http://dbpedia.org/resource/Civil_engineering is an RDF resource that would not display as well in the HTML browser.

DBpedia implements an HTTP mechanism called content negotiation in order to provide clients such as web browsers with the information they request in a form they can display. The tutorial How to publish Linked Data on the Web describes this and other Linked Data techniques that are used by applications such as DBpedia.

In order to access the RDF resource directly a web client needs to tell the server to send it RDF data. A client can do this by sending the HTTP Request Header Accept: application/rdf+xml as part of its initial request. (The HTML browser had sent an Accept: text/html HTTP header indicating that it was requesting an HTML page.)
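A sketch of the same negotiation from Java (11+) follows: the request carries Accept: application/rdf+xml, and the client is built with redirects disabled so that a 303 See Other would be visible rather than silently followed. The class and method names are hypothetical, and what the server returns today may differ from what is described here.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;

// Hypothetical helpers for requesting RDF/XML from a Linked Data URI.
class DbpediaRdfRequest {
    // Ask for RDF/XML rather than HTML; the server uses this header for content negotiation.
    static HttpRequest rdfRequest(String resourceUri) {
        return HttpRequest.newBuilder(URI.create(resourceUri))
                .header("Accept", "application/rdf+xml")
                .GET()
                .build();
    }

    // Redirects are not followed, so a 303 status and its Location header stay visible.
    static HttpClient noRedirectClient() {
        return HttpClient.newBuilder()
                .followRedirects(HttpClient.Redirect.NEVER)
                .build();
    }
}
```

Sending it with noRedirectClient().send(rdfRequest("http://dbpedia.org/resource/Civil_engineering"), HttpResponse.BodyHandlers.ofString()) exposes statusCode() and, for a redirect, headers().firstValue("Location").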

The Firefox Addon RESTTest can be used to set Accept: application/rdf+xml in the HTTP Request Header and directly request http://dbpedia.org/resource/Civil_engineering as shown in the image below.

In this case the request to http://dbpedia.org/resource/Civil_engineering succeeded, as shown by “Response Status 200”, and an RDF document was received, as shown in the “Response Text”.

In both the RDF fragment shown in the image above and in the HTML page http://dbpedia.org/page/Civil_engineering the multiple language support is visible. The SPARQL queries below show how to extract specific information for a particular language.

SPARQL

DBpedia provides a public SPARQL endpoint at http://dbpedia.org/sparql which enables users to query the RDF datasource with SPARQL queries such as the following.

SELECT ?abstract
WHERE {
{ <http://dbpedia.org/resource/Civil_engineering> <http://dbpedia.org/ontology/abstract> ?abstract }
}

The query returns all the abstracts for Civil Engineering, in each of the available languages.
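Queries like this can be sent to the endpoint with a plain HTTP GET, passing the URL-encoded query text in the query parameter as the SPARQL Protocol describes. A minimal sketch in Java (11+) follows; SparqlUrl is a hypothetical name, and spaces are encoded as %20 rather than URLEncoder's form-style +.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: builds a SPARQL Protocol GET URL for an endpoint.
class SparqlUrl {
    static URI queryUri(String endpoint, String sparql) {
        // URLEncoder produces form encoding ("+" for space); a literal "+" in the
        // query has already become %2B, so replacing "+" only affects spaces.
        String encoded = URLEncoder.encode(sparql, StandardCharsets.UTF_8).replace("+", "%20");
        return URI.create(endpoint + "?query=" + encoded);
    }
}
```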

The next query refines the abstracts returned to just the language specified, in this case ‘en’ (English).

SELECT ?abstract
WHERE {
{ <http://dbpedia.org/resource/Civil_engineering> <http://dbpedia.org/ontology/abstract> ?abstract .
FILTER langMatches( lang(?abstract), 'en') }
}
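Because only the resource URI and the language tag change between such queries, a small builder can generate them. The sketch below is hypothetical (AbstractQuery is not part of any library) and validates both inputs so that a stray > or quote cannot alter the structure of the query.

```java
// Hypothetical builder for the language-filtered abstract query.
class AbstractQuery {
    // Characters not allowed inside <...> in SPARQL, plus whitespace.
    private static final String SAFE_IRI = "[^<>\"{}|^`\\\\\\s]+";
    // Simplified language tag, e.g. en, de, pt-BR.
    private static final String LANG_TAG = "[A-Za-z]{1,8}(-[A-Za-z0-9]{1,8})*";

    static String abstractQuery(String resourceUri, String lang) {
        if (!resourceUri.matches(SAFE_IRI)) {
            throw new IllegalArgumentException("Unsafe resource URI: " + resourceUri);
        }
        if (!lang.matches(LANG_TAG)) {
            throw new IllegalArgumentException("Unexpected language tag: " + lang);
        }
        return "SELECT ?abstract\n"
                + "WHERE {\n"
                + "  <" + resourceUri + "> <http://dbpedia.org/ontology/abstract> ?abstract .\n"
                + "  FILTER langMatches( lang(?abstract), \"" + lang + "\" )\n"
                + "}\n";
    }
}
```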

The SNORQL query explorer, shown in the image below, provides a simpler interface to the DBpedia SPARQL endpoint; the image shows both the query and the result returned.

Other SPARQL endpoints such as http://demo.openlinksw.com/sparql/ (shown below) can query DBpedia by specifying the FROM NAMED clause to describe the RDF dataset. E.g.

SELECT ?abstract
FROM NAMED <http://dbpedia.org>
WHERE {
{ <http://dbpedia.org/resource/Civil_engineering> <http://dbpedia.org/ontology/abstract> ?abstract.
FILTER langMatches( lang(?abstract), 'en') }
}

Other Related DBpedia Articles

RDF as self-describing Data uses DBpedia and its SPARQL support to show how RDF is essentially ‘self-describing’: there is no need to know about traditional metadata (schemas) before exploring a data set.

Linking to DBpedia with TopBraid outlines the benefit of DBpedia in terms of providing relatively stable URIs for all relevant real-world concepts, thus making it a natural place to connect specific domain models with each other using the OWL built-in property owl:sameAs (this property indicates that two URI references actually refer to the same thing). TopBraid Composer provides support to link domain models with DBpedia.

Querying DBpedia provides examples of using SPARQL to query DBpedia.

Adding Semantic Markup to Your Rails Application with DBpedia and ActiveRDF and Get Semantic with DBPedia and ActiveRDF describe using ActiveRDF to integrate DBpedia resources into web-based applications. ActiveRDF is a library for accessing RDF data from Ruby and Ruby on Rails programs and can perform SPARQL queries.

Linking to New Zealand Legislation

Saturday, January 12th, 2008

The web page Public Access to Legislation – Creating links to the New Zealand Legislation website gives information on how to link to New Zealand Legislation.

The legislative documents are identified by:

  • the information type (Act, Regulation, Bill, SOP)
  • the legislation type or category (public, local, members, government, imperial etc)
  • the year
  • the number, padded with initial zeros to 4 digits. For Bills, the number will also include the Bar number and split letter (if applicable).
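A sketch of composing an identifier from the components above, assuming a simple /type/category/year/number path layout. That layout is illustrative only, since the exact link format is not reproduced here, and Bill numbers with Bar numbers and split letters are not handled. LegislationId is a hypothetical name.

```java
import java.util.Locale;

// Hypothetical composition of a legislation identifier from its components.
class LegislationId {
    // Pads the document number with leading zeros to 4 digits, e.g. 72 -> "0072".
    static String padNumber(int number) {
        if (number < 1 || number > 9999) {
            throw new IllegalArgumentException("Number out of range: " + number);
        }
        return String.format(Locale.ROOT, "%04d", number);
    }

    // Illustrative layout only: /<information type>/<legislation type>/<year>/<number>
    static String path(String infoType, String legislationType, int year, int number) {
        return "/" + infoType.toLowerCase(Locale.ROOT)
                + "/" + legislationType.toLowerCase(Locale.ROOT)
                + "/" + year
                + "/" + padNumber(number);
    }
}
```

For the Building Act 2004 (Public Act 2004 No 72), path("Act", "public", 2004, 72) gives /act/public/2004/0072.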

And a legislative document can currently be linked to in the following ways:

In the same way that I want to link to photo sharing sites from within my web application there will be occasions when I want to link to legislation, standards and regulation documents.

For example, in the context of a web-based building project it could be useful to link to the Building Act 2004 Table of Contents, which gives an overview of the individual sections of the Building Act.

This is useful as a general reference but there will be occasions where I want to show a provision in a specific context relevant to the project. For example a building project needs to be issued with a building consent which can lapse after a period of time.

When showing the status of a project that has not yet started building, it would be useful to indicate whether its building consent is about to expire and, if it is, link to the relevant provision to clarify the situation.

Currently there are two simple ways of linking to the specific provision, open it in the same page or open it in a new page.

Both of these approaches are a bit rough for today's Ajax-based web applications, which would ideally take a smoother approach, i.e. take just the relevant content and slide it into the page at the required location. In this case that means inserting just the following:

“A building consent lapses and is of no effect if the building work to which it relates does not commence within—
(a) 12 months after the date of issue of the building consent; or
(b) any further period that the building consent authority may allow.”

This Ajax insertion can be achieved by first using a customized HTML reader which extracts the relevant content from the original provisions page.

The simpler display rendered by the customized HTML reader would also be more appropriate for a mobile phone based web application.

Note that a new site for accessing New Zealand legislation is due to become available in January 2008 as part of the PAL Project.

The PAL Project stores the legislation documents as XML fragments that are combined for publication as HTML and PDF. It is likely that the documents will also be available as XML.

If the XML document is available then it should be simpler to access the content of the specific provisions when using the customized HTML reader discussed above.
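If provisions were published as XML, pulling out a single provision could be a short XPath query. The sketch below assumes a made-up structure (prov elements with an id attribute and a para child); the real PAL schema will use different names. ProvisionExtractor is a hypothetical name. It rejects ids containing anything other than letters, digits, - and _ to avoid XPath injection, and refuses DOCTYPE declarations when parsing.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical extractor; element and attribute names are made up for illustration.
class ProvisionExtractor {
    static String provisionText(String xml, String provisionId) throws Exception {
        // Only allow simple ids so the value cannot break out of the XPath string literal.
        if (!provisionId.matches("[A-Za-z0-9_-]+")) {
            throw new IllegalArgumentException("Unexpected provision id: " + provisionId);
        }
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        // Refuse DOCTYPEs, which blocks external entity tricks in untrusted input.
        factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
        Document doc = factory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
        // normalize-space takes the first matching para, collapses line breaks and
        // indentation into single spaces, and yields "" when nothing matches.
        String expr = "normalize-space(//prov[@id='" + provisionId + "']/para)";
        return XPathFactory.newInstance().newXPath().evaluate(expr, doc);
    }
}
```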

A further simplification would be to provide a REST based web service for accessing the provisions. This would allow the content of the provision “Lapse of Building Consent” to be accessed via a URI similar to the following http://www.legislation.govt.nz/act/public/2004/se/072se52.xml.