Using Groovy to Upload RDF files to the Talis Platform

The Talis Platform provides free stores for developers to host RDF data online. Each store has its own SPARQL endpoint for querying the RDF data.

Options for uploading individual RDF files into a store include:

A nice-to-have option would be the ability to upload all the RDF files found in a directory directly into a store with a simple command such as TalisStore.load.

Groovy, with its flexible scripting, is a good candidate for this type of work. Code like the following makes it easy to scan a directory and list the RDF files

  • in the current directory:
    new File(".").eachFileMatch(~/.*\.rdf/) { println it }
  • or in a specific directory:
    new File("/data/rdf").eachFileMatch(~/.*\.rdf/) { println it }

Once Groovy is installed, the lines above can be run directly in either the Groovy Shell (groovysh) or the Groovy Console (groovyConsole). For example, in the Groovy Shell:

$ groovysh
Groovy Shell (1.6.4, JVM: 1.6.0_15)
Type 'help' or '\h' for help.
-------------------------------------------------------------------------------------
groovy:000> new File(".").eachFileMatch(~/.*\.rdf/) { println it }
./WO0002.rdf
./WO0003.rdf
./WO0004.rdf
./WO0005.rdf
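eachFileMatch only looks at the immediate entries of one directory, and the order it returns them in depends on the file system. When a predictable upload order or a scan of subdirectories is wanted, the matches can be collected and sorted first. A minimal sketch in plain Groovy (no extra jars); the helper name rdfFiles is hypothetical and not part of the original project:

```groovy
// Hypothetical helper (not part of the original project): return the .rdf
// files in a directory, sorted by path so the upload order is predictable.
// recursive = false behaves like eachFileMatch: immediate entries only.
List<File> rdfFiles(File dir, boolean recursive = false) {
    def found = []
    // Keep regular files whose whole name matches *.rdf
    def keep = { File f -> if (f.isFile() && f.name ==~ /.*\.rdf/) found << f }
    if (recursive) {
        dir.eachFileRecurse(keep)   // visits files in all subdirectories too
    } else {
        dir.eachFile(keep)          // visits only the directory's own entries
    }
    found.sort { it.path }
}

rdfFiles(new File(".")).each { println it }
```

Passing true as the second argument, e.g. rdfFiles(new File("/data/rdf"), true), would also pick up RDF files in nested directories.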

The Groovy RESTClient simplifies REST operations such as POSTing (uploading) files to a website. It extends HTTPBuilder, which in turn wraps Apache’s HttpClient. The main addition the RESTClient needs in order to upload RDF/XML files to a Talis store is an “application/rdf+xml” encoder, which is easy to create by following the example in the article Groovy RESTClient and Putting Zip Files.

The result is the encodeRDF method shown below.

import groovyx.net.http.RESTClient
import org.apache.http.entity.FileEntity

class TalisStoreLoader {
    def talis

    TalisStoreLoader(String username, String password) {
        talis = new RESTClient("http://api.talis.com/")
        // Store credentials (see Appendix D)
        talis.auth.basic username, password
        // Use encodeRDF for any request whose content type is application/rdf+xml
        talis.encoder.'application/rdf+xml' = this.&encodeRDF
    }

    // Wrap an RDF/XML file in an HttpClient entity for the request body
    def encodeRDF(Object data) {
        if (data instanceof File) {
            def entity = new FileEntity((File) data, "application/rdf+xml")
            entity.setContentType("application/rdf+xml")
            return entity
        }
        throw new IllegalArgumentException(
            "Don't know how to encode ${data.class.name} as application/rdf+xml")
    }
}

The line talis.encoder.'application/rdf+xml' = this.&encodeRDF registers encodeRDF with the RESTClient instance as the encoder for the application/rdf+xml content type.

With the RDF encoder in place, a file can be uploaded to a store’s metabox as follows:

def res = talis.post( path: metaboxPath, body: file, requestContentType: "application/rdf+xml" )
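The same call can be wrapped in a loop over a list of files, timing each upload to produce log lines similar to those in Appendix A. A minimal sketch, not the code in com._3kbo.talis.TalisStoreLoader: the HTTP call is passed in as a closure (the names loadAll and metaboxPath here are hypothetical) so that the loop can be exercised without a network connection, and the path /stores/<store>/meta is an assumption about the Talis Platform URL layout.

```groovy
// Hypothetical: metabox path for a store (assumed Talis Platform URL layout)
String metaboxPath(String store) { "/stores/${store}/meta" }

// Upload each file with the supplied closure, which must return the HTTP
// status code. Prints one line per file and returns the number of files
// that received a 2xx response.
int loadAll(List<File> files, Closure upload) {
    long start = System.currentTimeMillis()
    int loaded = 0
    files.each { File f ->
        long t0 = System.currentTimeMillis()
        println "Loading ${f.path}"
        int status = upload(f) as int
        long elapsed = System.currentTimeMillis() - t0
        boolean ok = status >= 200 && status < 300
        println "${ok ? 'Loaded' : 'Failed to load'} ${f.length()} bytes in ${elapsed} milliseconds. (Status: ${status})"
        if (ok) loaded++
    }
    println "Loaded ${loaded} files in ${System.currentTimeMillis() - start} milliseconds."
    loaded
}
```

With a RESTClient set up as above, the closure could be { f -> talis.post(path: metaboxPath("mystore"), body: f, requestContentType: "application/rdf+xml").status }. RESTClient throws groovyx.net.http.HttpResponseException for error responses, so a real closure might catch it and return e.statusCode instead.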

This functionality is encapsulated in the class com._3kbo.talis.TalisStoreLoader, which is part of a Maven project available for download as a zip file. The project includes the script TalisStore.groovy, a simplified wrapper around com._3kbo.talis.TalisStoreLoader.

The jar file created by the project, talis-store-0.2.jar, can also be downloaded separately.

The RESTClient is not bundled with the standard Groovy install. Trying to use it from the shell or console without installing it first results in errors like the following:

groovy:000> import groovyx.net.http.RESTClient
ERROR org.codehaus.groovy.tools.shell.CommandException:
Invalid import definition: 'import groovyx.net.http.RESTClient';
reason: startup failed, script1266050039289.groovy:
1: unable to resolve class groovyx.net.http.RESTClient
 @ line 1, column 1.
1 error
 at java_lang_Runnable$run.call (Unknown Source)

Installing the RESTClient requires downloading HTTPBuilder (http-builder-xxx-all.zip) and copying its jar and dependencies into the ${user.home}/.groovy/lib directory. Add talis-store-0.2.jar to the same directory. The ${user.home}/.groovy/lib directory may need to be created manually, but the Groovy install should already have created the file “$GROOVY_HOME/conf/groovy-starter.conf” containing the line

load ${user.home}/.groovy/lib/*

which puts the additional jar files required by the RESTClient and by com._3kbo.talis.TalisStoreLoader on the classpath, i.e.:

  • http-builder-0.5.0-RC2.jar
  • httpclient-4.0.jar
  • httpcore-4.0.1.jar
  • json-lib-2.3-jdk15.jar
  • xml-resolver-1.2.jar
  • commons-collections-3.2.1.jar
  • commons-logging-1.1.1.jar
  • talis-store-0.2.jar
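Because a missing jar only shows up later as an “unable to resolve class” error, it can help to check the lib directory first. A small sketch in plain Groovy; the helper name missingJars is hypothetical, and the list simply repeats the jar names above (the names will differ if other releases were downloaded):

```groovy
// Jar names listed above; adjust if different versions were downloaded
def required = [
    "http-builder-0.5.0-RC2.jar", "httpclient-4.0.jar", "httpcore-4.0.1.jar",
    "json-lib-2.3-jdk15.jar", "xml-resolver-1.2.jar",
    "commons-collections-3.2.1.jar", "commons-logging-1.1.1.jar",
    "talis-store-0.2.jar"
]

// Hypothetical helper: the names in `required` that are not present in libDir
// (a missing or unreadable directory counts as empty)
List<String> missingJars(File libDir, List<String> required) {
    def present = (libDir.list() ?: []) as Set
    required.findAll { !(it in present) }
}

def libDir = new File(System.getProperty("user.home"), ".groovy/lib")
def missing = missingJars(libDir, required)
println(missing ? "Missing from ${libDir}: ${missing.join(', ')}" : "All jars present in ${libDir}")
```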

Using the Groovy Shell to Upload

With the RESTClient and talis-store-0.2.jar installed, the Groovy Shell (groovysh) makes it easy to run the TalisStore.groovy script and upload either an individual RDF file or all the RDF files in a directory to a Talis store.

The four options for running the TalisStore.groovy script are:

  1. TalisStore.load "mystore","user","password","file_or_directory"
  2. TalisStore.load "mystore","user","password"
  3. TalisStore.load "file_or_directory"
  4. TalisStore.load()

The first and second options both explicitly set the store, user and password. The first option also nominates either a specific RDF file to upload or a directory to scan and upload all the RDF files found. The second option uploads all the RDF files found in the current directory, i.e. the directory in which the Groovy Shell (groovysh) was invoked.

The third and fourth options read the store, user and password from the configuration file TalisConfig.groovy, which must be updated for a specific store and available on the classpath (see Appendix B).

With TalisConfig.groovy in place, uploading a specific RDF file or a directory simplifies to TalisStore.load "file_or_directory".

Uploading the RDF files in the current directory is just TalisStore.load(), as shown in the example “Loading all RDF files from the current directory” in Appendix A.
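The four call forms differ only in where the store, user, password and target come from. A hypothetical sketch of that dispatch, not the logic in TalisStore.groovy: the method name resolveLoadArgs and the map keys are illustrative, and config stands for the talis block of TalisConfig.groovy. "." stands for the directory in which the shell was started.

```groovy
// Hypothetical sketch: map the four TalisStore.load forms to
// (store, user, password, target). `config` is assumed to hold the
// talis block of TalisConfig.groovy (keys store, user, password).
Map resolveLoadArgs(Map config, List<String> args) {
    switch (args.size()) {
        case 4: // TalisStore.load "mystore","user","password","file_or_directory"
            return [store: args[0], user: args[1], password: args[2], target: args[3]]
        case 3: // TalisStore.load "mystore","user","password": current directory
            return [store: args[0], user: args[1], password: args[2], target: "."]
        case 1: // TalisStore.load "file_or_directory": store and credentials from config
            return [store: config.store, user: config.user, password: config.password, target: args[0]]
        case 0: // TalisStore.load(): config values, current directory
            return [store: config.store, user: config.user, password: config.password, target: "."]
        default:
            throw new IllegalArgumentException("Expected 0, 1, 3 or 4 arguments but got ${args.size()}")
    }
}
```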

Using the Script to Upload

Adding the line #!/usr/bin/env groovy at the top of the TalisStore.groovy script and making the script executable allows it to be run independently of the Groovy Shell (groovysh). For example, ./TalisStore.groovy /sioc/forum/WO0902.rdf loads that RDF file, using the configuration file to set the store, user and password.

See the TalisStore.groovy javadoc for more details on running as an executable script.

Summary

There is a bit of configuration involved in setting everything up, but once it is in place the combination of Groovy, the RESTClient and the TalisStore loader code described here makes it easy to load RDF files into the Talis Platform.

My preference is to run the Groovy Shell (groovysh) and use simple commands like TalisStore.load().

Possible future extensions include commands such as TalisStore.sparql.select.

Appendix A: Examples

Loading a specific file

$ groovysh
Groovy Shell (1.7.1, JVM: 1.6.0_15)
Type 'help' or '\h' for help.
-------------------------------------------------------------------------------
groovy:000> TalisStore.load "mystore","user","password","/sioc/WO0401.rdf"
Using store: mystore user password
Loading a file or directory: /sioc/WO0401.rdf
Loading /sioc/WO0401.rdf
Loaded 1565688 bytes in 58518 milliseconds. (Status: 204)

Loading all RDF files from the current directory

$ cd /scoop/forum/
$ ls -l
-rw-r--r--  1  3847192  2 Jan 12:11 WO0903.rdf
-rw-r--r--  1  2485605  2 Jan 12:11 WO0904.rdf
-rw-r--r--  1  2321233  2 Jan 12:12 WO0905.rdf
-rw-r--r--  1  2551787  2 Jan 12:12 WO0906.rdf
$ groovysh
Groovy Shell (1.7.1, JVM: 1.6.0_17)
Type 'help' or '\h' for help.
--------------------------------------------
groovy:000> TalisStore.load()
Classpath:
...
Loading RDF files from directory /scoop/forum/.
2010-03-14 11:32:31.477: Loading /scoop/forum/./WO0903.rdf
2010-03-14 11:33:49.289: Loaded 3847192 bytes in 77808 milliseconds. (Status: 204)
2010-03-14 11:33:49.304: Loading /scoop/forum/./WO0904.rdf
2010-03-14 11:34:38.288: Loaded 2485605 bytes in 48984 milliseconds. (Status: 204)
2010-03-14 11:34:38.289: Loading /scoop/forum/./WO0905.rdf
2010-03-14 11:35:25.429: Loaded 2321233 bytes in 47140 milliseconds. (Status: 204)
2010-03-14 11:35:25.43: Loading /scoop/forum/./WO0906.rdf
2010-03-14 11:36:15.952: Loaded 2551787 bytes in 50523 milliseconds. (Status: 204)
Loaded 4 files in 224488 milliseconds.
===> 4
groovy:000>

Appendix B: Adding the Groovy Configuration File to the Classpath

The structure of the config file is:

// TalisConfig.groovy
talis {
    user = "myusername"
    password = "mypassword"
    store = "mystore"
}

Once the values have been updated for a specific store, the steps for adding the file to the classpath and verifying that it is read correctly are:

  • Create a directory to hold configuration files (e.g. ${user.home}/.groovy/conf/).
  • Add a matching line to “$GROOVY_HOME/conf/groovy-starter.conf” to put the directory on the classpath, e.g. load ${user.home}/.groovy/conf/./
  • Place the Groovy configuration file TalisConfig.groovy in that directory (i.e. ${user.home}/.groovy/conf/).

ConfigSlurper is used to read the configuration file. The shell input below shows how to:

  • Check what is on the classpath using loader.URLs.each{ println it }
  • Get the config file using url = loader.getResource("TalisConfig.groovy")
  • Read the config file using def config = new ConfigSlurper().parse(url)

groovy:000> import groovyx.net.http.RESTClient
===> [import groovyx.net.http.RESTClient]
groovy:000> talis = new RESTClient( "http://api.talis.com/" )
===> groovyx.net.http.RESTClient@1798928
groovy:000> loader = talis.class.classLoader.rootLoader
===> org.codehaus.groovy.tools.RootLoader@4d20a47e
groovy:000> loader.URLs.each{ println it }
file:/Users/richardhancock/./
file:/Users/richardhancock/groovy-1.6.4/lib/ant-1.7.1.jar
file:/Users/richardhancock/groovy-1.6.4/lib/ant-junit-1.7.1.jar
file:/Users/richardhancock/groovy-1.6.4/lib/ant-launcher-1.7.1.jar
file:/Users/richardhancock/groovy-1.6.4/lib/antlr-2.7.7.jar
file:/Users/richardhancock/groovy-1.6.4/lib/asm-2.2.3.jar
file:/Users/richardhancock/groovy-1.6.4/lib/asm-analysis-2.2.3.jar
file:/Users/richardhancock/groovy-1.6.4/lib/asm-tree-2.2.3.jar
file:/Users/richardhancock/groovy-1.6.4/lib/asm-util-2.2.3.jar
file:/Users/richardhancock/groovy-1.6.4/lib/bsf-2.4.0.jar
file:/Users/richardhancock/groovy-1.6.4/lib/commons-cli-1.2.jar
file:/Users/richardhancock/groovy-1.6.4/lib/commons-logging-1.1.jar
file:/Users/richardhancock/groovy-1.6.4/lib/groovy-1.6.4.jar
file:/Users/richardhancock/groovy-1.6.4/lib/ivy-2.1.0-rc2.jar
file:/Users/richardhancock/groovy-1.6.4/lib/jline-0.9.94.jar
file:/Users/richardhancock/groovy-1.6.4/lib/jsp-api-2.0.jar
file:/Users/richardhancock/groovy-1.6.4/lib/junit-3.8.2.jar
file:/Users/richardhancock/groovy-1.6.4/lib/servlet-api-2.4.jar
file:/Users/richardhancock/groovy-1.6.4/lib/xstream-1.3.1.jar
file:/Users/richardhancock/.groovy/lib/http-builder-0.5.0-RC2.jar
file:/Users/richardhancock/.groovy/lib/httpclient-4.0.jar
file:/Users/richardhancock/.groovy/lib/httpcore-4.0.1.jar
file:/Users/richardhancock/.groovy/lib/json-lib-2.3-jdk15.jar
file:/Users/richardhancock/.groovy/lib/xml-resolver-1.2.jar
file:/Users/richardhancock/.groovy/conf/./
===> [Ljava.net.URL;@3ebc312f
groovy:000> url = loader.getResource("TalisConfig.groovy")
===> file:/Users/richardhancock/.groovy/conf/TalisConfig.groovy
groovy:000> def config = new ConfigSlurper().parse(url)
===> {talis={username=myusername, password=mypassword, store=mystore}}
groovy:000>
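Outside the shell, the same lookup fits in a few lines. A sketch assuming TalisConfig.groovy is on the classpath as described above; the method name readTalisConfig is hypothetical, and the keys it returns are whatever the file defines (the config structure shown at the start of this appendix uses user, password and store).

```groovy
// Hypothetical helper: locate TalisConfig.groovy on the classpath and return
// its talis block, or null if the file cannot be found.
ConfigObject readTalisConfig(ClassLoader loader = Thread.currentThread().contextClassLoader) {
    URL url = loader.getResource("TalisConfig.groovy")
    if (url == null) {
        System.err.println "TalisConfig.groovy not found on the classpath"
        return null
    }
    // ConfigSlurper evaluates the file as a Groovy script and builds nested maps
    new ConfigSlurper().parse(url).talis
}
```

Returning null rather than throwing lets a caller fall back to explicit store, user and password arguments, as in the first two TalisStore.load forms.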

Appendix C: Using Maven to run the Groovy Script

The TalisStore script can also be run via Maven. This approach uses the jar file dependencies defined in the Maven project and does not require a standard Groovy install. If a valid “TalisConfig.groovy” configuration file is available on the classpath, the “store”, “username” and “password” parameters are not required. By default the pom.xml file excludes the dummy configuration file, but once the file has been updated with real values it can be included by changing the exclude(s) to include(s). The TalisStore script can be run with command lines such as the following, which invoke the TalisStore main method (optionally with parameters):

mvn exec:java -Dexec.mainClass=TalisStore

mvn exec:java -Dexec.mainClass=TalisStore -Dexec.args="/sioc/forum/2007"

Appendix D: Authentication

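The call “talis.auth.basic TALIS_USERNAME, TALIS_PASSWORD” is a bit of an anomaly, since the Talis Platform uses HTTP Digest Authentication. RESTClient uses the “basic” method of groovyx.net.http.AuthConfig, which works for digest authentication as well: it registers the username and password with the underlying HttpClient for the target host and port, and HttpClient then answers whichever authentication challenge the server sends.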
