We have quite a rigorous SOA software development process; however, the full value of the collected information is not being realized because the artifacts are stored in disconnected information silos. So far, attempts to introduce tools that could improve the situation (e.g. zAgile Teamwork and Semantic MediaWiki) have been unsuccessful, possibly because the value of a Linked Data approach is not yet fully appreciated.
To provide an example Linked Data view of the SOA services and their associated artifacts, I created a prototype consisting of Sesame running on a Tomcat server, with Pubby providing the Linked Data view via the Sesame SPARQL endpoint. TopBraid was connected directly to the Sesame native store (configured via the Sesame Workbench) and used to create a subset of services sufficient to demonstrate the value of publishing information as Linked Data. In particular, the prototype showed how easy it becomes to navigate from the requirements for a SOA service through to the details of its implementation.
The prototype also highlighted that automatically generating the RDF graph (the data behind the Linked Data view) from the actual source artifacts would be preferable to manual entry, especially if this could be transparently integrated with the current software development process. This has become the focus of the next step: automated knowledge extraction from the source artifacts.
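The following is a minimal sketch of how a client could query a prototype like this over the standard SPARQL protocol. It uses only the JDK's `java.net.http` client, and the query is URL-encoded into the `query` parameter of a GET request. Two things are assumptions for illustration and do not reflect the prototype's actual configuration: the repository URL (`http://localhost:8080/openrdf-sesame/repositories/soa`) and the `soa:` vocabulary (`specifies`, `definedIn`, `implementedBy`).

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

public class SparqlNavigation {

    // Hypothetical query: from a Service Contract to the WSDL and Java class
    // of each service it specifies. The soa: vocabulary is an assumption.
    static final String QUERY = String.join("\n",
            "PREFIX soa: <http://example.org/soa/def#>",
            "SELECT ?contract ?wsdl ?impl WHERE {",
            "  ?contract soa:specifies ?service .",
            "  ?service soa:definedIn ?wsdl .",
            "  ?service soa:implementedBy ?impl .",
            "}");

    // Builds a SPARQL protocol GET request URI: the query text is
    // form-encoded into the 'query' parameter.
    public static URI queryUri(String endpoint, String query) {
        return URI.create(endpoint + "?query=" + URLEncoder.encode(query, StandardCharsets.UTF_8));
    }

    public static void main(String[] args) throws Exception {
        // Assumed Sesame repository endpoint; adjust host, port and repository id.
        String endpoint = "http://localhost:8080/openrdf-sesame/repositories/soa";
        HttpRequest request = HttpRequest.newBuilder(queryUri(endpoint, QUERY))
                .header("Accept", "application/sparql-results+xml")
                .GET()
                .build();
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode());
        System.out.println(response.body());
    }
}
```

Pubby issues the same kind of request behind the scenes. A direct query like this one is simply another consumer of the same endpoint.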
Artifacts
Key artifact types of our process include:
- Service Contracts. A Service Contract is a Microsoft Word document describing the requirements of a service and its operations from a business point of view. Service Contracts are maintained in an Objective document repository and the collection of Service Contracts also serves as a Service Registry.
- A UML analysis model defining the attributes and relationships of the business entities referenced in the Service Contract.
- WSDLs and Xml Schemas developed from the Service Contracts and the UML analysis model. The WSDLs and Xml Schemas are maintained in a Subversion repository.
- Java source code providing the service implementation via JAX-WS @WebService and @WebMethod annotations mapped to the WSDL and WSDL operations. The source code is also maintained in Subversion.
- Soapui Test Cases validating each WSDL operation.
- WAR files package the JAX-WS web services and EAR files bundle the WAR files for deployment to an Oracle Weblogic Application Server.
- Configuration files for the Oracle Weblogic Application Server specifying the databases, JMS Message Queues and any external web services needed. The configuration files list the other services that the SOA services depend on. The runtime state of the server can be obtained via JMX.
- HTML pages containing release information such as the release id for a project and the services developed as a part of it. The release information is contained in an Oracle database and published by a Rails app.
- Microsoft Word configuration and release documents describing how to deploy the services to test and production environments. The set of configuration and release documents provides a record linking the source code revisions tagged in Subversion with the specific release of the software. Custom Microsoft Word document properties specify the Subversion URL of the tagged source code and other release related information.
- Defects are managed in the JIRA bug tracking system, which also contains project and release information.
- Application log files in XML and text formats record usage of service operations and are also used to diagnose defects. Audit logs (in XML format) identify the consumer of each service operation and summarize the request and response.
- HTML pages describing the people in the organization are generated from information held in LDAP and Active Directory. These pages do not link the people to the projects they have worked on or the artifacts they have created; this information is usually found in the artifacts themselves.
- TWiki web pages providing semi-structured descriptions of projects and development environments.
A Graph of Concepts and Instances
There is a rich graph of relationships linking the things described in the artifacts listed above. For example, the business entities defined in the UML analysis model are the subject of the services and service operations defined in the Service Contracts. The services and service operations are mapped to the WSDLs, which use the Xml Schemas that provide an XML view of the business entities. The JAX-WS implementations are linked to the WSDLs and Xml Schemas and deployed to the Oracle Weblogic Application Server, where the configuration files list the external dependencies. The log files and defects link back to specific parts of the code base (Subversion revisions) within the context of specific service operations. The people associated with the different artifacts can often be determined from artifact metadata.
RDF, OWL and Linked Data are a natural fit for modelling and viewing this graph, since it contains a mix of concepts plus many instances, a lot of which already have an HTTP representation. The graph also contains a number of transitive relationships (for example, a WSDL may import an Xml Schema, which in turn imports another Xml Schema), which encourages the use of owl:TransitiveProperty to obtain a full picture of all the dependencies a component may have.
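To make the owl:TransitiveProperty point concrete, here is a minimal Java sketch of the inference a reasoner draws when an `imports` property is declared transitive. Given the direct imports between artifacts, it collects every artifact reachable from a starting one. The artifact names and the in-memory `imports` map are hypothetical. Against the RDF graph itself, an OWL reasoner or a SPARQL 1.1 property path such as `:imports+` would do the same job.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class TransitiveImports {

    // Returns every artifact reachable from 'start' by following one or more
    // direct "imports" edges, i.e. what follows if imports is transitive.
    public static Set<String> allImports(Map<String, List<String>> imports, String start) {
        Set<String> reached = new LinkedHashSet<>();
        Deque<String> toVisit = new ArrayDeque<>(imports.getOrDefault(start, List.of()));
        while (!toVisit.isEmpty()) {
            String next = toVisit.pop();
            // add() returns false for artifacts already seen, which also stops cycles
            if (reached.add(next)) {
                toVisit.addAll(imports.getOrDefault(next, List.of()));
            }
        }
        return reached;
    }

    public static void main(String[] args) {
        // Hypothetical direct imports: a WSDL imports a schema that imports another schema.
        Map<String, List<String>> imports = Map.of(
                "OrderService.wsdl", List.of("Order.xsd"),
                "Order.xsd", List.of("Common.xsd"),
                "Common.xsd", List.of());
        System.out.println(allImports(imports, "OrderService.wsdl"));
    }
}
```

Only the direct imports need to be extracted from each file. The indirect dependencies, such as the WSDL depending on `Common.xsd`, fall out of the transitivity.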
Knowledge Extraction
Another advantage of the RDF, OWL and Linked Data approach is the use of unique URIs to identify concepts and instances. This allows information contained in one artifact, e.g. a WSDL, to be extracted as RDF triples that can later be combined with the RDF triples extracted from the JAX-WS annotations in the Java source code. The combined triples tell us more about the WSDL and its Java implementation than could be derived from either artifact alone.
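The next sketch shows this merging in plain Java, without an RDF library. Triples are held as strings and grouped by subject, which shows how facts extracted from a WSDL and from a @WebMethod annotation attach to the same operation URI. The URI scheme (`http://example.org/soa/...`) and the predicates (`definedIn`, `inputElement`, `implementedBy`) are hypothetical. In a triple store such as Sesame, the merge amounts to adding both sets of statements to the same repository.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

public class MergeTriples {

    // A single statement; subject and predicate are URIs, object is a URI or a literal.
    public record Triple(String subject, String predicate, String object) {}

    // Groups triples by subject, so facts about the same URI that were
    // extracted from different artifacts end up describing one resource.
    public static Map<String, Set<String>> describe(List<Triple> triples) {
        Map<String, Set<String>> bySubject = new TreeMap<>();
        for (Triple t : triples) {
            bySubject.computeIfAbsent(t.subject(), k -> new TreeSet<>())
                     .add(t.predicate() + " " + t.object());
        }
        return bySubject;
    }

    public static void main(String[] args) {
        // Hypothetical URI scheme shared by both extractors.
        String op = "http://example.org/soa/OrderService/getOrder";
        String ns = "http://example.org/soa/def#";

        // Triples a WSDL extractor might produce.
        List<Triple> fromWsdl = List.of(
                new Triple(op, ns + "definedIn", "http://example.org/svn/wsdl/OrderService.wsdl"),
                new Triple(op, ns + "inputElement", "GetOrderRequest"));

        // Triples an extractor reading @WebMethod annotations might produce.
        List<Triple> fromJava = List.of(
                new Triple(op, ns + "implementedBy", "com.example.OrderServiceImpl#getOrder"));

        List<Triple> all = new ArrayList<>(fromWsdl);
        all.addAll(fromJava);
        describe(all).forEach((subject, facts) -> {
            System.out.println(subject);
            facts.forEach(f -> System.out.println("  " + f));
        });
    }
}
```

The grouping only works because both extractors agree on the operation's URI. Choosing that URI scheme is therefore the key design decision, more than the choice of any extraction tool.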
We have made some progress with knowledge extraction but this is still definitely a work in progress. Sites such as ConverterToRdf, RDFizers and the Virtuoso Sponger provide tools and information on generating RDF from different artifact types. Part of the current experimentation is around finding tools that can be transparently layered over the top of the current software development process. Finding the best way to extract the full set of desired RDF triples from Microsoft Word documents is also proving problematic since some natural language processing is required.
Tools currently being evaluated include:
- GATE for processing Microsoft Word documents, including natural language processing to extract information from the document text. This has had some success but is still an experiment. Other options still to be investigated for processing Word documents include the Virtuoso Sponger and exporting the documents as XML for easier parsing of the custom properties and table data.
- TopBraid Composer and EulerGUI for converting UML and Xml Schemas to OWL. (See Converting UML Models to OWL – Part 1: The Approach)
- XSD2OWL for converting Xml Schemas to OWL.
- Groovy XmlSlurper for parsing the WSDLs, Xml Schemas, the Oracle Weblogic Application Server configuration files, Soapui projects and XML based audit logs.
- Javassist for reading JAX-WS annotations. (Still a work in progress but looks promising.)
- Java RDFizer for scanning bytecode and discovering information about classes
- Maven POM RDFizer for converting the metadata of a Maven repository into RDF
- D2RQ for mapping relational databases to RDF
- svn2rdf for generating RDF from Subversion commits
- jira2rdf for transforming JIRA bug reports and issue tracking events into RDF
- Squirrel RDF for mapping LDAP to RDF.
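The sketch below illustrates this kind of XML extraction in plain Java, using a JAXP DOM parser in place of Groovy XmlSlurper. It does roughly what an XmlSlurper script might do for a WSDL 1.1 file: list the portType operations and emit one N-Triples statement per operation. The `hasOperation` predicate, the `http://example.org/soa/` namespace and the operation URI pattern are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

public class WsdlToNTriples {

    static final String WSDL_NS = "http://schemas.xmlsoap.org/wsdl/";
    // Hypothetical vocabulary for the SOA graph.
    static final String DEF = "http://example.org/soa/def#";

    // Emits one N-Triples statement per WSDL 1.1 portType operation,
    // linking the service URI to a (hypothetical) operation URI.
    public static List<String> extract(String wsdlXml, String serviceUri) {
        Document doc;
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // required for getElementsByTagNameNS
            doc = factory.newDocumentBuilder()
                    .parse(new ByteArrayInputStream(wsdlXml.getBytes(StandardCharsets.UTF_8)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Could not parse WSDL", e);
        }

        List<String> triples = new ArrayList<>();
        NodeList ops = doc.getElementsByTagNameNS(WSDL_NS, "operation");
        for (int i = 0; i < ops.getLength(); i++) {
            Element op = (Element) ops.item(i);
            // wsdl:operation also appears under wsdl:binding; keep only the portType ones
            if (!"portType".equals(op.getParentNode().getLocalName())) continue;
            String opUri = serviceUri + "/operation/" + op.getAttribute("name");
            triples.add("<" + serviceUri + "> <" + DEF + "hasOperation> <" + opUri + "> .");
        }
        return triples;
    }

    public static void main(String[] args) {
        String wsdl = """
            <definitions xmlns="http://schemas.xmlsoap.org/wsdl/" name="OrderService">
              <portType name="OrderPortType">
                <operation name="getOrder"/>
                <operation name="cancelOrder"/>
              </portType>
              <binding name="OrderBinding" type="OrderPortType">
                <operation name="getOrder"/>
              </binding>
            </definitions>
            """;
        extract(wsdl, "http://example.org/soa/OrderService").forEach(System.out::println);
    }
}
```

Xml Schemas, Soapui projects and the Weblogic configuration files could plausibly be handled with the same pattern, matching their own element names instead of `wsdl:operation`.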
The Benefits of Linked Data
The prototype showed the benefits of Linked Data for navigating from the requirements for a SOA service through to the details of its implementation. Looking at all the information that could be extracted leads to a broader view of the benefits Linked Data would bring to the SOA software development process.
One specific use being planned is the creation of a Service Registry application providing the following functionality:
- Linking the services to the implementations running in a given environment, e.g. dev, test and production. This includes linking the specific versions of the requirement, design or implementation artifacts and detailing the runtime dependencies of each service implementation.
- Listing the consumers of each service and providing summary statistics on the performance, e.g. daily usage figures derived from audit logs.
- Providing a list of who to contact when a service is not available. This includes notifying consumers of a service outage and also contacting providers if a service is being affected by an external component being offline, e.g. a database or an external web service.
- Searching the services by different criteria, e.g. by business entity.
- Tracking the evolution of services and assisting with refactoring, e.g. answering questions such as “Are there older versions of the Xml Schemas that can be deprecated?”
- Simplifying the running of a specific Soapui test case for a service operation in a given environment.
- Providing the equivalent of a class lookup that covers all project classes plus all required infrastructure classes and returns information such as the jar file containing the class, along with related JIRA and Subversion information.
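The daily usage figures mentioned above could be derived along the following lines. This minimal sketch assumes the XML audit records have already been parsed into a hypothetical `AuditEntry` holding a date, a consumer and an operation. The real audit log schema is not shown here, so these fields are an assumption. The sketch counts calls per day for each consumer and operation pair. Each count could then be published as triples against the service operation's URI.

```java
import java.time.LocalDate;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class DailyUsage {

    // One parsed audit log entry; a hypothetical simplification of the XML audit record.
    public record AuditEntry(LocalDate date, String consumer, String operation) {}

    // Counts calls per date, then per "consumer -> operation" pair, in sorted order.
    public static Map<LocalDate, Map<String, Integer>> countByDay(List<AuditEntry> entries) {
        Map<LocalDate, Map<String, Integer>> usage = new TreeMap<>();
        for (AuditEntry e : entries) {
            usage.computeIfAbsent(e.date(), d -> new TreeMap<>())
                 .merge(e.consumer() + " -> " + e.operation(), 1, Integer::sum);
        }
        return usage;
    }

    public static void main(String[] args) {
        // Hypothetical sample entries.
        LocalDate day = LocalDate.of(2010, 3, 1);
        List<AuditEntry> log = List.of(
                new AuditEntry(day, "BillingApp", "getOrder"),
                new AuditEntry(day, "BillingApp", "getOrder"),
                new AuditEntry(day, "CrmPortal", "cancelOrder"));
        System.out.println(countByDay(log));
    }
}
```

The same grouping also yields the list of consumers for each operation, which is the contact list needed when a service outage has to be notified.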



