APIs first!

After testing the new SWORD2 endpoint on our new ePrints 3.3 instance, we found that the SWORD library needed a significant change. The minor changes were the endpoint, which became …/id/contents instead of …/sword-app/deposit/inbox, and the structure of the XML, which changed from <eprint> tags to <entry> tags. The main change was to how the XML is posted. Our existing method posted a string, which the endpoint read as a file rather than as metadata, so the dataset ended up with the XML attached to it as a file and no metadata of its own. We forked the SWORD library, swordappv2-php-library, from its GitHub repository and added to it so that it posts a string of XML metadata rather than a file. This fixed the problem: the dataset now has its metadata once posted, instead of a separate attached file.
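
As a rough illustration of the difference, the sketch below describes a metadata-only deposit as a plain array. It is not the forked library's code: the function name and the example host are made up for this post, and it assumes the endpoint follows the SWORD2 convention of treating a request body sent with an Atom entry content type as metadata.

```php
<?php
// Sketch only: the function name and the example host are hypothetical.

/**
 * Describe a SWORD2 metadata-only deposit as a plain array.
 * The XML travels as the raw request body with an Atom entry content type,
 * which is what lets the server treat it as metadata rather than a file.
 */
function build_metadata_deposit(string $endpoint, string $atom_xml): array
{
    return [
        'method'  => 'POST',
        'url'     => $endpoint,
        'headers' => [
            'Content-Type: application/atom+xml;type=entry',
            'Content-Length: ' . strlen($atom_xml),
        ],
        'body'    => $atom_xml, // a string, not a file upload
    ];
}

$xml = '<entry xmlns="http://www.w3.org/2005/Atom"><title>Example dataset</title></entry>';
$request = build_metadata_deposit('https://eprints.example.org/id/contents', $xml);

echo $request['headers'][0], "\n";
```

With PHP's cURL extension, the body would go into CURLOPT_POSTFIELDS as a string and the headers into CURLOPT_HTTPHEADER.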

Now here's the main problem. The dataset gets posted to ePrints in a deposited state, which ePrints classes as ‘in review’. ePrints requires a minimal set of metadata before a dataset can be ‘in review’, but only if the dataset is created manually within ePrints, not via the API. Over the API, you can post a dataset straight to ‘in review’ without the mandatory set of metadata. Which brings me to the title of this post: APIs first! API-driven development would mean that the APIs are built first, so this kind of situation would be avoided.

Another problem we came across during the change was that the test account we used for testing deposits no longer existed, because the migration of user accounts had skipped it. That in itself is fine, as an attempted deposit should simply receive an unauthorised response. This was not the case: we got an ‘Invalid XML’ response instead. This was unusual, as the XML was valid, and everything we tried was to no avail. It was by chance that we found the solution, when we switched to an account we knew existed and the deposit worked as planned. What had happened was that the deposit had failed because the account did not exist, but the wrong error message was sent back.
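
To make the expectation concrete, here is a minimal sketch of the order of checks we would have expected from the endpoint. It is not ePrints code: the function name, status codes and messages are assumptions chosen for illustration.

```php
<?php
// Sketch only: the function name and messages are hypothetical.

/**
 * Decide which error a deposit endpoint should report.
 * The account is checked before the XML is looked at, so a missing
 * account can never surface as an XML problem.
 */
function deposit_error(bool $account_exists, bool $xml_is_valid): ?array
{
    if (!$account_exists) {
        return ['status' => 401, 'message' => 'Unauthorized'];
    }
    if (!$xml_is_valid) {
        return ['status' => 400, 'message' => 'Invalid XML'];
    }
    return null; // no error, the deposit can go ahead
}

// Our situation: valid XML, but an account that no longer existed.
$error = deposit_error(false, true);
echo $error['status'], ' ', $error['message'], "\n";
```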

So I reiterate: APIs first. Knowing what the responses are, and that the functionality of the application works, is the most important aspect of said application.

SWORDs and Citations

The Researcher Dashboard has been expanded to interface with the Lincoln Repository, ePrints. From it, a researcher can deposit their datasets directly to the repository, complete with DOI.

In previous posts, I spoke about how the CKAN and ePrints APIs can interface. We have finally implemented both APIs for use with the Researcher Dashboard and created a usable workflow for depositing datasets from CKAN to ePrints via the dashboard.

The workflow goes as follows:

  1. Hit ‘Publish’
  2. Get latest metadata from CKAN
  3. Prompt user to complete form
  4. Generate DOI
  5. Send metadata to DataCite
  6. Mint DOI
  7. Post SWORD2 to ePrints
  8. Get ePrints ID from response
  9. Add ePrint to SQL database as minimal data
  10. Update dataset in database with ePrint link
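
The steps above can be sketched as a single function that calls each stage in turn. This is only an outline under stated assumptions: every callable name in $steps is hypothetical and stands in for a real call to CKAN, the form, the DOI service, ePrints or the database, and stubs are used here so the order can be seen without any of those services.

```php
<?php
// Sketch only: all step names are hypothetical placeholders.

function publish_dataset(string $dataset_id, array $steps): array
{
    // Step 1 is the user hitting 'Publish', which is what calls this function.
    $metadata = $steps['get_ckan_metadata']($dataset_id);   // step 2
    $metadata = $steps['complete_form']($metadata);         // step 3
    $doi      = $steps['generate_doi']($dataset_id);        // step 4
    $steps['send_to_datacite']($doi, $metadata);            // step 5
    $steps['mint_doi']($doi);                               // step 6
    $metadata['doi'] = $doi;
    $response  = $steps['post_sword']($metadata);           // step 7
    $eprint_id = $steps['extract_eprint_id']($response);    // step 8
    $steps['store_eprint']($eprint_id, $metadata);          // step 9
    $steps['link_dataset']($dataset_id, $eprint_id);        // step 10
    return ['doi' => $doi, 'eprint_id' => $eprint_id];
}

// Stubs that record the order in which they are called.
$log = [];
$stub = function (string $name, $result = null) use (&$log) {
    return function (...$args) use (&$log, $name, $result) {
        $log[] = $name;
        return $result;
    };
};

$result = publish_dataset('example-dataset', [
    'get_ckan_metadata' => $stub('get_ckan_metadata', ['title' => 'Example dataset']),
    'complete_form'     => $stub('complete_form', ['title' => 'Example dataset']),
    'generate_doi'      => $stub('generate_doi', '10.5072/example-dataset'),
    'send_to_datacite'  => $stub('send_to_datacite'),
    'mint_doi'          => $stub('mint_doi'),
    'post_sword'        => $stub('post_sword', 'response body'),
    'extract_eprint_id' => $stub('extract_eprint_id', 42),
    'store_eprint'      => $stub('store_eprint'),
    'link_dataset'      => $stub('link_dataset'),
]);

echo implode(' > ', $log), "\n";
```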

When a researcher views their project, they are presented with a list of datasets lifted from the project environment in CKAN. If they want to deposit one into ePrints, they can select the deposit button and are prompted to finish the dataset metadata. ePrints requires a minimal set of metadata before the dataset can be deposited. A dataset can be put into a user's inbox with merely a title, but it needs a specific minimal set of metadata before it can be deposited.

The DOI is minted as a unique identifier by sending the metadata to DataCite along with the generated DOI. A DOI has to be generated before it can be minted. This is another field that is passed to ePrints via the metadata.
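
This post does not describe how the DOI string itself is put together, so the following is purely an illustration of the ‘generate’ step: a DOI is a prefix and a suffix joined by a slash. The function name, the prefix and the suffix scheme are all assumptions.

```php
<?php
// Sketch only: generate_doi() and its suffix scheme are hypothetical.

function generate_doi(string $prefix, string $slug, int $id): string
{
    // Replace anything outside a conservative character set with a hyphen.
    $suffix = strtolower(preg_replace('/[^A-Za-z0-9._-]+/', '-', $slug));
    return $prefix . '/' . trim($suffix, '-') . '.' . $id;
}

echo generate_doi('10.5072', 'My Dataset!', 7), "\n"; // prints 10.5072/my-dataset.7
```

Only once a string like this exists can it be sent on, together with the metadata, to be minted.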

Including the ePrints metadata gives the Researcher Dashboard an all-in-one approach. Otherwise, users would have to go into ePrints and fill in the data there, an annoyance easily avoided by having all the necessary steps taken care of on one site. This completes the toolset, so projects now have a central hub of activity. Data is brought into Orbital via the AMS (Awards Management System) for importing funded projects and via CKAN for datasets, and is exported to ePrints for deposit into the Lincoln Repository.

The original plan for this workflow was published by Paul Stainthorp. The workflow as it currently stands is as written in this post. It is, however, still in the finishing stages and is being polished to make sure the process is solid.

CKAN and ePrints APIs

For each application that Orbital interfaces with, be it CKAN, ePrints or anything else, the interaction is abstracted through a ‘bridge_application’ library. Orbital is built predominantly in PHP. Using CKAN as an example, we have a Ckan.php file in the folder ‘bridge_applications’ containing all the functions needed to interface with CKAN. If one of the functions it contains is needed, it is called on the page where the result of the function is used.

If a dataset is read, it can be stored as a variable, as the function returns an object. It can be output to the page in Orbital to show what the dataset contains, or saved to a variable to be used with another function.

Example:

$this->load->library('../bridge_applications/ckan');
$datasets = $this->ckan->read_datasets();

$datasets is set to the result of the CKAN function. What it is set to depends on the datasets in CKAN. In this example, it returns:

array(1) {
  [0]=>
  object(Dataset_Object)#362 (6) {
    ["_title":protected]=>
    string(11) "********"
    ["_uri_slug":protected]=>
    string(38) "********"
    ["_creators":protected]=>
    array(1) {
      [0]=>
      string(17) "********"
    }
    ["_subjects":protected]=>
    array(0) {
    }
    ["_date":protected]=>
    int(1358507313)
    ["_keywords":protected]=>
    array(3) {
      [0]=>
      object(stdClass)#95 (6) {
        ["vocabulary_id"]=>
        NULL
        ["display_name"]=>
        string(12) "********"
        ["name"]=>
        string(12) "********"
        ["revision_timestamp"]=>
        string(26) "2013-01-18T11:16:59.137985"
        ["state"]=>
        string(6) "active"
        ["id"]=>
        string(36) "********"
      }
    }
  }
}

*Some results are starred out.

As this example only includes one dataset, the result is an array with the dataset as its only element.

This is converted to the standard format used in Orbital. The standard format exists so that every application Orbital links to has a standard input for data to be sent to, so any application can theoretically talk to any other application through Orbital.
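
A minimal sketch of what such a conversion might look like is given below. It is not the code in the repository: Example_Dataset and from_ckan() are hypothetical names, the property list simply mirrors the fields visible in the dump above, and the mapping from CKAN package fields (title, name, author, metadata_created, tags) is an assumption.

```php
<?php
// Sketch only: Example_Dataset and from_ckan() are hypothetical names.

class Example_Dataset
{
    protected $_title = '';
    protected $_uri_slug = '';
    protected $_creators = [];
    protected $_subjects = [];
    protected $_date = 0;
    protected $_keywords = [];

    public function __construct(array $fields)
    {
        // Copy only the fields this object knows about.
        foreach ($fields as $name => $value) {
            $property = '_' . $name;
            if (property_exists($this, $property)) {
                $this->$property = $value;
            }
        }
    }

    public function get(string $name)
    {
        $property = '_' . $name;
        return property_exists($this, $property) ? $this->$property : null;
    }
}

// Map a CKAN package array onto the standard object.
function from_ckan(array $package): Example_Dataset
{
    return new Example_Dataset([
        'title'    => $package['title'] ?? '',
        'uri_slug' => $package['name'] ?? '',
        'creators' => isset($package['author']) ? [$package['author']] : [],
        'date'     => isset($package['metadata_created'])
            ? (int) strtotime($package['metadata_created'])
            : 0,
        'keywords' => $package['tags'] ?? [],
    ]);
}

$dataset = from_ckan([
    'title'            => 'Example dataset',
    'name'             => 'example-dataset',
    'author'           => 'A. Researcher',
    'metadata_created' => '2013-01-18T11:16:59',
]);

echo $dataset->get('title'), "\n";
```

Because every bridge library would read and write the same object, none of them needs to know where the data originally came from.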

The SWORD library, used for entering data into ePrints through its SWORD endpoint, takes this standard format as input and converts it to the appropriate format before sending it to the ePrints endpoint. The theory here is the same as before: it is a PHP library for a bridge application. It takes the data and uses the endpoint to create a record via SWORD.

Example:

$this->sword->create_sword($dataset);

The dataset taken from CKAN is fed into the SWORD library and sent to ePrints to create a new ePrint from the dataset. This is done by using SimpleXML to build a SWORD-compliant XML object that can be sent via an HTTP cURL request to the ePrints SWORD endpoint. The result is a new entry in ePrints, created via SWORD from the data retrieved from CKAN.
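
As a rough sketch of the XML-building half of that process (not the library's actual code), the following uses SimpleXML to turn a title, a list of creators and an optional DOI into an Atom entry. The helper name dataset_to_atom() is hypothetical, and the choice of Dublin Core terms is an assumption about what the endpoint accepts.

```php
<?php
// Sketch only: dataset_to_atom() is a hypothetical helper.

function dataset_to_atom(string $title, array $creators, string $doi = ''): string
{
    $dc = 'http://purl.org/dc/terms/';
    $entry = new SimpleXMLElement(
        '<entry xmlns="http://www.w3.org/2005/Atom" xmlns:dcterms="' . $dc . '"/>'
    );

    // addChild() does not escape ampersands itself, so escape values first.
    $entry->addChild('title', htmlspecialchars($title, ENT_XML1));
    foreach ($creators as $creator) {
        $entry->addChild('dcterms:creator', htmlspecialchars($creator, ENT_XML1), $dc);
    }
    if ($doi !== '') {
        $entry->addChild('dcterms:identifier', htmlspecialchars($doi, ENT_XML1), $dc);
    }

    return $entry->asXML();
}

$atom = dataset_to_atom('Fish & chips', ['A. Researcher'], '10.5072/example');
echo $atom;
```

The resulting string is what would then be sent as the body of the cURL request.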

The code is hosted on GitHub and can be found here:

https://github.com/lncd/Orbital-Bridge/tree/develop/src/application/bridge_applications

The Researcher Dashboard

Orbital has come a long way since the start of the project. Now that it is the hub of activity around a research project, it makes more sense for the front end to be called the ‘Researcher Dashboard’. The name also fits because it is the researcher's interface to the project.

We have implemented CKAN more fully, with a few users already using its features. The same will soon be the case with the other technologies we are using. This also means that we are now linking them to the Researcher Dashboard so that the centre point of project management can be created. Soon, the project page will have links to any CKAN datasets belonging to it, along with the record in the AMS. Importing projects from the AMS is still on the to-do list, but will be done once we get AMS access.

Things are coming together development-wise. It’s now a case of sticking things together, rather than building them from scratch. After realising the size of our workload, we managed to finish off a lot of things and hopefully the rest will follow at a similar pace.

CKAN, SWORD and Orbital-Bridge update

In a previous post, I looked at extending CKAN with a SWORD extension to input and output data via SWORD. This had not gone to plan, as numerous difficulties were encountered.

Orbital Bridge has been developing, and since the last post its structure has been planned and more clearly laid out. Each technology it interfaces with will be handled via a library. Each of these libraries will have the following functions: CREATE, READ, WRITE and DELETE. These CRUD functions are Create, to create something externally from a Bridge Object; Read, to get something externally and transform it into a Bridge Object; Write, to update something externally from a Bridge Object; and Delete, to delete something externally that is specified in the Bridge Object. This ‘Bridge Object’ is a standard-format PHP object used in Orbital-Bridge, so that it can be sent to and from any library and used however the library wishes. This creates a standard that any new library can use to connect to external technologies. There will also be a fifth function, RECEIVE, which is separate from the CRUD functions. Receive takes an object sent via an HTTP POST and changes it into a ‘Bridge Object’. More about this will be documented and posted later, once it has been implemented rather than just conceptualised.
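
To illustrate the shape this gives each library, here is a minimal sketch of the five functions as a PHP interface, with an in-memory implementation standing in for a real external system. None of this is the Orbital-Bridge code: the interface and class names are hypothetical, and a plain array stands in for the Bridge Object.

```php
<?php
// Sketch only: Bridge_Library and In_Memory_Library are hypothetical names.

interface Bridge_Library
{
    public function create(array $bridge_object): string;          // returns the external id
    public function read(string $id): ?array;                      // null when nothing is found
    public function write(string $id, array $bridge_object): bool;
    public function delete(string $id): bool;
    public function receive(array $post): array;                   // POST data to Bridge Object
}

class In_Memory_Library implements Bridge_Library
{
    private $records = [];
    private $next_id = 1;

    public function create(array $bridge_object): string
    {
        $id = (string) $this->next_id++;
        $this->records[$id] = $bridge_object;
        return $id;
    }

    public function read(string $id): ?array
    {
        return $this->records[$id] ?? null;
    }

    public function write(string $id, array $bridge_object): bool
    {
        if (!isset($this->records[$id])) {
            return false;
        }
        $this->records[$id] = $bridge_object;
        return true;
    }

    public function delete(string $id): bool
    {
        if (!isset($this->records[$id])) {
            return false;
        }
        unset($this->records[$id]);
        return true;
    }

    public function receive(array $post): array
    {
        return ['title' => trim($post['title'] ?? '')];
    }
}

// READ from one library, then CREATE the same object in another.
$source = new In_Memory_Library();
$target = new In_Memory_Library();
$source_id = $source->create(['title' => 'Example dataset']);
$target_id = $target->create($source->read($source_id));

echo $target->read($target_id)['title'], "\n";
```

Because both libraries speak the same object, moving a record between them needs no knowledge of either system's own format.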

The main update to the CKAN/SWORD development is that they are now in two libraries, one for interfacing Orbital-Bridge with SWORD (the ePrints SWORD endpoint) and one for interfacing Orbital-Bridge with CKAN. Using the structure of Orbital-Bridge, deposits can be made in ePrints by using the CKAN library to READ a dataset, turning it into a Bridge Object, then using the SWORD library to CREATE the object in ePrints, after turning the Bridge Object into SWORD XML. This structure of Orbital-Bridge has changed the way CKAN and SWORD will talk to one another. Originally the SWORD extension of CKAN was just that, a CKAN extension. Since OKFest and talking to CKAN developers, and with the decision on how Orbital-Bridge will be structured, this is seen as the best way to interface the two technologies.