Implementing the Symplectic API

We’ve made real progress implementing the Symplectic API, which I hope will help motivate academic staff to update and maintain their Symplectic profile and, who knows, perhaps even encourage them to upload full-text to the repository! Kudos to web-developer Mike Taylor, who has done all the clever stuff (though this summary reflects my understanding so may contain errors!).

As can be seen in the screen-shot below, Mike has been able to submit a query to the API (using Leeds Met username as a parameter) and differentially parse the resulting XML by publication type including, where available, links to DOI and full-text in the repository (currently labelled as Public URL). Symplectic also has the option to “favourite” records, which is flagged in the XML and which we’ve used to identify “Selected publications” in order to give academics greater control over their profile (there is also a “make invisible” option to prevent specific records being exposed by the API).
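To make the parsing step a little more concrete, here is a minimal sketch in Python of grouping records by publication type and picking out the favourites. I should stress that the element and attribute names in the sample XML (publication, type, favourite, public-url) are invented for illustration and are not the real Symplectic schema, and this is not Mike's actual code.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical response: these element and attribute names are invented for
# illustration and are NOT the real Symplectic schema.
SAMPLE_XML = """
<publications username="jbloggs">
  <publication type="journal-article" favourite="true">
    <title>Concrete fatigue in footbridges</title>
    <doi>10.1234/example.1</doi>
    <public-url>http://repository.example.ac.uk/item/1</public-url>
  </publication>
  <publication type="book-chapter" favourite="false">
    <title>Surveying for beginners</title>
  </publication>
  <publication type="journal-article" favourite="false">
    <title>Soil mechanics revisited</title>
    <doi>10.1234/example.2</doi>
  </publication>
</publications>
"""


def parse_publications(xml_text):
    """Group records by publication type and collect the favourited titles."""
    root = ET.fromstring(xml_text)
    by_type = defaultdict(list)
    selected = []
    for pub in root.findall("publication"):
        doi = pub.findtext("doi")  # None when the record has no DOI
        record = {
            "title": pub.findtext("title"),
            "doi_link": f"https://doi.org/{doi}" if doi else None,
            "full_text": pub.findtext("public-url"),  # None if nothing uploaded
        }
        by_type[pub.get("type")].append(record)
        if pub.get("favourite") == "true":
            selected.append(record["title"])
    return dict(by_type), selected


by_type, selected = parse_publications(SAMPLE_XML)
```

The "Selected publications" list is simply the subset of records carrying the favourite flag, so an academic changes what appears there from within Symplectic rather than on the web page.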

The next step will be to liaise with the corporate web-team to explore how the feed can be embedded in the institutional CMS. We’ve already picked a few brains and it shouldn’t be too difficult, though there are still one or two technical issues, including how best to submit a query. We wouldn’t want to use username as that would be a privacy issue; the preference would be email address, though this will require a layer of translation from email address (which isn’t searchable) to either Leeds Metropolitan username or Symplectic internal user id. In addition, the API isn’t designed to be hammered dynamically, so results need to be cached, and there are questions about how best to refresh that cache to reflect changes that academics may wish to make to their profile.
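One possible shape for the translation layer and the cache, sketched in Python. Everything here is an assumption on my part: the lookup table, the ProfileCache class and the one-hour default are all hypothetical, and the fetch function stands in for whatever actually calls the API.

```python
import time

# Hypothetical lookup table: in practice this would come from HR or directory data.
EMAIL_TO_USERNAME = {"j.bloggs@example.ac.uk": "jbloggs"}


class ProfileCache:
    """Cache API results per user and refetch only after max_age seconds."""

    def __init__(self, fetch, max_age=3600, clock=time.time):
        self._fetch = fetch      # function(username) -> result; calls the API
        self._max_age = max_age  # seconds before a cached entry counts as stale
        self._clock = clock      # injectable so the expiry logic can be tested
        self._store = {}         # username -> (fetched_at, result)

    def get(self, email):
        """Translate email to username, then return a fresh-enough result."""
        username = EMAIL_TO_USERNAME[email.lower()]  # KeyError if unknown
        now = self._clock()
        entry = self._store.get(username)
        if entry is None or now - entry[0] >= self._max_age:
            entry = (now, self._fetch(username))
            self._store[username] = entry
        return entry[1]

    def invalidate(self, email):
        """Drop a cached entry, e.g. after an academic edits their profile."""
        self._store.pop(EMAIL_TO_USERNAME[email.lower()], None)
```

A fixed expiry time keeps load on the API predictable, while an explicit invalidate gives a route for changes to show up sooner than the next scheduled refresh. Which of the two (or what mix) suits us is exactly the open question.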

An institutional tangram – musings on developing an integrated research management system

“The tangram (Chinese: 七巧板; pinyin: qī qiǎo bǎn; literally “seven boards of skill”) is a dissection puzzle consisting of seven flat shapes, called tans, which are put together to form shapes. The objective of the puzzle is to form a specific shape (given only an outline or silhouette) using all seven pieces, which may not overlap.”

http://en.wikipedia.org/wiki/Tangram

Having implemented an institutional repository at Leeds Metropolitan and learned by experience some of the difficulties associated with advocacy around the use of that repository (both for OA research and OER), I have become all too aware “that repositories are ‘lonely and isolated’; still very much under-used and not sufficiently linked to other university systems”. So said JISC’s Andy McGregor at an event called “Learning How to Play Nicely: Repositories and CRIS” in May 2010 at Leeds Metropolitan (see my report for Ariadne here). This quote is still relevant, though perhaps a little less so than when I heard it nearly 2 years ago, thanks to the ongoing work of JISC and particularly the RSP. In any case, the event was a revelation for me and I have coveted a so-called Current Research Information System (or CRIS for short) ever since!

And now, in Symplectic Elements, I have one…or at least the components of one (click on image for full size.)

The finished tangram? (click on image for full size)

It’s a puzzle though. A tangram if you will…one with considerably more than seven pieces:

intraLibrary, Symplectic, institutional website, University Research Office (URO), faculty research administrators, The Research Excellence Framework (REF), academic staff, web-developers, bibliographic information, research outputs, Open Educational Resources (OER)…

In fact, this may well not be all the pieces…pretty sure a few have been pushed down the back of the settee. I’ll look for them later.

Anyway, tortured metaphors aside, I have become increasingly aware that working in a large institution, in a role that encompasses technology and institutional policy (though I’m not, by any means, a policy maker…or indeed a real techie) is largely about communication and getting the right people, with the right skills, in the right place at the right time! Absorb policy and technical requirements from senior stakeholders and communicate those requirements to the proper techies – while also trying to ensure any motivating passions of one’s own don’t get lost along the way – Open Access to research and Open Education in my case.

For various reasons, individual user accounts have never been implemented for our repository and historically it has been administered centrally from the Library. In Symplectic we now have a system that is populated with central HR data; all staff will have an account they can access with their standard user name and password from where they can manage their own research profile including uploading full-text outputs directly to the repository*. In addition, administration by the University Research Office and faculty research administrators will be more easily centralised (particularly for the REF).

* In actual fact this functionality is not yet available, pending development work from Intrallect to capture the Atom feed from Symplectic and transform it with XSLT to a suitable format for intraLibrary. I think.

One of the clever bits of functionality used to sell the software is automatic retrieval of bibliographic data from online citation databases – we are currently running against various APIs: Web of Science (lite), PubMed and arXiv – but I think this may actually be a bit of a red herring for an institution like Leeds Metropolitan, at least until more (preferably free) data sources are available (JournalTOCs API please!). Early testing has shown that, at best, it will only retrieve a subset of (the types of) outputs that we will need to record. It will be necessary to manually import existing records (e.g. from EndNote) as well as implementing other administrative procedures at faculty level to capture information at the point of publication, especially for book-items, monographs, conference material, reports and grey literature.

More important, I think, to ensure that academic staff actually engage with the software rather than just seeing it as a tool for administrators, is to re-use the data to generate a list of research outputs – a dynamic bibliography – on a personal web-profile which has the potential to dramatically increase the visibility of research including Open Access to full-text.

Developing staff profiles of this type has been something of an obsession of mine for a while; we explored doing so from the repository (using SRU and email address as a Unique Identifier) and did develop a working prototype. Symplectic, however, integrated with central HR data and with its more sophisticated API, should make it much easier, at least from a technical perspective, and we are currently liaising with the central web-team to develop something similar to this example from Keele University – http://www.keele.ac.uk/chemistry/staff/mormerod/ (like us, Keele run Symplectic alongside intraLibrary.)

N.B. From the Symplectic interface, a user is able to “favourite” a research record and a flag comes out in the xml from the API which I understand is used on this page to display “Selected Publications”. DOI is also available from the API to link to the published version and if a user uploads full-text to the repository from Symplectic, this link is also in the xml – the first two records on this page include links to the full-text in Keele’s intraLibrary repository.

Our own Library web-dev Mike Taylor has been looking at the Symplectic API in detail and has put together a couple of prototype pages on a development server and after a meeting this week with a representative of the central web-team I’m reasonably confident we can move forward with this work fairly quickly…though there’s still a bit of a chicken & egg situation in populating the Symplectic database to then be re-surfaced via the API in this way.

There is also the question of whether we might alter our repository policy to become full-text only. One limitation of repositories across UK HE, relative to an original conception (in the arXiv mould) of holding, disseminating and preserving full-text research outputs, is that they have in effect become “diluted” by metadata records for which it has not (yet) been possible to procure full-text or for which copyright does not permit deposit; “hybrid” repositories like ours, of full-text and metadata, typically contain more metadata records than full-text (see figures from the RSP survey here). As I have argued on the UKCoRR blog, I think it makes sense to separate a bibliographic database (in Symplectic) from full-text only in a repository.

N.B. As Symplectic does not have the same search functionality as the repository, this approach has the potential disadvantage that it makes it more difficult to search across the entire corpus of research records, though one potential solution may be along the lines of that implemented by City Research Online which, in my view, is rapidly becoming an exemplar of a research management system (Symplectic) + full-text repository (EPrints). Another good example is St Andrews (PURE + DSpace), who presented a case study at “Learning How to Play Nicely: Repositories and CRIS” (video here).

And what of OER? Along with our EasyDeposit SWORD interface, using OER to resource the refocus of the undergraduate curriculum, and the soon to be released intraLibrary 3.5 that will enable us to harvest OER from other repositories…for now I think they may be the bits down the back of the settee…

Infrastructure schematic (1st draft)

There are several significant developments that will impact on our repository / research management / OER dissemination and discovery over the next 12 months or so…briefly these are:

This is a quick schematic of how the developing infrastructure might look (a bit big to fit in my WordPress theme so click on image for full size):

Plugged-in for OER

As mentioned in this recent post I’ve been experimenting with WordPress for presenting OER and have been testing a pre-release version of a WordPress plug-in, developed by the Triton project at the University of Oxford to facilitate a dynamic collection of OER in a WordPress blog.

Developer @patlockley describes the overall functionality of the plug-in here and also covers some of the limitations posed by the broader OER infrastructure here, emphasising that “no standard API exists across repositories so as to facilitate a single approach to aggregation for an aggregation creator” – as well as a separate post here considering limitations of the WordPress platform itself used in this context and associated technical considerations.

In summary the plug-in searches Xpert, Merlot and OER Commons (via their API) as well as Wikipedia, Wikibooks and Wikiversity for openly licensed material; Mendeley for journals and with options to add RSS feeds for blogs and podcasts.

Here I’ll briefly describe my experiences of using the plug-in – fairly candid in the hope that it will be useful feedback to Pat and Triton albeit with the initial caveat that any issues I’ve encountered are just as likely to be associated with my limited experience of WordPress and my shambrarian status (I simply haven’t had time to hone the search terms as carefully as I would like) as with the plug-in itself (which of course is pre-release.)

Once installed, famously straightforward in WordPress even prior to release (via FTP), you get a new “Dynamic Collection” tab in the dashboard where I can add a new collection…pretty much at random, I chose an undergraduate course from Leeds Met – Civil Engineering – around which to build my dynamic collection – it’s then just a matter of adding title and search terms, updating the feeds from the three source repositories and publishing:

This admittedly unsophisticated search returned 9 results:

Obviously the plug-in is only as effective as the keyword data / API / source repository(ies) that it is using, and the fifth link here actually points at an entirely different resource (in Jorum) with no relevance to Civil Engineering, presumably due to an error at some point along its, er, conjugation. As the plug-in does not search Jorum directly, this must have come via Xpert, which does harvest Jorum. While experimenting with the plug-in I’ve also had instances where links have returned 404s or been otherwise broken, so one requirement, I think, would be the option to remove links from the collection that are incorrect, broken…or simply less relevant, to allow the WordPress administrator fuller control of the collection.

In order to add a blog or podcast under the Settings tab, the plug-in has installed several new tabs (I don’t think the Feed management / Collection statistics / Collection tabs are yet fully functional in the version I am testing):

Under the Dynamic Collection Options there are fields to add RSS feeds from blogs or podcasts:

I’ve experienced a few teething troubles adding blogs not least because I don’t know much about Civil Engineering! As I understand, it should search blog title and description for the dynamic collection keywords…I added a feed from http://www.civilengineering.co.uk/feed/ which returned this single (most recent) post – http://www.civilengineering.co.uk/2010/09/civil-engineering-issues/ (the blog, in fact, only appears to comprise 2 posts so presumably would update should any new posts be added?)
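For what it's worth, this is roughly how I imagine the keyword matching might work, sketched in Python rather than in the plug-in's own code. It assumes matching is done per post against title and description, which is only my reading; the real plug-in may well match at feed level or in some other way. The sample feed and the matching_items function are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Made-up RSS 2.0 feed for illustration only.
SAMPLE_RSS = """<rss version="2.0"><channel>
  <title>Example blog</title>
  <item><title>Civil engineering issues</title>
        <description>Bridges and tunnels</description>
        <link>http://blog.example.org/1</link></item>
  <item><title>Holiday photos</title>
        <description>Nothing technical</description>
        <link>http://blog.example.org/2</link></item>
  <item><title>Site visit</title>
        <description>Notes for Civil Engineering students</description>
        <link>http://blog.example.org/3</link></item>
</channel></rss>"""


def matching_items(rss_text, keywords):
    """Return links of items whose title or description contains any keyword."""
    wanted = [k.lower() for k in keywords]
    links = []
    for item in ET.fromstring(rss_text).iter("item"):
        # Case-insensitive substring match over title plus description.
        text = " ".join(
            [item.findtext("title", ""), item.findtext("description", "")]
        ).lower()
        if any(k in text for k in wanted):
            links.append(item.findtext("link"))
    return links


links = matching_items(SAMPLE_RSS, ["civil engineering"])
```

With matching of this kind, a feed that only has a couple of posts will only ever yield a couple of links, which would be consistent with the single result I saw.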

I’m very optimistic about the potential of this approach to allow WordPressing course leaders, perhaps with support from learning technologists, to quickly and easily assemble a dynamic collection of OER for their students and look forward to the formal release of the finished product* – in the meantime, in true Blue Peter stylee, here are a number of collections that Pat made earlier to give a sense of what should be possible:

http://politicsinspires.org/dynamic_collection/political-theory/

http://politicsinspires.org/dynamic_collection/comparative-government/

http://politicsinspires.org/dynamic_collection/international-relations/

http://politicsinspires.org/dynamic_collection/european-politics-and-society/

* The only caveat from my perspective is that my own institution does not formally support the use of WordPress. Nevertheless, there is certainly a requirement, explicitly identified by senior stakeholders, to develop tools to cross-search Open Educational Resources and, in this context, I think we can learn a lot from the Triton project.

N.B. Such a mechanism, however implemented via the proliferation of OER repositories and their APIs, also put me in mind of this post from Suzanne Hardy (@glittrgirl) of MEDEV and the PORSCHE project – Branding, repositories, OER and awareness raising: some thoughts on embedding OERs

See also: Delores OER – WordPress for hosting and describing learning resources (University of Bath and Heriot-Watt)

BiblioSight project recommended for funding

We’ve just learned that we’ve been successful in our most recent funding bid to JISC’s Rapid Innovation call.

Outline project description:

“The project will aim to exploit the Web of Science Web Services API that uses standard transport protocols, such as HTTP, and message formats, such as SOAP and XML, to facilitate the exchange of data between Web of Knowledge and a custom application. It will build on work undertaken by the JISC funded SUE project, Implementing an Institutional Repository for Leeds Metropolitan University to integrate bibliographic information from Web of Science into the Leeds Met Open Access repository of research; this will facilitate automatic update when a published article appears in Web of Science. The aim is to integrate the technology into an efficient workflow to populate the repository with citation information / full text; we will also build on work undertaken by the JISC funded PERSoNA project and aim to develop a ‘widget’ that can easily be added to a personal environment like iGoogle or personal/communal environment like netvibes and that will extract bibliographic information – and potentially also bibliometrics – for authenticated Leeds Met staff in line with Web of Science licensing.”

Development of Research Repository Aspect of IntraLibrary

On Friday Mike and I visited colleagues at Keele University for a meeting with Charles Duncan from Intrallect to consider development priorities for intraLibrary to better serve our needs as a research repository.  Over 4 and a half hours we considered the basic issues that need addressing as well as looking forward to some more ambitious functionality and integration with the wider research infrastructure as we move towards the REF.

I was particularly interested to learn about how Keele are implementing Symplectic’s publications management system – http://www.symplectic.co.uk/ – which regularly trawls Web of Science and PubMed Central for information about Keele’s academic publications. Symplectic have clearly been thinking about integration with IRs and there’s even a link to SHERPA/RoMEO. The system was used at Imperial College London for the RAE 2008 process and includes link functionality with DSpace, which is that institution’s IR platform – http://spiral.imperial.ac.uk/. Intrallect are currently liaising with Symplectic about integration with intraLibrary – I’m not certain precisely what form this would take but in an ideal world it would be great if we could auto populate as much metadata as possible (title/bibliographic info/abstract/author/copyright status according to RoMEO) and automatically nudge academics for full text where appropriate!

At Leeds Met we currently lack any form of research database which is why I’ve been exploring what are essentially manual workflows to populate the repository with all research output – I’m not sure how expensive Symplectic is and it may be difficult to justify given this institution’s relatively small research output and the repository may well have to be the research database which is the assumption I’ve been working on; we will also want to explore the soon-to-be-released Web of Science API which may, in any case, enable us to emulate some of this functionality ourselves.

The first item on our agenda was somewhat more prosaic and focussed on our immediate functional requirements – SRU searching and metadata. Mike has been working on incorporating advanced search into the SRU interface and has come up against a couple of issues when searching by author and date, which are essentially artifacts of having to query DC rather than LOM. In the LOM, creators and contributors are clearly differentiated; querying by DC, however, conflates the creator and author roles, which may (will) be different if resources are uploaded by someone other than the author.

  • Searching dc.creator will search for the creator and author roles
  • Searching dc.contributor will search for the content provider role

In addition:

  • Searching by dc.date only searches data that relates to the intraLibrary submission process (i.e. the deposit date, and perhaps modification dates if you added an author later on for example)
  • The only way to search journal dates is to use the default free text search that searches everything (or most fields anyway).

The solution, of course, is to make it possible to query the LOM by SRU and this is now Intrallect’s intention – indeed, to render all LOM fields query-able which would include user generated tags for example.
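As a rough illustration of the kind of query involved, here is a Python sketch that builds an SRU searchRetrieve URL from a single CQL clause. The endpoint address is hypothetical (the real intraLibrary SRU address isn't given here) and I am assuming SRU version 1.1; the request parameter names are those of the SRU standard, and dc.creator is the index discussed above.

```python
from urllib.parse import urlencode

# Hypothetical endpoint: the real intraLibrary SRU address is not given in this post.
SRU_BASE = "http://repository.example.ac.uk/sru"


def sru_url(index, term, max_records=10):
    """Build an SRU searchRetrieve URL whose query is a single CQL clause."""
    safe_term = term.replace('"', '\\"')  # escape quotes inside the CQL term
    params = {
        "operation": "searchRetrieve",
        "version": "1.1",  # assumed version
        "query": f'{index} = "{safe_term}"',
        "maximumRecords": max_records,
    }
    return f"{SRU_BASE}?{urlencode(params)}"


url = sru_url("dc.creator", "Taylor")
```

Against the current DC mapping, a dc.creator query like this one would match both the creator and author roles, which is precisely the conflation described above. Once LOM fields are query-able, the same URL pattern should work with a more specific index name in place of dc.creator.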

The next big question is exposure of open content to search engines and Charles gave us an overview of plans to develop an object “home page” with a static URL which should help in this area. We also discussed sitemaps and what needs to be done external to intraLibrary. I’m still unclear on how we can improve the format of results returned by Google from the SRU interface; to repeat, Google IS indexing http://repository.leedsmet.ac.uk/ with site: http://repository.leedsmet.ac.uk/ currently returning over 500 records. However this is fairly unstructured; Google is simply following links from http://repository.leedsmet.ac.uk/main/browse.php; any subsequent links Googlebot encounters are also indexed and returned as “The Repository search for [link name]” and ideally I’d like results to be returned in a more structured and user friendly form. Many queries actually return no results where there is (yet) no content to find, though where there is content, Google is indexing all human readable metadata. I’m also not certain whether Googlebot is finding its way into the full text via the Open URL/virtual file paths generated by intraLibrary. Full text indexing within intraLibrary itself has also been promised.
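On the sitemap side, something like the following Python sketch is what I have in mind for the part done external to intraLibrary: given a list of object home page URLs and modification dates, write out a sitemap in the standard sitemaps.org format. The URLs are hypothetical, since the static object pages don't exist yet, and build_sitemap is just an illustrative name.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(pages):
    """Build sitemap XML from (url, last_modified 'YYYY-MM-DD') pairs."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for loc, lastmod in pages:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = loc
        ET.SubElement(url, "lastmod").text = lastmod
    return ET.tostring(urlset, encoding="unicode")


# Hypothetical object home pages.
sitemap = build_sitemap([
    ("http://repository.example.ac.uk/item/1", "2009-06-01"),
    ("http://repository.example.ac.uk/item/2", "2009-06-15"),
])
```

The point of a sitemap here would be to hand search engines one stable URL per object rather than leaving Googlebot to discover records by following browse and search links.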

In short, I’m really not sure how all of these factors may combine to be exploited by a next generation SRU interface!

We then touched upon self-archiving and (semi) mediated workflows; potentially developing SWORD based quick deposit from desktop/web, ideally with automatic metadata generation.

The two other major issues we considered are:

  • Policy metadata – handling embargoes

This is pretty crucial to an OA archive of research as many publishers of academic journals specify an embargo period of 12 or 18 months from the date of publication before a paper can be made available in a repository.  We need to be able to add a paper to intraLibrary upon receipt but restrict access until the embargo has expired and for this to happen automatically.  On one level, this functionality should be fairly straightforward to achieve by having intraLibrary check today’s date against an embargo date specified in the metadata; it’s a little more complicated than that though as we would want the metadata to be visible before the embargo date, just not the full text.
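The logic itself is simple enough to sketch in a few lines of Python. This is only my illustration of the rule, not how intraLibrary does or will do it; the embargo_until and full_text_url field names are hypothetical.

```python
from datetime import date


def visible_record(record, today=None):
    """Return the public view: metadata always, full text only after embargo."""
    today = today or date.today()
    embargo_until = record.get("embargo_until")  # hypothetical metadata field
    public = {"title": record["title"], "embargo_until": embargo_until}
    if embargo_until is None or today >= embargo_until:
        # No embargo, or it has expired: expose the full-text link.
        public["full_text_url"] = record["full_text_url"]
    else:
        # Still under embargo: metadata stays visible, full text is withheld.
        public["full_text_url"] = None
    return public


paper = {
    "title": "A paper",
    "full_text_url": "http://repository.example.ac.uk/item/1/paper.pdf",
    "embargo_until": date(2010, 6, 1),
}
```

Because the check runs against today's date each time the record is requested, the full text becomes available automatically on the embargo date with no manual step.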

  • Cover pages for PDF

It was suggested that a coversheet should be generated by intraLibrary on the fly which would certainly be useful as manually creating cover sheets for each and every article is time consuming to say the least; this would be useful functionality for CLA materials which also require a coversheet.

These developments will take some time to implement and the next stage is to prioritise, by anonymous e-postal ballot. Intrallect hope we will start to see some of the major initiatives in a build towards the end of the year.

Thank you to our colleagues at Keele for making us welcome and for feeding us!