An institutional tangram – musings on developing an integrated research management system

“The tangram (Chinese: 七巧板; pinyin: qī qiǎo bǎn; literally “seven boards of skill”) is a dissection puzzle consisting of seven flat shapes, called tans, which are put together to form shapes. The objective of the puzzle is to form a specific shape (given only an outline or silhouette) using all seven pieces, which may not overlap.”

http://en.wikipedia.org/wiki/Tangram

Having implemented an institutional repository at Leeds Metropolitan and learned by experience some of the difficulties associated with advocacy around the use of that repository (both for OA research and OER), I have become all too aware “that repositories are ‘lonely and isolated’; still very much under-used and not sufficiently linked to other university systems”. So said JISC’s Andy McGregor at an event called “Learning How to Play Nicely: Repositories and CRIS” in May 2010 at Leeds Metropolitan (see my report for Ariadne here). This quote is still relevant, though perhaps a little less so than when I heard it nearly 2 years ago, thanks to the ongoing work of JISC and particularly the RSP. In any case, the event was a revelation for me and I have coveted a so-called Current Research Information System (or CRIS for short) ever since!

And now, in Symplectic Elements, I have one…or at least the components of one.

The finished tangram?

It’s a puzzle though. A tangram if you will…one with considerably more than seven pieces:

intraLibrary, Symplectic, institutional website, University Research Office (URO), faculty research administrators, The Research Excellence Framework (REF), academic staff, web-developers, bibliographic information, research outputs, Open Educational Resources (OER)…

In fact, this may well not be all the pieces…pretty sure a few have been pushed down the back of the settee. I’ll look for them later.

Anyway, tortured metaphors aside, I have become increasingly aware that working in a large institution, in a role that encompasses technology and institutional policy (though I’m not, by any means, a policy maker…or indeed a real techie), is largely about communication and getting the right people, with the right skills, in the right place at the right time! It means absorbing policy and technical requirements from senior stakeholders and communicating those requirements to the proper techies – while also trying to ensure any motivating passions of one’s own don’t get lost along the way – Open Access to research and Open Education in my case.

For various reasons, individual user accounts have never been implemented for our repository and historically it has been administered centrally from the Library. In Symplectic we now have a system that is populated with central HR data; all staff will have an account they can access with their standard user name and password from where they can manage their own research profile including uploading full-text outputs directly to the repository*. In addition, administration by the University Research Office and faculty research administrators will be more easily centralised (particularly for the REF).

* In actual fact this functionality is not yet available, pending development work from Intrallect to capture the Atom feed from Symplectic and transform it with XSLT to a suitable format for intraLibrary. I think.
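To give a flavour of the kind of transformation described in that footnote, here is a minimal sketch in Python (standing in for XSLT) that reads an Atom feed and maps each entry onto a flat record. The sample feed and the target field names (`identifier`, `title`, `creators`, `date`) are made up for illustration; they are not Symplectic's actual feed or intraLibrary's actual import format.

```python
import xml.etree.ElementTree as ET

# The standard Atom namespace, in ElementTree's {uri}tag notation.
ATOM = "{http://www.w3.org/2005/Atom}"

# Made-up sample feed for illustration only: not real Symplectic output.
SAMPLE_FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
  <entry>
    <id>urn:example:publication:1</id>
    <title>An example article</title>
    <author><name>A. Researcher</name></author>
    <author><name>B. Colleague</name></author>
    <updated>2012-01-15T10:00:00Z</updated>
  </entry>
</feed>"""


def atom_to_records(feed_xml):
    """Map each Atom entry onto a flat dict (hypothetical target fields)."""
    root = ET.fromstring(feed_xml)
    records = []
    for entry in root.findall(ATOM + "entry"):
        records.append({
            "identifier": entry.findtext(ATOM + "id", default=""),
            "title": entry.findtext(ATOM + "title", default=""),
            "creators": [
                author.findtext(ATOM + "name", default="")
                for author in entry.findall(ATOM + "author")
            ],
            # Keep only the date part of the Atom timestamp.
            "date": entry.findtext(ATOM + "updated", default="")[:10],
        })
    return records


records = atom_to_records(SAMPLE_FEED)
```

A real implementation would presumably carry far richer metadata, but the shape of the job is the same: walk the entries, pick out the fields, and write them in whatever form the repository expects.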

One of the clever bits of functionality used to sell the software is automatic retrieval of bibliographic data from online citation databases – we are currently running against various APIs: Web of Science (lite), PubMed and arXiv. I think this may actually be a bit of a red herring for an institution like Leeds Metropolitan, at least until more (preferably free) data sources are available (JournalTOCs API please!). Early testing has shown that, at best, it will only retrieve a subset of (the types of) outputs that we will need to record. It will be necessary to manually import existing records (e.g. from EndNote) as well as to implement other administrative procedures at faculty level to capture information at the point of publication, especially for book-items, monographs, conference material, reports and grey literature.

More important, I think, to ensure that academic staff actually engage with the software rather than just seeing it as a tool for administrators, is to re-use the data to generate a list of research outputs – a dynamic bibliography – on a personal web-profile which has the potential to dramatically increase the visibility of research including Open Access to full-text.

Developing staff profiles of this type has been something of an obsession of mine for a while; we explored doing so from the repository (using SRU and email address as a Unique Identifier) and did develop a working prototype. Symplectic, however, integrated with central HR data and with its more sophisticated API, should make it much easier, at least from a technical perspective, and we are currently liaising with the central web-team to develop something similar to this example from Keele University – http://www.keele.ac.uk/chemistry/staff/mormerod/ (like us, Keele run Symplectic alongside intraLibrary.)
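For the curious, the repository-based approach boils down to building an SRU searchRetrieve request with the staff member's email address in the query. The sketch below is an assumption-laden illustration: `operation`, `version`, `query` and `maximumRecords` are standard SRU parameters, but the base URL and the CQL index name (`dc.creator`) are hypothetical and would depend entirely on how a given repository is configured.

```python
from urllib.parse import urlencode


def build_sru_url(base_url, email, max_records=50):
    """Build an SRU searchRetrieve URL that looks up records by email address.

    The CQL index name below is a hypothetical choice; a real repository
    may expose the email address under a different index.
    """
    cql_query = 'dc.creator = "{}"'.format(email)
    params = {
        "operation": "searchRetrieve",
        "version": "1.1",
        "query": cql_query,
        "maximumRecords": str(max_records),
    }
    return base_url + "?" + urlencode(params)


# Hypothetical endpoint and address, for illustration only.
url = build_sru_url(
    "https://repository.example.ac.uk/sru",
    "a.researcher@example.ac.uk",
)
```

The response comes back as XML, which a profile page can then parse and render as a list of outputs.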

N.B. From the Symplectic interface, a user is able to “favourite” a research record and a flag comes out in the XML from the API which, I understand, is used on this page to display “Selected Publications”. The DOI is also available from the API to link to the published version and, if a user uploads full-text to the repository from Symplectic, this link is also in the XML – the first two records on this page include links to the full-text in Keele’s intraLibrary repository.
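As a rough sketch of how a profile page might use those three pieces of information (the favourite flag, the DOI and the repository link), the Python below filters a list of publications down to the favourited ones and builds the two links. The XML structure, element names and attribute names here are invented for illustration and are not the real Symplectic API schema.

```python
import xml.etree.ElementTree as ET

# Invented sample XML: element and attribute names are hypothetical.
SAMPLE_XML = """<publications>
  <publication favourite="true">
    <title>First paper</title>
    <doi>10.1234/example.1</doi>
    <repository-url>https://repository.example.ac.uk/item/1</repository-url>
  </publication>
  <publication favourite="false">
    <title>Second paper</title>
    <doi>10.1234/example.2</doi>
  </publication>
</publications>"""


def selected_publications(xml_text):
    """Return favourited publications with links to published and full-text versions."""
    root = ET.fromstring(xml_text)
    selected = []
    for pub in root.findall("publication"):
        if pub.get("favourite") != "true":
            continue  # only favourited records go in "Selected Publications"
        doi = pub.findtext("doi")
        selected.append({
            "title": pub.findtext("title", default=""),
            # A DOI resolves via the doi.org proxy; None if no DOI is recorded.
            "published_url": ("https://doi.org/" + doi) if doi else None,
            # None when no full text has been uploaded to the repository.
            "full_text_url": pub.findtext("repository-url"),
        })
    return selected


selected = selected_publications(SAMPLE_XML)
```

Records without a repository link simply show the published version only, which is exactly the situation for the later records on the Keele page.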

Our own Library web-dev Mike Taylor has been looking at the Symplectic API in detail and has put together a couple of prototype pages on a development server and after a meeting this week with a representative of the central web-team I’m reasonably confident we can move forward with this work fairly quickly…though there’s still a bit of a chicken & egg situation in populating the Symplectic database to then be re-surfaced via the API in this way.

There is also the question of whether we might alter our repository policy to become full-text only. One limitation of repositories across UK HE is that, from an original conception (in the arXiv mould) of holding, disseminating and preserving full-text research outputs, they have in effect become “diluted” by metadata records for which it has not (yet) been possible to procure full-text or for which copyright does not permit deposit; “hybrid” repositories like ours, of full-text and metadata, typically contain more metadata records than full-text (see figures from the RSP survey here). As I have argued on the UKCoRR blog, I think it makes sense to separate a bibliographic database (in Symplectic) from full-text only in a repository.

N.B. As Symplectic does not have the same search functionality as the repository, this approach has the potential disadvantage that it makes it more difficult to search across the entire corpus of research records, though one potential solution may be along the lines of that implemented by City Research Online which, in my view, is rapidly becoming an exemplar of a research management system (Symplectic) + full-text repository (EPrints). Another good example is St Andrews (PURE + DSpace), who presented a case study at “Learning How to Play Nicely: Repositories and CRIS” (video here).

And what of OER? Along with our EasyDeposit SWORD interface, using OER to resource the refocus of the undergraduate curriculum and the soon to be released intraLibrary 3.5 that will enable us to harvest OER from other repositories…for now I think they may be the bits down the back of the settee…


Four JISC repository infrastructure projects

I was contacted this week by Evidence Base at Birmingham City University who are conducting a “short lightweight review” of four key repository infrastructure projects, preliminary to a larger evaluation of the IE programme as a whole, and are talking to JISC programme managers and project managers as well as seeking views from lowly repository managers like me!

The four projects I was asked to discuss were:

Repository Search (UK Institutional Repository Search – IRS) – http://www.intute.ac.uk/irs/
Repository Support Project (RSP) – http://www.rsp.ac.uk
Repository Junction (Open Access Repository Junction – OA-RJ) – http://edina.ac.uk/projects/oa-rj/ and
Repository Aggregation (RepUK) – http://www.ukoln.ac.uk/projects/repuk/

Now I like to think I’ve got my ear to the ground and I was immediately struck that I was only actually familiar with two of these projects (the Intute IR search and, of course, the good old RSP). So I followed the links for the other two projects to learn what I could – both of which, in my view, need to be very much more high profile than they are currently (though they do both have another 12 months to run until 31st March 2011). My ensuing discussion with the lady from Evidence Base was more around the conceptual value of all four projects.

OA-RJ

I expect that OA-RJ in particular will gain traction over the coming months, not least because it is referenced in the current JISC Grant Funding Call “Deposit of research outputs and Exposing digital content for education and research”.

The purpose of the project is to scope, build and test a deposit broker tool to assist open access deposit into, and interoperability between, existing repository services. Currently multiple-authored journal articles are deposited singly in either an institutional, funder or subject-based repository, and the primary aim is to simplify the repository deposit workflow for such articles; OA-RJ will therefore offer an API that supports redirect and deposit of research outputs into multiple repositories.
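To make the broker idea a little more concrete, here is a toy sketch of the routing step only: given a multiple-authored article and a lookup from institution to repository deposit endpoint, work out which repositories should receive it. Everything here (the lookup table, the endpoints, the function name) is hypothetical and is my own illustration of the concept, not OA-RJ's actual design or API.

```python
# Hypothetical institution-to-endpoint lookup: not OA-RJ's real data.
REPOSITORY_ENDPOINTS = {
    "Institution A": "https://repo-a.example.org/sword",
    "Institution B": "https://repo-b.example.org/sword",
}


def deposit_targets(article, endpoints):
    """Return the distinct deposit endpoints for an article's authors.

    Authors whose institution has no known repository are skipped,
    and each endpoint appears only once, in first-seen order.
    """
    targets = []
    for author in article["authors"]:
        endpoint = endpoints.get(author["institution"])
        if endpoint and endpoint not in targets:
            targets.append(endpoint)
    return targets


# Made-up article with four authors across three institutions.
article = {
    "title": "A multiple-authored example",
    "authors": [
        {"name": "A. Researcher", "institution": "Institution A"},
        {"name": "B. Colleague", "institution": "Institution B"},
        {"name": "C. Collaborator", "institution": "Institution A"},
        {"name": "D. Visitor", "institution": "Institution Unknown"},
    ],
}

targets = deposit_targets(article, REPOSITORY_ENDPOINTS)
```

The hard parts of a real broker (identifying institutions reliably, handling funder and subject repositories, and doing the deposit itself) are of course exactly what this leaves out.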

RepUK


I was particularly interested in RepUK and IRS as I have for some time been a little nonplussed by our collective, continued obsession with the woefully under-used OAI-PMH, and both these projects are using the protocol (I think!).

There is not a huge amount of information on the RepUK website but the paragraph below gives a flavour of the project:

“The interest in exploiting the content to be found in institutional repositories is growing. At the same time, there is a range of possible uses for a central cache of metadata records held by institutional repositories. Most notably, with a recent emphasis on ‘rapid innovation’, there exists an opportunity to position this aggregation of data to support research and development generally in the fields of metadata and/or repositories. Rapid innovation projects which require a corpus of metadata to work with will benefit from this readily available data-store, avoiding the resource-intensive overhead of developing their own harvesting and aggregation solution.”

RepUK also invokes Lorcan Dempsey’s concept of ‘concentration’ in a Web 2.0 environment as a “major characteristic of our network experience” involving “major gravitational hubs” that “concentrate data, users (as providers and consumers), and communications and computational capacity” and posits that “a central cache of metadata records held by institutional repositories” in this way, exposed by a simple, RESTful API, would allow the community to start building value added services around this (hopefully) high quality metadata.

UK Institutional Repository Search (IRS)

This service has come to the end of its funded period as a JISC project but is being maintained at a basic level by Mimas. I presume that it is using OAI-PMH* to cross-search UK IRs and offers “conceptual search” and “text mining search”**. With the best will in the world, it is difficult to see how this facility, in its current incarnation, can compete with the likes of Google.

* Maybe the conceptual search uses OAI-PMH but “text mining” is more Google style?
** At least it did but the text mining search was broken and was giving a “Bad Gateway” yesterday – it now appears to have been rerouted to the “conceptual search” only, presumably while it is fixed.


Google, of course, withdrew support for OAI-PMH back in April 2008 and though I’m aware of a few harvesters around like OAIster, even OpenDOAR uses a Google custom search – http://www.opendoar.org/search.php – rather than OAI-PMH to search repository content.

I can offer only anecdotal evidence but I’m pretty sure that your average academic will tend towards Google/Google Scholar to source research on the open web and has no idea about OAI-PMH, which simply isn’t widely used enough to justify our ongoing fixation. The reasons for that fixation are severalfold: they reflect, to some extent, the protocol’s pedigree (which dates back to the earliest days of the open access and institutional repository movements) and the associated investment by the community, in software specification for example; they also stem from a recognition of the limitations of Google for academic purposes and the undoubted potential of OAI-PMH (though this potential has arguably been watered down by so many repositories also carrying metadata-only records rather than exclusively full text).
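For anyone who hasn't looked under the bonnet, an OAI-PMH harvest is really just an HTTP request with a `verb` parameter, returning XML with simple Dublin Core inside. The sketch below builds a `ListRecords` request and parses a small response. The verb, the `oai_dc` metadata prefix and the three namespaces are part of the protocol; the base URL and the sample response are made up for illustration.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Namespaces defined by OAI-PMH 2.0 and Dublin Core.
OAI = "{http://www.openarchives.org/OAI/2.0/}"
DC = "{http://purl.org/dc/elements/1.1/}"


def list_records_url(base_url):
    """Build a ListRecords request for simple Dublin Core metadata."""
    return base_url + "?" + urlencode(
        {"verb": "ListRecords", "metadataPrefix": "oai_dc"}
    )


# Made-up response for illustration: not from a real repository.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:repository.example.ac.uk:1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>An example record</dc:title>
          <dc:creator>Researcher, A.</dc:creator>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""


def parse_records(response_xml):
    """Extract the OAI identifier, title and creators from each record."""
    root = ET.fromstring(response_xml)
    harvested = []
    for record in root.iter(OAI + "record"):
        harvested.append({
            "identifier": record.findtext(OAI + "header/" + OAI + "identifier"),
            "title": record.findtext(".//" + DC + "title"),
            "creators": [c.text for c in record.iter(DC + "creator")],
        })
    return harvested


# Hypothetical base URL, for illustration only.
request_url = list_records_url("https://repository.example.ac.uk/oai")
harvested = parse_records(SAMPLE_RESPONSE)
```

Note that what comes back is metadata: nothing in a record like this tells a harvester whether any full text sits behind it, which is part of the dilution problem.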

RSP

When I was new to repositories I found the RSP absolutely invaluable as a source of information and support; they came to a soft launch of our repository back in 2008, which was really useful to give colleagues a little bit of a wider view of the repository landscape in the UK. I must confess that I haven’t been back to the RSP website for a while and I was pleasantly surprised that there is now a great deal more content covering everything from a primer on OAI-PMH to advice and resources for successful advocacy. I was also reminded that the RSP do outreach visits and I may well consider giving them a call – it would certainly be useful to get an objective perspective on some of the issues we continue to face with repository development here at Leeds Met.

I’m not naive, of course, to the reasons for JISC conducting these project evaluations and they clearly want to think carefully about where future investment can most effectively be made; I was asked a few leading questions around how the RSP still meets the needs of the community (I think they do!) and how they might adapt their approach to meet shifting requirements – the start-ups are all but finished I think but, no doubt, new people are coming into the sector all the time who will most certainly benefit from the clear information and support of the RSP. I also speculated somewhat idly whether the website could be a bit more dynamic and, well, Web 2.0 – they do have a presence on Twitter – @RepoSupport – but I couldn’t find it from the website and I don’t think it feeds there. I even wondered whether a social network style site using Ning or Elgg might work…just a thought.