An institutional tangram – musings on developing an integrated research management system

“The tangram (Chinese: 七巧板; pinyin: qī qiǎo bǎn; literally “seven boards of skill”) is a dissection puzzle consisting of seven flat shapes, called tans, which are put together to form shapes. The objective of the puzzle is to form a specific shape (given only an outline or silhouette) using all seven pieces, which may not overlap.”

http://en.wikipedia.org/wiki/Tangram

Having implemented an institutional repository at Leeds Metropolitan, and having learned by experience some of the difficulties of advocating for its use (both for OA research and OER), I have become all too aware “that repositories are ‘lonely and isolated’; still very much under-used and not sufficiently linked to other university systems”. So said JISC’s Andy McGregor at an event called “Learning How to Play Nicely: Repositories and CRIS” in May 2010 at Leeds Metropolitan (see my report for Ariadne here). This quote is still relevant, though perhaps a little less so than when I heard it nearly 2 years ago, thanks to the ongoing work of JISC and particularly the RSP. In any case, the event was a revelation for me and I have coveted a so-called Current Research Information System (or CRIS for short) ever since!

And now, in Symplectic Elements, I have one…or at least the components of one.

The finished tangram?

It’s a puzzle though. A tangram if you will…one with considerably more than seven pieces:

intraLibrary, Symplectic, institutional website, University Research Office (URO), faculty research administrators, The Research Excellence Framework (REF), academic staff, web-developers, bibliographic information, research outputs, Open Educational Resources (OER)…

In fact, this may well not be all the pieces…pretty sure a few have been pushed down the back of the settee. I’ll look for them later.

Anyway, tortured metaphors aside, I have become increasingly aware that working in a large institution, in a role that encompasses technology and institutional policy (though I’m not, by any means, a policy maker…or indeed a real techie) is largely about communication and getting the right people, with the right skills, in the right place at the right time! Absorb policy and technical requirements from senior stakeholders and communicate those requirements to the proper techies – while also trying to ensure any motivating passions of one’s own don’t get lost along the way – Open Access to research and Open Education in my case.

For various reasons, individual user accounts have never been implemented for our repository and historically it has been administered centrally from the Library. In Symplectic we now have a system that is populated with central HR data; all staff will have an account they can access with their standard user name and password from where they can manage their own research profile including uploading full-text outputs directly to the repository*. In addition, administration by the University Research Office and faculty research administrators will be more easily centralised (particularly for the REF).

* In actual fact this functionality is not yet available, pending development work from Intrallect to capture the Atom feed from Symplectic and transform it with XSLT to a suitable format for intraLibrary. I think.

One of the clever bits of functionality used to sell the software is automatic retrieval of bibliographic data from online citation databases; we are currently running against various APIs: Web of Science (lite), PubMed and arXiv. I think this may actually be a bit of a red herring for an institution like Leeds Metropolitan, at least until more (preferably free) data sources are available (JournalTOCs API please!). Early testing has shown that, at best, it will only retrieve a subset of (the types of) outputs that we will need to record. It will be necessary to manually import existing records (e.g. from EndNote) as well as implementing other administrative procedures at faculty level to capture information at the point of publication, especially for book-items, monographs, conference material, reports and grey literature.

More important, I think, to ensure that academic staff actually engage with the software rather than just seeing it as a tool for administrators, is to re-use the data to generate a list of research outputs – a dynamic bibliography – on a personal web-profile which has the potential to dramatically increase the visibility of research including Open Access to full-text.

Developing staff profiles of this type has been something of an obsession of mine for a while; we explored doing so from the repository (using SRU and email address as a Unique Identifier) and did develop a working prototype. Symplectic, however, integrated with central HR data and with its more sophisticated API, should make it much easier, at least from a technical perspective, and we are currently liaising with the central web-team to develop something similar to this example from Keele University – http://www.keele.ac.uk/chemistry/staff/mormerod/ (like us, Keele run Symplectic alongside intraLibrary.)

N.B. From the Symplectic interface, a user is able to “favourite” a research record and a flag comes out in the xml from the API which I understand is used on this page to display “Selected Publications”. DOI is also available from the API to link to the published version and if a user uploads full-text to the repository from Symplectic, this link is also in the xml – the first two records on this page include links to the full-text in Keele’s intraLibrary repository.
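As a rough illustration of the idea, here is a minimal Python sketch of how a profile page might pick out favourited records and build links to the published version and the repository full-text. I have not reproduced the real Symplectic API schema: the XML layout, the element names and the `favourite` attribute below are all invented for the example, as are the sample titles, DOIs and URLs.

```python
import xml.etree.ElementTree as ET

# Hypothetical API response. Element and attribute names are assumptions
# made for illustration, NOT the actual Symplectic Elements schema.
SAMPLE_XML = """
<publications>
  <publication favourite="true">
    <title>Paper A</title>
    <doi>10.1000/example.a</doi>
    <repository-url>http://repository.example.ac.uk/a.pdf</repository-url>
  </publication>
  <publication favourite="false">
    <title>Paper B</title>
    <doi>10.1000/example.b</doi>
  </publication>
</publications>
"""


def selected_publications(xml_text):
    """Return favourited records with links to published and full-text versions."""
    root = ET.fromstring(xml_text)
    selected = []
    for pub in root.findall("publication"):
        # Skip anything the user has not flagged as a favourite.
        if pub.get("favourite") != "true":
            continue
        doi = pub.findtext("doi")
        selected.append({
            "title": pub.findtext("title"),
            # A DOI resolves to the published version via doi.org.
            "published_url": "https://doi.org/" + doi if doi else None,
            # None when no full-text has been uploaded to the repository.
            "full_text_url": pub.findtext("repository-url"),
        })
    return selected
```

Calling `selected_publications(SAMPLE_XML)` on the made-up data returns a single entry for "Paper A"; a real page would then render that list as "Selected Publications".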

Our own Library web-dev Mike Taylor has been looking at the Symplectic API in detail and has put together a couple of prototype pages on a development server and after a meeting this week with a representative of the central web-team I’m reasonably confident we can move forward with this work fairly quickly…though there’s still a bit of a chicken & egg situation in populating the Symplectic database to then be re-surfaced via the API in this way.

There is also the question of whether we might alter our repository policy to become full-text only. One limitation of repositories across UK HE is that, from an original conception (in the arXiv mould) of holding, disseminating and preserving full-text research outputs, they have in effect become “diluted” by metadata records for which it has not (yet) been possible to procure full-text or for which copyright does not permit deposit. “Hybrid” repositories like ours, of full-text and metadata, typically contain more metadata records than full-text (see figures from the RSP survey here). As I have argued on the UKCoRR blog, I think it makes sense to separate a bibliographic database (in Symplectic) from a full-text-only repository.

N.B. As Symplectic does not have the same search functionality as the repository, this approach has the potential disadvantage that it makes it more difficult to search across the entire corpus of research records. One potential solution may be along the lines of that implemented by City Research Online which, in my view, is rapidly becoming an exemplar of a research management system (Symplectic) + full-text repository (EPrints). Another good example is St Andrews (PURE + DSpace), who presented a case study at “Learning How to Play Nicely: Repositories and CRIS” (video here).

And what of OER? Along with our EasyDeposit SWORD interface, using OER to resource the refocus of the undergraduate curriculum, and the soon to be released intraLibrary 3.5 that will enable us to harvest OER from other repositories…for now I think they may be the bits down the back of the settee…

Infrastructure schematic (1st draft)

There are several significant developments that will impact on our repository / research management / OER dissemination and discovery over the next 12 months or so…briefly these are:

This is a quick schematic of how the developing infrastructure might look (a bit big to fit in my WordPress theme so click on image for full size):

Repository reports and more on SEO

I’ve been trying to get to grips with what usage data I can generate from our repository – both for research and, in particular, for OER, as part of a small JISC-funded follow-up to Unicycle. I don’t really have anything equivalent to IRStats for EPrints – see this report from USIR for the type of data that can be generated from Salford’s EPrints repository – but I do have Google Analytics running on http://repository.leedsmet.ac.uk/ and intraLibrary’s own reporting tool.

The issue is complicated for us slightly in that we effectively have two repository sites running on two different servers! There is intraLibrary itself, hosted for us by Intrallect, and there is the Open Search SRU interface on a Leeds Met server. From Analytics I can get data on traffic to Open Search, including hits on the metadata page for individual records, but I cannot identify whether the full text/resource was actually downloaded. However, I CAN get this info from intraLibrary itself.

The dual server set-up also creates issues for SEO and I’ve been trying to ensure that full text, where available, is indexed by Google. Though we have made some progress, I’m still not sure the issue has been fully resolved. intraLibrary generates a Public URL for each record; if this is not stored in the metadata (as was the case for us) then it is re-generated each time the record is accessed, interpreted as a dynamic URL by Googlebot and not indexed. I was able to work with Intrallect to ensure that a Public URL is generated when a record is created and stored in the metadata; Mike embeds this now-consistent URL in the results from Open Search which (hopefully) will now be indexed by Google.

There are currently a total of 250 PDFs in intraLibrary (188 research and 62 OER) and certainly *some* of these are being indexed; searching Google for filetype:pdf site:http://repository-intralibrary.leedsmet.ac.uk/ returns 53 records (up from 52 earlier in the week so will keep an eye on this) whereas filetype:pdf site:http://repository.leedsmet.ac.uk/ does not return any PDFs because they are not at that address. I don’t think we’ll be able to generate the nice nested – landing page/full text – search results that you see from EPrints repositories, at least while intraLibrary and Open Search are on separate servers.

It is interesting to consider the implications of some of this on usage reporting, especially in the context of OER which are disseminated more widely than research (via Jorum, Xpert and potentially also the institutional VLE.)

According to Google Analytics, the most viewed OER on Open Search in September was Employability & Career Development: Assessing your Skills, Talents and Attributes which was viewed a total of 26 times – 13 absolute unique visitors – it does not feature in the report from intraLibrary, however, as it’s an external URL and does not utilise the intraLibrary Public URL (need to rectify this – there is a Public URL available that would redirect enabling us to record follow through).

It gets really interesting when you look at the most accessed item according to the intraLibrary report – Numeracy Basics – interactive quiz came in third from GA with the not terribly impressive stat of 19 hits (6 absolute unique) but the Public URL was apparently accessed a whopping 588 times! I’m not sure yet where all these hits have come from (think I may be able to get IP info from intraLibrary) but maybe someone has linked to it from the VLE – it is also in Xpert and I posted a link to it at http://repositorynews.wordpress.com/2010/10/01/xpert-vs-jorum/ but that was 1st October – this particular resource isn’t yet in Jorum (http://open.jorum.ac.uk/xmlui/handle/123456789/5817 – viewed 290 times on JO – is a hosted version so definitely not linking to the intraLibrary Public URL.)
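To make the mismatch between the two reports easier to see, here is a small Python sketch that lines up page views from Google Analytics against Public URL hits from intraLibrary, using only the figures quoted above. The dictionary layout and the `combine_reports` helper are my own invention for illustration; neither tool exports data in this form as far as I know.

```python
# Metadata page views on Open Search, according to Google Analytics (September).
ga_views = {
    "Employability & Career Development": 26,
    "Numeracy Basics - interactive quiz": 19,
}

# Public URL hits, according to the intraLibrary report.
# The employability resource is an external URL, so it does not appear here.
public_url_hits = {
    "Numeracy Basics - interactive quiz": 588,
}


def combine_reports(ga, repo):
    """Return (title, GA views, Public URL hits) rows; None marks a missing figure."""
    rows = []
    for title in sorted(set(ga) | set(repo)):
        rows.append((title, ga.get(title), repo.get(title)))
    return rows


for title, views, hits in combine_reports(ga_views, public_url_hits):
    print(title, "| GA views:", views, "| Public URL hits:", hits)
```

A `None` in either column flags a resource that only one of the two tools can see, which is exactly the gap the Public URL redirect would close.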

Also pertinent here, I think, is a twitter discussion I had recently with @glittrgirl (Suzanne Hardy of PORSCHE) and others about managing duplicate OER records and it occurs to me that we are not, in fact, duplicating records at all – Jorum harvests full IMSCP so the record will point at our intraLibrary install (the example above notwithstanding that *is* actually duplicated in JO!) and Xpert harvests our OAI-PMH which, again, will point to the same link…(might be more of a duplication issue with ACErep though…need to think that through.)

What would you like to search for – research or OER?

Leeds Met Open Search – http://repository.leedsmet.ac.uk/main/index.php – now incorporates a “splash screen” that allows the user to choose which collection they wish to search with links that provide access to separate interfaces that are tailored to each type of material:

Leeds Met Open Search splash page

Each tabbed interface provides an appropriate Advanced search form as well as relevant browse options; by LCSH or faculty for research and by HEA Subject Centres or JACS code for OERs:

Once again, massive thanks to Mike for his rapid response to the myriad requests I make of him on a daily basis!

A quick look at JorumOpen

As anyone with even a passing interest in UKOER will know, JorumOpen went live earlier this week and I, for one, was dying to see just what the good folk at Mimas and Edina have come up with in their customised DSpace installation (and possibly “borrow” one or two ideas for Leeds Met Open Search!).

JorumOpen Home is at http://open.jorum.ac.uk/xmlui/ and allows the user to browse OER by FE or HE subject; alternatively there are links to browse by Communities & collections/Issue date/Authors/Titles and Keyword.  There is also a simple search box and a link to an Advanced search form:

The results page comprises different functionality depending on the search – for example, browsing by subject heading displays “Recent Deposits” and allows the user a simple/advanced search, or browse by Titles/Authors/Dates within that subject heading (I like this hierarchical search functionality); also includes an RSS button to subscribe to updates within the collection.

Results themselves comprise a hyperlinked title, author/author affiliation and date of deposit as well as a thumbnail graphic where available:

The record page is worth looking at in detail (this item – http://open.jorum.ac.uk/xmlui/handle/123456789/567):

Show full item record (link) - Full Dublin Core metadata record

Share (AddThis button) – third-party social network service allowing the record to be emailed to a friend or posted to various social networking sites.

The simple record comprises:

Title/Author/Description/Keywords/Persistent Link/Date

Then there are three buttons:

“Export resource” that requires a valid email address “As some resources are quite large in size it can take some time to prepare them for download. Due to this we required you to supply a valid email address so that you can be notified when your download is ready.”  Then follows an email from support@jorum.ac.uk that informs that “The item export you requested from the repository is now ready for download.” and includes a link to download the compressed file which comprises all files associated with the resource.*

“Preview content package” which allows the user to quickly view the different files and components of the resource in their browser without downloading (though it doesn’t work for .zip files)

“Download original content package” does exactly what it says on the tin and downloads a compressed file of all files associated with the resource.*

* I’m not entirely sure what the difference is between “export” and “download” – though the exported zip is bigger and contains more files (dublin_core.xml as well as imsmanifest.xml for example) – maybe someone can enlighten me?

CC Licence Note – briefly explains implications of CC and links to relevant anchor later in the record.

Files in this item - allows the user to expand a list of files and download them individually (this particular item comprises 16 .zip and 2 .docx)

Creative Commons Licence – Link to relevant CC licence (opens in a nifty little window.)

Terms of service – Link to Jorum terms of service (also opens in a nifty little window)

This item appears in the following collections – linked to appropriate search terms in browse tree

Show full item record (repeated link from top of page) - Full Dublin Core metadata record

This item has been viewed x times – presumably counts visits to the record page

All in all, first impressions are pretty favourable and there are certainly some ideas that I would like to explore for Leeds Met Open Search – I’ve already included the AddThis button on the development server and plan to go live with it as soon as it has been approved by the powers that be (there are one or two issues with user tracking by this third party service – Mike has disabled Flash tracking that the widget injects into the page by default but it will still track each click-through.)

I’m also keen to explore how we may manage packaged content in a similar way to JorumOpen (preview content and download options for individual files) – currently we have very little packaged content in the repository but the default download link is currently just for an individual file – I do know that intraLibrary is able to manage content packages, however, and that a package download link is exposed by SRU so I think we should be able to achieve this.

Browse by date (of deposit) should also be achievable I think but browse by author is a little more problematic by SRU (both for research and OER) as there is no authority file for authors.

I’m not sure about recording page visits – will need to speak to Mike.

Now I just need to figure out the most efficient way of getting our UniCycle resources into JorumOpen – I will look at the deposit process in a later post (depositors can log in from JorumOpen Home via UK Federation) and I think Jorum are still exploring harvesting RSS feeds from ukoer projects though, as discussed in a recent post, our feed is not currently suitable for this.

Return all records via Open Search

This is primarily for my reference – in order to return all records from a given collection, enter cql.allRecords=1 into the standard search at http://repository.leedsmet.ac.uk/main/index.php and select which token you wish to use (Research/Open Educational Resources)
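For anyone who would rather send the same query straight to an SRU endpoint than type it into the search box, here is a minimal Python sketch that builds a standard SRU searchRetrieve URL around cql.allRecords=1. The parameter names (operation, version, query, startRecord, maximumRecords) come from the SRU 1.1 specification; the endpoint address is a placeholder, as I have not documented where our SRU service actually lives, and the `sru_all_records_url` helper is hypothetical.

```python
from urllib.parse import urlencode

# Placeholder endpoint: an assumption for illustration, not our real SRU address.
EXAMPLE_SRU_ENDPOINT = "http://sru.example.ac.uk/sru"


def sru_all_records_url(base_url, start_record=1, maximum_records=10):
    """Build an SRU 1.1 searchRetrieve URL that matches every record."""
    params = {
        "operation": "searchRetrieve",
        "version": "1.1",
        "query": "cql.allRecords=1",   # CQL for "return everything"
        "startRecord": start_record,   # SRU record positions start at 1
        "maximumRecords": maximum_records,
    }
    # urlencode percent-encodes the "=" inside the CQL query.
    return base_url + "?" + urlencode(params)


print(sru_all_records_url(EXAMPLE_SRU_ENDPOINT))
```

Paging through a whole collection would then be a matter of increasing `start_record` by `maximum_records` on each request; how the Research/OER token maps onto the request depends on the local set-up, so I have left it out.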
