What role for repositories after (Gold)Finch?

As the discussions around the rights and wrongs of the Finch report rumble on (like a storm that you think has moved away before it frightens the life out of you with a huge thunderclap right over your head), the new RCUK policy will take effect on April Fool's Day next year (not sure if the date is significant?). Under it, RCUK-funded authors must publish in RCUK-compliant journals, i.e. journals that offer either a suitable gold option OR a suitable green option. By “a suitable gold option” RCUK means immediate (unembargoed) OA to the “version of record” from the journal's own web site, under a CC-BY licence, AND permission for immediate deposit of the version of record in an OA repository, also under a CC-BY licence.
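Read as a rule, the policy is a simple boolean test: a journal is compliant if it offers suitable gold (all of the gold conditions together) OR suitable green. Here is a minimal Python sketch, assuming a hypothetical `JournalPolicy` record of a journal's options. The post doesn't define “a suitable green option”, so here it is reduced, as an assumption, to a maximum embargo supplied by the caller.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class JournalPolicy:
    """Hypothetical summary of a journal's OA options (field names are illustrative)."""
    immediate_oa_to_vor: bool            # unembargoed OA to the version of record on the journal's site
    vor_licence: Optional[str]           # licence on that version of record, e.g. "CC-BY"
    allows_immediate_vor_deposit: bool   # may the VoR go straight into an OA repository?
    deposit_licence: Optional[str]       # licence permitted for the repository copy
    green_embargo_months: Optional[int]  # None if no green route is offered


def offers_suitable_gold(j: JournalPolicy) -> bool:
    # Every gold condition must hold at once.
    return (
        j.immediate_oa_to_vor
        and j.vor_licence == "CC-BY"
        and j.allows_immediate_vor_deposit
        and j.deposit_licence == "CC-BY"
    )


def offers_suitable_green(j: JournalPolicy, max_embargo_months: int) -> bool:
    # ASSUMPTION: "suitable green" is modelled only as an embargo ceiling.
    return j.green_embargo_months is not None and j.green_embargo_months <= max_embargo_months


def is_rcuk_compliant(j: JournalPolicy, max_embargo_months: int) -> bool:
    # Gold OR green: either route on its own is enough.
    return offers_suitable_gold(j) or offers_suitable_green(j, max_embargo_months)


# Example: CC-BY-NC gold with no VoR deposit, plus a 12-month green embargo.
hybrid = JournalPolicy(True, "CC-BY-NC", False, None, 12)
print(is_rcuk_compliant(hybrid, max_embargo_months=6))  # prints False
```

Note that a journal can fail the gold test on the licence alone (CC-BY-NC is not CC-BY), leaving the green embargo to decide.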

If the recommendations of the Finch report are realised and full gold OA is achieved in the UK, will the main function of repositories then be to preserve the institutional “version of record” and should we endeavour to procure that version rather than, for example, giving up and going home…or, perhaps, just linking to the gold version elsewhere?

As discussed in a recent post for UKCoRR, I would argue that, whatever happens, repositories are likely to remain a primary source of authoritative full-text versions of research outputs, not to mention associated data-sets and a variety of other scholarly outputs, including electronic theses and Open Educational Resources (OER). (N.B. I'm dropping in this link to the excellent briefing paper on Open Practices from the OER Synthesis and Evaluation Project for convenient personal reference.)

In addition, repository infrastructure is predicated on the principle of interoperability. Although the potential to aggregate repository content across the national and global network has arguably not been fulfilled, it remains an active area, with the development of services like BASE in Germany, RIAN in the Republic of Ireland, JAIRO in Japan and CORE in the UK.

If we can work within the prescriptions of Finch and the RCUK policy to increase the quality-assured content of our repositories, integrating them with institutional systems and making them ever more flexible tools for our research communities, then, together with the prospect of COUNTER-compliant download statistics from repositories (see that UKCoRR post), we can continue to play a pivotal role, not just in the evolution of Open Access to research but also in the active dissemination of research to the public, and raise the profiles and reputations of our institutions to boot!

Jorum / OER case-study

The process of setting up an OER service at an institution is potentially complex, requiring significant infrastructural and human resources both to implement and maintain hardware/software and to promote the service to institutional stakeholders. A successful service is likely to require at least 1 FTE post-holder, though the full range of expertise is unlikely to reside in an individual staff member, and there will be a considerable learning curve in areas as diverse as copyright and IPR, cataloguing and metadata standards, repository/VLE/content management and more general web technologies.

At Leeds Metropolitan University, the OER repository has to a large extent developed alongside Jorum and is the product of several projects, including a JISC-funded repository start-up, a UKOER phase 1 project (Unicycle) and the HEA-funded ACErep project. The maturing service is built on intraLibrary, a commercial learning-object repository that incurs an annual licensing fee in addition to a substantial implementation cost in year one, and it has required considerable customisation and associated technical work to embed it within a broader OER infrastructure. The core human resource is currently 1 FTE Repository Developer (University grade 5), 0.3 FTE Information Services Librarian and 0.5 FTE Senior web-developer (University grade 6), though the staff employed on the project(s) have varied over time, including faculty-based administrators during Unicycle, for example.

Even though Leeds Metropolitan University has implemented its own repository, Jorum continues to be an important component of the institutional OER infrastructure and has the potential to increase the visibility of local repository content, for example by harvesting its metadata into the national service. Moreover, as the institution moves towards a “consumer” model of OER use as part of the resourcing of its curriculum, staff will increasingly be directed to Jorum as a national OER repository that links together a number of institutions and resources. Because the aggregated material comes largely from the UK HE sector, there is a shared understanding of levels of study, so staff can feel confident about using the resources in their Learning & Teaching.

In terms of what the sector would lose, one of the main things is simply the focus of a national aggregation service…it has always been a frustration that OA research has never been successfully aggregated, for two main reasons, I think:

  • because repositories are “diluted” by metadata records for which it has not been possible to procure full-text, or for which copyright does not permit deposit
  • insufficient (auto-harvestable) rights information

(See these posts on the UKCORR blog for more on this http://ukcorr.blogspot.co.uk/2012/03/unfulfilled-promise-of-aggregating.html and http://ukcorr.blogspot.co.uk/2012/03/are-your-repository-policies-worth-html.html)

UKOER and Jorum, I think, have in fact circumvented both these issues, so aggregation is much more effective (especially via RSS as a lightweight alternative to OAI-PMH), with the potential to do all sorts of interesting stuff with the aggregated content.
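As an illustration of why a lightweight feed plus explicit rights makes aggregation easier, here is a minimal Python sketch that merges RSS 2.0 feeds and drops the two kinds of item identified above as diluting aggregation: records without a file and records without machine-readable rights. It is not how Jorum actually works; the use of `<enclosure>` for the file and `dc:rights` for the licence, and the example.org feed, are all assumptions.

```python
import xml.etree.ElementTree as ET

DC = "{http://purl.org/dc/elements/1.1/}"  # Dublin Core namespace, as used in many feeds


def harvest_items(rss_xml: str, source: str) -> list:
    """Extract title, link, rights and file URL from each <item> of an RSS 2.0 feed."""
    root = ET.fromstring(rss_xml)
    items = []
    for item in root.iter("item"):
        enclosure = item.find("enclosure")
        items.append({
            "source": source,
            "title": item.findtext("title", default="").strip(),
            "link": item.findtext("link", default="").strip(),
            # ASSUMPTION: rights carried in dc:rights; real feeds may use other elements.
            "rights": item.findtext(f"{DC}rights"),
            "file_url": enclosure.get("url") if enclosure is not None else None,
        })
    return items


def aggregate(feeds: dict) -> list:
    """Merge feeds, keeping only items that have a file AND machine-readable rights."""
    merged = []
    for source, xml_text in feeds.items():
        for item in harvest_items(xml_text, source):
            if item["file_url"] and item["rights"]:
                merged.append(item)
    return merged


# Hypothetical feed, for illustration only.
SAMPLE_FEED = """<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/">
<channel><title>Example OER feed</title>
  <item><title>Anatomy slides</title><link>http://example.org/oer/1</link>
    <dc:rights>CC-BY</dc:rights>
    <enclosure url="http://example.org/oer/1.zip" length="1000" type="application/zip"/>
  </item>
  <item><title>Metadata-only record</title><link>http://example.org/oer/2</link></item>
</channel></rss>"""

for record in aggregate({"example": SAMPLE_FEED}):
    print(record["title"], record["rights"], record["file_url"])
```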

Leeds Metropolitan University has also worked with Jorum and intraLibrary on the PORSCHE (Pathways to Open Resource Sharing through Convergence in Healthcare Education) project at HEA MEDEV, which has explored potential ways to represent OER across multiple repositories and has resulted in intraLibrary incorporating OAI-PMH harvest functionality, so that metadata from external repositories can appear as a “collection” in intraLibrary. As well as being valuable to the institutions involved, this work has benefitted the sector as a whole and demonstrates the value of a central OER repository that can serve as a focus for innovation.
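For readers unfamiliar with OAI-PMH, the harvest step is essentially a paged `ListRecords` request. Here is a minimal Python sketch of building those requests and parsing one response page in the standard `oai_dc` format. It is not intraLibrary's implementation, and the base URL and sample response are hypothetical.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def list_records_url(base_url, resumption_token=None):
    """Build a ListRecords request for oai_dc, or the follow-up request for the next page."""
    if resumption_token:
        # resumptionToken is exclusive: only the verb may accompany it.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": "oai_dc"}
    return f"{base_url}?{urlencode(params)}"


def parse_list_records(xml_text):
    """Return (records, next_token) for one ListRecords response page."""
    root = ET.fromstring(xml_text)
    records = []
    for rec in root.iterfind(".//oai:record", NS):
        header = rec.find("oai:header", NS)
        if header.get("status") == "deleted":
            continue  # deleted records carry no metadata
        records.append({
            "identifier": header.findtext("oai:identifier", namespaces=NS),
            "title": rec.findtext(".//dc:title", namespaces=NS),
        })
    # An empty <resumptionToken/> (or none at all) marks the last page.
    token = root.findtext(".//oai:resumptionToken", namespaces=NS)
    return records, (token or None)


# Hypothetical response page, for illustration only.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example.org:1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Clinical skills OER</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
    <record><header status="deleted"><identifier>oai:example.org:2</identifier></header></record>
    <resumptionToken>page2</resumptionToken>
  </ListRecords>
</OAI-PMH>"""

records, token = parse_list_records(SAMPLE_RESPONSE)
print(records, token)
print(list_records_url("http://example.org/oai", token))
```

A harvester simply loops: request, parse, and request again with the returned token until it comes back empty.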

Still baffled by Google…

Just reproducing an email to ukcorr-discuss here in case any technically minded folk not on the list might pass by these parts…

To revisit the whole Google Scholar / full-text indexing “thing”, I was just looking at results in GS for a particular academic who has raised a query about his full-text not being visible in Google Scholar; he has six full-text items in the repository, but a site: search of GS only appears to return two:


Initially I thought it might be an artefact of when the full-text was added: the records were all added at the same time (24th May 2011), but full-text was only added at that time for one of the GS results (plus one item that is not indexed at all; see below), as opposed to October 2011 for all the others (including the other GS result)…and that’s still a good 6 months, which you would think would be long enough to be indexed. Wouldn’t you?

Normal Google, by contrast, returns 4 full-text records:


The missing results are http://repository.leedsmet.ac.uk/main/view_record.php?identifier=4881&SearchGroup=Research (full-text added 24th May 2011) / http://repository.leedsmet.ac.uk/main/view_record.php?identifier=4893&SearchGroup=Research (full-text added 10th October 2011).

The only other difference I can spot is that several of those not indexed in GS don’t have metadata in the PDF (which is why they have been picked up in normal Google simply as “Leeds Metropolitan University Repository”, from the coversheet)…
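One thing I haven't tested here, but that is worth ruling out alongside PDF metadata: Google Scholar's inclusion guidelines ask for bibliographic meta tags (Highwire Press style, such as `citation_title`, `citation_author`, `citation_publication_date` and `citation_pdf_url`) on each landing page. Here is a minimal Python sketch that renders them for one record. The record values are hypothetical, and the guidelines themselves should be checked for details such as the expected date format.

```python
from html import escape


def scholar_meta_tags(title, authors, pub_date, pdf_url):
    """Render citation_* meta tags for a repository landing page."""
    tags = [("citation_title", title)]
    tags += [("citation_author", author) for author in authors]  # one tag per author
    tags.append(("citation_publication_date", pub_date))
    tags.append(("citation_pdf_url", pdf_url))  # should point at the full-text PDF itself
    return "\n".join(
        f'<meta name="{name}" content="{escape(value, quote=True)}">' for name, value in tags
    )


# Hypothetical record, for illustration only.
print(scholar_meta_tags(
    "Rights & wrongs of repository indexing",
    ["Smith, J.", "Jones, A."],
    "2011",
    "http://example.org/repository/4881.pdf",
))
```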

As a caveat, there is a technical peculiarity in that we effectively have a two-server set-up: our Open Search interface sits on an institutional server and queries intraLibrary by SRU, while the software itself is hosted for us in a server farm somewhere, which might explain the idiosyncratic behaviour to some extent…

An institutional tangram – musings on developing an integrated research management system

“The tangram (Chinese: 七巧板; pinyin: qī qiǎo bǎn; literally “seven boards of skill”) is a dissection puzzle consisting of seven flat shapes, called tans, which are put together to form shapes. The objective of the puzzle is to form a specific shape (given only an outline or silhouette) using all seven pieces, which may not overlap.”


Having implemented an institutional repository at Leeds Metropolitan and learned by experience some of the difficulties associated with advocacy around the use of that repository (both for OA research and OER), I have become all too aware “that repositories are ‘lonely and isolated’; still very much under-used and not sufficiently linked to other university systems”. So said JISC’s Andy McGregor at an event called “Learning How to Play Nicely: Repositories and CRIS” in May 2010 at Leeds Metropolitan (see my report for Ariadne here). This quote is still relevant, though perhaps a little less so than when I heard it nearly 2 years ago, thanks to the ongoing work of JISC and particularly the RSP. In any case, the event was a revelation for me and I have coveted a so-called Current Research Information System (or CRIS for short) ever since!

And now, in Symplectic Elements, I have one…or at least the components of one.

The finished tangram?

It’s a puzzle though. A tangram if you will…one with considerably more than seven pieces:

intraLibrary, Symplectic, institutional website, University Research Office (URO), faculty research administrators, the Research Excellence Framework (REF), academic staff, web-developers, bibliographic information, research outputs, Open Educational Resources (OER)…

In fact, this may well not be all the pieces…pretty sure a few have been pushed down the back of the settee. I’ll look for them later.

Anyway, tortured metaphors aside, I have become increasingly aware that working in a large institution, in a role that encompasses technology and institutional policy (though I’m not, by any means, a policy maker…or indeed a real techie), is largely about communication and getting the right people, with the right skills, in the right place at the right time! It means absorbing policy and technical requirements from senior stakeholders and communicating those requirements to the proper techies, while also trying to ensure that one’s own motivating passions don’t get lost along the way: Open Access to research and Open Education, in my case.

For various reasons, individual user accounts have never been implemented for our repository and historically it has been administered centrally from the Library. In Symplectic we now have a system that is populated with central HR data; all staff will have an account, accessed with their standard username and password, from which they can manage their own research profile, including uploading full-text outputs directly to the repository*. In addition, administration by the University Research Office and faculty research administrators will be more easily centralised (particularly for the REF).

* In fact this functionality is not yet available, pending development work from Intrallect to capture the Atom feed from Symplectic and transform it with XSLT into a suitable format for intraLibrary. I think.
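For illustration only: the footnote describes an Atom-to-intraLibrary transform done with XSLT. Python's standard library has no XSLT engine, so this minimal sketch swaps in ElementTree to show the same mapping idea. The contents of Symplectic's feed and intraLibrary's target format aren't documented here, so the `<record>` output and its field names are hypothetical.

```python
import xml.etree.ElementTree as ET

ATOM = {"atom": "http://www.w3.org/2005/Atom"}


def atom_entries_to_records(atom_xml):
    """Map each Atom <entry> to a flat dict (a stand-in for the XSLT step)."""
    root = ET.fromstring(atom_xml)
    records = []
    for entry in root.iterfind("atom:entry", ATOM):
        # Take the rel="alternate" link; Atom treats a link with no rel as alternate.
        link = next(
            (ln.get("href") for ln in entry.iterfind("atom:link", ATOM)
             if ln.get("rel", "alternate") == "alternate"),
            None,
        )
        records.append({
            "id": entry.findtext("atom:id", namespaces=ATOM),
            "title": entry.findtext("atom:title", namespaces=ATOM),
            "updated": entry.findtext("atom:updated", namespaces=ATOM),
            "link": link,
        })
    return records


def to_target_xml(record):
    """Serialise one record into a HYPOTHETICAL target format (not intraLibrary's real schema)."""
    rec = ET.Element("record")
    for field in ("id", "title", "updated", "link"):
        ET.SubElement(rec, field).text = record[field] or ""
    return ET.tostring(rec, encoding="unicode")


# Hypothetical feed, for illustration only.
SAMPLE_ATOM = """<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Example outputs</title>
  <entry>
    <id>urn:example:1</id>
    <title>A research output</title>
    <updated>2012-01-01T00:00:00Z</updated>
    <link rel="alternate" href="http://example.org/outputs/1"/>
  </entry>
</feed>"""

for rec in atom_entries_to_records(SAMPLE_ATOM):
    print(to_target_xml(rec))
```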

One of the clever bits of functionality used to sell the software is automatic retrieval of bibliographic data from online citation databases (we are currently running against various APIs: Web of Science (Lite), PubMed and arXiv), but I think this may actually be a bit of a red herring for an institution like Leeds Metropolitan, at least until more (preferably free) data sources are available (JournalTOCs API please!). Early testing has shown that, at best, it will only retrieve a subset of (the types of) outputs that we need to record. It will still be necessary to manually import existing records (e.g. from EndNote) and to implement other administrative procedures at faculty level to capture information at the point of publication, especially for book items, monographs, conference material, reports and grey literature.
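To make the point concrete, the kind of query these sources support is author- or affiliation-based, which suits journal articles and preprints but not much else. Here is a minimal Python sketch that only builds request URLs for the public arXiv API and NCBI's E-utilities `esearch` (for PubMed). It does not reflect how Symplectic itself queries them, the author and affiliation values are hypothetical, and Web of Science is left out.

```python
from urllib.parse import urlencode


def arxiv_query_url(author_surname, max_results=50):
    """arXiv API author search; the response is an Atom feed."""
    params = {"search_query": f"au:{author_surname}", "start": 0, "max_results": max_results}
    return "http://export.arxiv.org/api/query?" + urlencode(params)


def pubmed_query_url(author, affiliation, retmax=100):
    """E-utilities esearch: PubMed IDs matching an author AND an affiliation."""
    term = f"{author}[au] AND {affiliation}[ad]"  # [au] = author, [ad] = affiliation field tags
    params = {"db": "pubmed", "term": term, "retmax": retmax}
    return "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?" + urlencode(params)


print(arxiv_query_url("Smith"))
print(pubmed_query_url("Smith J", "Leeds Metropolitan University"))
# Neither source will return most monographs, book chapters, reports or grey literature,
# which is why manual import and faculty-level capture are still needed.
```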

More important, I think, for ensuring that academic staff actually engage with the software rather than seeing it as just a tool for administrators, is re-using the data to generate a list of research outputs (a dynamic bibliography) on a personal web profile, which has the potential to dramatically increase the visibility of research, including Open Access full-text.

Developing staff profiles of this type has been something of an obsession of mine for a while; we explored doing so from the repository (using SRU, with email address as a unique identifier) and did develop a working prototype. Symplectic, however, being integrated with central HR data and having a more sophisticated API, should make it much easier, at least from a technical perspective, and we are currently liaising with the central web team to develop something similar to this example from Keele University: http://www.keele.ac.uk/chemistry/staff/mormerod/ (like us, Keele run Symplectic alongside intraLibrary).

N.B. From the Symplectic interface, a user is able to “favourite” a research record, and this sets a flag in the XML from the API, which I understand is used on this page to display “Selected Publications”. The DOI is also available from the API to link to the published version, and if a user uploads full-text to the repository from Symplectic, that link is also in the XML; the first two records on this page include links to the full-text in Keele’s intraLibrary repository.
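Here is a minimal Python sketch of the page-building step described in the N.B. It keeps only “favourited” records, links the DOI to the published version, and links any repository full-text. The XML element and attribute names (`publication`, `favourite`, `doi`, `fulltext`) are hypothetical stand-ins, not the actual Symplectic API schema.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical API output; the real Symplectic XML is richer and differently structured.
SAMPLE_XML = """<publications>
  <publication favourite="true">
    <title>Paper A</title>
    <doi>10.1234/example.1</doi>
    <fulltext>http://example.org/repository/1.pdf</fulltext>
  </publication>
  <publication favourite="false"><title>Paper B</title></publication>
</publications>"""


def selected_publications_html(xml_text):
    """Build a 'Selected Publications' list from favourited records only."""
    root = ET.fromstring(xml_text)
    rows = []
    for pub in root.iterfind("publication"):
        if pub.get("favourite") != "true":
            continue  # only records the academic has flagged
        parts = [escape(pub.findtext("title", default="Untitled"))]
        doi = pub.findtext("doi")
        if doi:
            # DOIs resolve via the doi.org proxy to the published version.
            parts.append(f'<a href="https://doi.org/{escape(doi, quote=True)}">Published version</a>')
        fulltext = pub.findtext("fulltext")
        if fulltext:
            parts.append(f'<a href="{escape(fulltext, quote=True)}">Full text (repository)</a>')
        rows.append("  <li>" + " | ".join(parts) + "</li>")
    return "<h2>Selected Publications</h2>\n<ul>\n" + "\n".join(rows) + "\n</ul>"


print(selected_publications_html(SAMPLE_XML))
```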

Our own Library web developer Mike Taylor has been looking at the Symplectic API in detail and has put together a couple of prototype pages on a development server. After a meeting this week with a representative of the central web team, I’m reasonably confident we can move forward with this work fairly quickly…though there’s still a bit of a chicken-and-egg situation in populating the Symplectic database so that it can then be re-surfaced via the API in this way.
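For comparison, the earlier repository-based prototype looked up a person's outputs over SRU, using email address as the identifier. Here is a minimal Python sketch of such a `searchRetrieve` request using SRU 1.2 parameters. The base URL is hypothetical, and so is the CQL index name for email, which in practice would come from the server's explain record.

```python
from urllib.parse import urlencode


def cql_quote(value):
    # CQL terms are double-quoted; embedded backslashes and quotes are backslash-escaped.
    return '"' + value.replace("\\", "\\\\").replace('"', '\\"') + '"'


def sru_search_url(base_url, email, email_index="local.email", max_records=100):
    """SRU 1.2 searchRetrieve request for records linked to one staff email address.

    email_index is a HYPOTHETICAL CQL index name; check the server's explain record.
    """
    params = {
        "operation": "searchRetrieve",
        "version": "1.2",
        "query": f"{email_index} = {cql_quote(email)}",
        "maximumRecords": max_records,
        "recordSchema": "dc",  # assumes the server offers Dublin Core records
    }
    return f"{base_url}?{urlencode(params)}"


print(sru_search_url("http://example.org/sru", "j.smith@example.ac.uk"))
```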

There is also the question of whether we might alter our repository policy to become full-text only. One limitation of repositories across UK HE, measured against their original conception (in the arXiv mould) of holding, disseminating and preserving full-text research outputs, is that they have in effect become “diluted” by metadata records for which it has not (yet) been possible to procure full-text, or for which copyright does not permit deposit; “hybrid” repositories like ours, holding both full-text and metadata, typically contain more metadata records than full-text (see figures from the RSP survey here). As I have argued on the UKCoRR blog, I think it makes sense to separate a bibliographic database (in Symplectic) from a full-text-only repository.

N.B. As Symplectic does not have the same search functionality as the repository, this approach has the potential disadvantage of making it more difficult to search across the entire corpus of research records (though one potential solution may be along the lines of that implemented by City Research Online, which, in my view, is rapidly becoming an exemplar of a research management system (Symplectic) + full-text repository (EPrints)). Another good example is St Andrews (PURE + DSpace), who presented a case study at “Learning How to Play Nicely: Repositories and CRIS” (video here).
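The split argued for above amounts to a simple routing rule: every record stays in the bibliographic database, and only records with a full-text file go to the repository. Here is a minimal Python sketch with hypothetical records, which also reports the full-text share that makes a hybrid repository look “diluted”.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Output:
    title: str
    fulltext_url: Optional[str] = None  # None = metadata-only record


def route_outputs(outputs):
    """Return (bibliographic_db, repository, full_text_share)."""
    bibliographic_db = list(outputs)                      # the CRIS keeps everything
    repository = [o for o in outputs if o.fulltext_url]   # full-text only
    share = len(repository) / len(outputs) if outputs else 0.0
    return bibliographic_db, repository, share


# Hypothetical records, for illustration only.
outputs = [
    Output("Article with accepted manuscript", "http://example.org/1.pdf"),
    Output("Book chapter, no permission to deposit"),
    Output("Conference paper, full-text not yet obtained"),
]
cris, repo, share = route_outputs(outputs)
print(len(cris), len(repo), f"{share:.0%}")  # prints 3 1 33%
```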

And what of OER? Our EasyDeposit SWORD interface, the use of OER to resource the refocused undergraduate curriculum and the soon-to-be-released intraLibrary 3.5 (which will enable us to harvest OER from other repositories)…for now I think they may be the bits down the back of the settee…

UKCoRR meeting

I wasn’t able to attend the UKCoRR meeting held in Kingston on Friday, much as I would have liked to. It sounds like I missed out on a really good day with an excellent programme.

A thorough summary and all the presentations from the day are available from the UKCoRR website:


In addition, there is a summary on the UKCoRR blog:


I was particularly interested in Theo Andrews’ presentation on Central Funds for Open Access and the ensuing discussion around institutionally designated funds for OA (both Gold and Green routes). I hope UKCoRR don’t mind me reproducing some of the issues discussed here:

1) Concern about the costs: these might escalate, and sometimes amount to “double dipping” (some publishers are paid by authors and subscribers because they charge authors for OA article publication but don’t reduce their subscription fees).
2) Publishers who are aware of funder mandates for OA within 6 months might introduce 12-month embargoes on post-print availability in OA repositories, in order to force authors to pay for OA publishing of the final version or miss their funder’s mandate. (NB the point here is that funders are paying, as authors can claim such costs from funders. But we’re all struggling to set up mechanisms by which this can be done; see Theo’s presentation for a summary of the issues.)
3) An institutional response might be to set up an OA fund, or it might be to encourage authors to deposit post-prints into the OA repository, rather than paying such publishers’ fees. Some researchers object to the fees being charged.
4) The Wellcome Trust does seem to prefer that authors pay for OA publication, and indeed this suits authors better than depositing themselves, because part of the Wellcome mandate is for PubMed deposit. By paying, authors can leave the PubMed deposit to the publishers. Is the Wellcome Trust’s mandate skewing the OA landscape through the way publishers have responded to it, while other academic disciplines are nowhere near as well funded?
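Point 2 is essentially date arithmetic. Here is a minimal Python sketch using the numbers given there (a 6-month funder window and a 12-month post-print embargo). The publication date is hypothetical, and the sketch assumes both periods run from publication, which real policies may define differently (e.g. from acceptance).

```python
import calendar
from datetime import date


def add_months(d, months):
    """Add calendar months, clamping the day to the length of the target month."""
    month_index = d.month - 1 + months
    year = d.year + month_index // 12
    month = month_index % 12 + 1
    day = min(d.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def advise_route(published, embargo_months, mandate_months):
    """Say whether a green deposit can meet the funder window, or gold is needed."""
    open_from = add_months(published, embargo_months)  # post-print may be made OA from here
    deadline = add_months(published, mandate_months)   # funder expects OA by here
    if open_from <= deadline:
        return f"green: post-print can be open from {open_from.isoformat()}"
    return (f"gold needed: embargo ends {open_from.isoformat()}, "
            f"after the mandate deadline of {deadline.isoformat()}")


print(advise_route(date(2012, 1, 31), embargo_months=12, mandate_months=6))
```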

The inimitable @llordllama has also posted summaries of the day on the UoL Library blog:



On the strength of this I’m certainly looking forward to attending future UKCoRR events – maybe even oop North next time?!