Repository News

Implementing an Institutional Repository for Leeds Metropolitan University

Archive for the ‘Open Access’ Category

UKCoRR meeting

Posted by Nick on August 18, 2009

I wasn’t able to attend the UKCoRR meeting held in Kingston on Friday, much as I would have liked to. It sounds like I missed out on a really good day with an excellent programme.

A thorough summary and all the presentations from the day are available from the UKCoRR website:

http://www.ukcorr.org/events/aug2009-event.php

In addition, there is a summary on the UKCoRR blog:

http://ukcorr.blogspot.com/2009/08/after-our-meeting.html

I was particularly interested in Theo Andrews’ presentation on Central Funds for Open Access and ensuing discussion around institutionally designated funds for OA – both Gold and Green routes.  I hope UKCoRR don’t mind me reproducing some of the issues discussed here:

1) Concern about the costs: these might escalate, and sometimes amount to “double dipping” (some publishers are paid by authors and subscribers because they charge authors for OA article publication but don’t reduce their subscription fees).
2) Publishers who are aware of funder mandates for OA within 6 months might introduce 12-month embargoes on post-print availability in OA repositories, in order to force authors either to pay for OA publishing of the final version or to miss their funder’s mandate. (NB the point here is that funders are paying, as authors can claim such costs from funders. But we’re all struggling to set up mechanisms by which this can be done; see Theo’s presentation for a summary of the issues.)
3) An institutional response might be to set up an OA fund, or it might be to encourage authors to deposit post-prints into the OA repository, rather than paying such publishers’ fees. Some researchers object to the fees being charged.
4) The Wellcome Trust does seem to prefer that authors pay for OA publication, and indeed this suits authors better than depositing themselves, because part of the Wellcome mandate is for PubMed deposit: by paying, authors can leave the PubMed deposit to the publishers. Is the Wellcome Trust’s mandate skewing the OA landscape in the way publishers have responded to it, whilst other academic disciplines are nowhere near as well funded?

The inimitable @llordllama has also posted summaries of the day on the UoL Library blog:

http://uollibraryblog.wordpress.com/2009/08/18/ukcorr-summer-2009-meeting-pt-1/

http://uollibraryblog.wordpress.com/2009/08/18/ukcorr-summer-2009-meeting-pt-2/

On the strength of this I’m certainly looking forward to attending future UKCoRR events – maybe even oop North next time?!

Posted in Event, Link, Open Access | 1 Comment

JorumOpen will use DSpace

Posted by Nick on July 3, 2009

I think it fair to say that intraLibrary being the platform behind Jorum was a factor in our institutional decision to use the platform at Leeds Met. So, as the #ukoer projects get underway (including UniCycle, of course), it is with considerable interest that I discover that Jorum plan to implement a customised DSpace repository, apparently to run alongside intraLibrary.

The news came to my attention in an email on the OER-INST mailing list, which said that the move was to ensure that Jorum scales up for global access. This I promptly tweeted (me being me) and received a couple of coy allusions from interested parties before a tweet from @JorumTeam informed me that “All OER content for #jorum will be served from DSpace. Content licensed under JEducationUK or JPlus will be served from Intralib.” Aside from this tweet, I’m not sure if there’s been anything more official from Jorum yet, and apologies if my immediate Web 2.0 dissemination of information from a closed mailing list was in any way inappropriate.

As discussed in previous posts (e.g. this one), I am aware of one or two issues with facilitating Open Access via intraLibrary, though I am confident that we do indeed have suitable technology within the software to facilitate OA, in the form of RSS, SRU and OAI-PMH for example. There may be other issues around scalability that I am unaware of, and I’d be very interested to learn why, and precisely how, Jorum have decided to also utilise DSpace.

No doubt we’ll learn more in due course…

Posted in Open Access, Open Educational Resources, UniCycle project

Research in the Open: How Mandates Work in Practice

Posted by Nick on June 3, 2009

Bill Hubbard’s slides from last week’s event (which I didn’t go to) may come in useful.

(Thanks to UK Council of Research Repositories blog)

Posted in Link, Open Access

Development of Research Repository Aspect of IntraLibrary

Posted by Nick on June 1, 2009

On Friday Mike and I visited colleagues at Keele University for a meeting with Charles Duncan from Intrallect to consider development priorities for intraLibrary to better serve our needs as a research repository.  Over 4 and a half hours we considered the basic issues that need addressing as well as looking forward to some more ambitious functionality and integration with the wider research infrastructure as we move towards the REF.

I was particularly interested to learn about how Keele are implementing Symplectic’s publications management system (http://www.symplectic.co.uk/), which regularly trawls Web of Science and PubMed Central for information about Keele’s academic publications. Symplectic have clearly been thinking about integration with IRs, and there’s even a link to SHERPA/RoMEO. The system was used at Imperial College London for the RAE 2008 process and includes link functionality with DSpace, which is that institution’s IR platform (http://spiral.imperial.ac.uk/). Intrallect are currently liaising with Symplectic about integration with intraLibrary. I’m not certain precisely what form this would take, but in an ideal world it would be great if we could auto-populate as much metadata as possible (title, bibliographic information, abstract, author, copyright status according to RoMEO) and automatically nudge academics for full text where appropriate!

At Leeds Met we currently lack any form of research database, which is why I’ve been exploring what are essentially manual workflows to populate the repository with all research output. I’m not sure how expensive Symplectic is, and it may be difficult to justify given this institution’s relatively small research output; the repository may well have to be the research database, which is the assumption I’ve been working on. We will also want to explore the soon-to-be-released Web of Science API, which may, in any case, enable us to emulate some of this functionality ourselves.

The first item on our agenda was somewhat more prosaic and focussed on our immediate functional requirements: SRU searching and metadata. Mike has been working on incorporating advanced search into the SRU interface and has come up against a couple of issues when searching by author and date, which are essentially artefacts of having to query DC rather than LOM. In the LOM, creators and contributors are clearly differentiated; querying by DC, however, conflates the creator and author roles, which may (will) be different if resources are uploaded by someone other than the author.

  • Searching dc.creator will search for the creator and author roles
  • Searching dc.contributor will search for the content provider role

In addition:

  • Searching by dc.date only searches data that relates to the intraLibrary submission process (i.e. the deposit date, and perhaps modification dates if you added an author later on for example)
  • The only way to search journal dates is to use the default free text search that searches everything (or most fields anyway).
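To make the two lists above concrete, here is a minimal Python sketch of the SRU searchRetrieve URLs involved. The endpoint is a placeholder (the post does not give intraLibrary’s SRU base URL); `operation`, `version`, `query`, `startRecord` and `maximumRecords` are standard SRU 1.1 request parameters, and the comments simply restate the index behaviour described above rather than anything verified against intraLibrary.

```python
from urllib.parse import urlencode

# Placeholder endpoint: the post does not give intraLibrary's SRU base URL.
SRU_BASE = "http://repository-intralibrary.example/sru"


def sru_search_url(cql_query, start_record=1, max_records=10):
    """Build an SRU 1.1 searchRetrieve URL for a CQL query."""
    params = {
        "operation": "searchRetrieve",
        "version": "1.1",
        "query": cql_query,
        "startRecord": start_record,
        "maximumRecords": max_records,
    }
    return SRU_BASE + "?" + urlencode(params)


# Matches both the LOM "creator" and "author" roles (the conflation described above).
by_creator = sru_search_url('dc.creator = "Morton, Veronica"')
# Matches only the "content provider" role.
by_contributor = sru_search_url('dc.contributor = "Morton, Veronica"')
# Matches intraLibrary deposit/modification dates, not journal publication dates.
by_deposit_date = sru_search_url('dc.date = "2009"')
# No index given: the server chooses (cql.serverChoice), i.e. the free text search.
free_text = sru_search_url('"2009"')

for url in (by_creator, by_contributor, by_deposit_date, free_text):
    print(url)
```

Once LOM fields are exposed to SRU, only the index names in the CQL strings should need to change; the request shape stays the same.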

The solution, of course, is to make it possible to query the LOM by SRU, and this is now Intrallect’s intention; indeed, they intend to make all LOM fields queryable, which would include user-generated tags, for example.

The next big question is the exposure of open content to search engines, and Charles gave us an overview of plans to develop an object “home page” with a static URL, which should help in this area. We also discussed sitemaps and what needs to be done external to intraLibrary. I’m still unclear on how we can improve the format of results returned by Google from the SRU interface. To repeat, Google IS indexing http://repository.leedsmet.ac.uk/, with site:http://repository.leedsmet.ac.uk/ currently returning over 500 records. However, this is fairly unstructured: Google is simply following links from http://repository.leedsmet.ac.uk/main/browse.php, and any subsequent links Googlebot encounters are also indexed and returned as “The Repository search for [link name]”; ideally I’d like results to be returned in a more structured and user-friendly form. Many queries actually return no results where there is (yet) no content to find, though where there is content, Google is indexing all human-readable metadata. I’m also not certain whether Googlebot is finding its way into the full text via the Open URL/virtual file paths generated by intraLibrary. Full text indexing within intraLibrary itself has also been promised.

In short, I’m really not sure how all of these factors may combine to be exploited by a next generation SRU interface!

We then touched upon self-archiving and (semi-)mediated workflows, potentially developing SWORD-based quick deposit from desktop/web, ideally with automatic metadata generation.
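A SWORD deposit is, at heart, an authenticated HTTP POST of a package to a collection URI. The Python sketch below builds (but does not send) such a request, assuming SWORD 1.3 conventions; the collection URI, credentials and choice of packaging format are all hypothetical, and whether intraLibrary would expose a SWORD endpoint at all was still an open question at this meeting.

```python
import base64
import urllib.request

# Hypothetical collection URI and packaging; neither comes from the post.
COLLECTION_URI = "http://repository.example/sword/deposit/research"
METS_DSPACE_SIP = "http://purl.org/net/sword-types/METSDSpaceSIP"


def build_sword_deposit(package, filename, username, password,
                        packaging=METS_DSPACE_SIP, no_op=True):
    """Build, but do not send, a SWORD 1.3-style deposit request.

    no_op=True asks the server to validate the deposit without storing it.
    """
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    headers = {
        "Content-Type": "application/zip",
        "Content-Disposition": f"filename={filename}",
        "X-Packaging": packaging,
        "X-No-Op": "true" if no_op else "false",
        "Authorization": f"Basic {token}",
    }
    return urllib.request.Request(COLLECTION_URI, data=package,
                                  headers=headers, method="POST")


request = build_sword_deposit(b"...zip bytes...", "article.zip", "depositor", "secret")
print(request.get_method(), request.full_url)
# Actually sending it would be: urllib.request.urlopen(request)
```

Note that urllib stores header names in capitalised form (e.g. `X-packaging`), which is harmless because HTTP header names are case-insensitive.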

The two other major issues we considered are:

  • Policy metadata – handling embargoes

This is pretty crucial to an OA archive of research as many publishers of academic journals specify an embargo period of 12 or 18 months from the date of publication before a paper can be made available in a repository.  We need to be able to add a paper to intraLibrary upon receipt but restrict access until the embargo has expired and for this to happen automatically.  On one level, this functionality should be fairly straightforward to achieve by having intraLibrary check today’s date against an embargo date specified in the metadata; it’s a little more complicated than that though as we would want the metadata to be visible before the embargo date, just not the full text.
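The date check described above might look something like this minimal Python sketch, which assumes hypothetical metadata fields for the publication date and an embargo length in months. The metadata stays visible throughout; only the full text is gated, and here it becomes visible on the embargo end date itself (a design choice, not a publisher rule).

```python
import calendar
from datetime import date


def add_months(start, months):
    """Return the date `months` calendar months after `start`, clamping the day
    to the end of the month (e.g. 31 Aug + 18 months -> 28 Feb)."""
    index = start.month - 1 + months
    year = start.year + index // 12
    month = index % 12 + 1
    day = min(start.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def access_for(published, embargo_months, today):
    """Metadata is always visible; full text only from the embargo end date on."""
    embargo_ends = add_months(published, embargo_months)
    return {
        "metadata_visible": True,
        "full_text_visible": today >= embargo_ends,
        "embargo_ends": embargo_ends,
    }


# Hypothetical record: published 1 June 2009 under a 12-month embargo.
print(access_for(date(2009, 6, 1), 12, today=date(2009, 6, 3)))
```

Because the check is a pure function of the stored dates, it can run on every request, so nobody has to remember to release the file manually.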

  • Cover pages for PDF

It was suggested that a coversheet should be generated by intraLibrary on the fly, which would certainly be useful, as manually creating cover sheets for each and every article is time-consuming to say the least; the same functionality would serve CLA materials, which also require a coversheet.

These developments will take some time to implement, and the next stage is to prioritise them by anonymous e-postal ballot; Intrallect hope we will start to see some of the major initiatives in a build towards the end of the year.

Thank you to our colleagues at Keele for making us welcome and for feeding us!

Posted in Adapting intraLibrary, Open Access | 3 Comments

Final project report published

Posted by Nick on May 21, 2009

The final project report for our JISC start-up project, Implementing an Institutional Repository for Leeds Metropolitan University, has now been published and is available to download in PDF or Word format.

PDF version

Word version

Implementing an Institutional Repository for Leeds Metropolitan University was funded by the JISC Repositories and Preservation Programme – Repositories Start-up and Enhancement (Strand D).

We would like to thank the programme manager Andy McGregor for his support and guidance throughout the project.

During the project, input from several user groups and supporting staff was of great value; these included:

Academic staff at Leeds Metropolitan University

The TEL team at Leeds Metropolitan University

The Streamline project team

JISC Emerge community

The project team would also like to acknowledge the support and enthusiasm of our software provider, Intrallect as well as the Repositories Support Project and Web2Rights for their expert advice throughout the project.

Thanks also to Beth Hall who used our project as a case study for her MSc in information studies; some of her results are presented as a formal element of the project.

Posted in Final Project Report, Open Access

UniCycle website

Posted by Nick on May 20, 2009

There is now a project website for UniCycle. There’s not much there yet, as it’s on Ning and we’re relying on this newfangled Web 2.0 to generate content, so if you’re interested in Open Educational Resources, come and join us.

Posted in Link, Open Access, UniCycle project

Repositories for research and teaching/learning material: The debate continues at #rpmeet

Posted by Nick on May 13, 2009

[Image: JISC Repositories and Preservation Programme structure diagram]

Last week I attended the JISC Repositories and Preservation end-of-programme meeting in Birmingham. I recall being very nervous at my first JISC event in November 2007 but feel much more at ease now and enjoyed the event immensely; the programme has certainly been successful in fostering a sense of community, though it’s an unusual social experience to meet people face to face, often for the very first time, when you feel you already know them from reading their blogs and following them on Twitter.

During one of the breakout sessions on the first day I made a bee-line for a discussion about repositories for learning and teaching materials, as opposed to OA research repositories. I use the word “opposed” advisedly, as there is certainly some strong sentiment around the issue, particularly with respect to using a common software platform. As a representative of a project that is adapting a learning object repository to also serve as an effective Open Access research repository, I’m finding it a little difficult to understand the vehemence of some of this opposition, though I would be the first to acknowledge a steep learning curve and recognise that we have required extensive development, not of intraLibrary itself perhaps, but of an appropriate web infrastructure surrounding it. And yes, we would certainly have been able to implement a functioning OA research repository more quickly using EPrints or DSpace; however, from the outset it was vital that our repository had the capacity to fulfil its broader potential. In the words of Clifford Lynch, “[A] mature and fully realised institutional repository will contain the intellectual works of faculty and students – both research and teaching materials – and also documentation of the activities of the institution itself in the form of records of events and performance and of the ongoing intellectual life of the institution.” [Lynch, Clifford A. “Institutional Repositories: Essential Infrastructure for Scholarship in the Digital Age.” ARL Bimonthly Report 226 (2003).]

It’s also important to be pragmatic. Leeds Metropolitan University is a former polytechnic that gained chartered university status in 1992; its heritage is very much in teaching and learning rather than research, with, arguably, a more vocational than academic flavour. In recent years the research profile has steadily increased, culminating in unprecedented success in the 2008 RAE, and the university is naturally keen to capitalise on this success and enhance its research profile further, whilst continuing to emphasise its student-focussed teaching and learning credentials. The implementation of an integrated repository to support both research outputs and learning objects reflects this dual focus. Clifford Lynch’s article suggests that the concept of a central system to manage disparate resources in this way has been implicit within the sector for some years; however, the technology has tended to focus on Open Access to research, with the two most widely used software platforms being EPrints, developed at the University of Southampton in 2000, and DSpace, developed at MIT in 2002. Early versions of both platforms were primarily designed to manage text-based resources, though subsequent versions of EPrints and DSpace can manage a wide range of digital file formats.

NB. In an extended discussion of this issue on JISC-REPOSITORIES (archive here), RepositoryMan Les Carr of EPrints refers to the fact that he still comes across the firmly held (and spurious) belief that because EPrints is used for Open Access it can’t be used for multimedia files or scientific data.

The session was chaired by Amber Thomas of JISC, and I asked a somewhat blunt, perhaps naive, question about JISC’s perspective on combined repositories of research and teaching materials. Amber suggested that JISC have been deliberately neutral on the issue, which is perhaps also emphasised by the diagrammatic representation of the programme structure reproduced above.

Some of the commentators last Wednesday were adamant that, though it may well be possible to manage different types of resources with a single system, it was far from desirable, with one colleague making the pithy observation that you can write letters in Excel but that doesn’t make it right. Phil Barker of CETIS was also at the discussion and, in a recent blog post on the “question of whether research outputs and learning materials should [be] stored in the same repository”, is “inclined to think the answer is no, the purpose of the repository is different, a learning material isn’t an output, sharing means something different for the two resource types.” Phil goes on to say that “If you think a repository is a database and a bit-store then you may come to a different conclusion, but I think a repository is a service offered to people and your choice of starting point in offering that service will affect how easy your journey is.” (Full post here)

I’d certainly concede that our journey hasn’t been an easy one, and I also agree that a repository is a service offered to people; with our repository start-up, and also with Streamline and PERSoNA, that is certainly the approach we have tried to take. With intraLibrary and the SRU interface we now have an incipient infrastructure to manage both research material and learning objects. The discrete types of material can be managed entirely separately; however, there is also potential for the ongoing development of a holistic approach to managing the full range of digital resources produced by a modern university. As we develop our infrastructure further, I hope we can use appropriate web technology around a central management system (intraLibrary) to achieve decentralised resource discovery through appropriate interfaces, widgets and environments, such as the VLE.

[Image: JISC meeting 2009 poster]

Then of course there is the small matter of persuading academics to part with their resources, not to mention IPR, copyright and quality control issues…

Open Access to research is an evolving paradigm and represents a considerable shift in the established academic publishing process; Open Access to a broader range of educational resources still more so. Any paradigm shift is likely to take time to evolve and Open Access, to research and other materials, is no exception, especially given that academia, perhaps, tends to subscribe rather strongly to established tradition!

JISC’s current OER programme should go some way to addressing many of these issues, but infrastructure is the foundation. The perfect system almost certainly doesn’t exist, and it’s surely important to be pragmatic when implementing and developing an appropriate system. Here’s to ongoing discussion, debate and development.

Posted in A new era, Adapting intraLibrary, Event, Open Access, Resource discovery, Teaching and Learning, Which repository | 1 Comment

Google indexing and SEO

Posted by Nick on April 22, 2009

It is crucial that both the Open Access full text research content of the repository and metadata records of citation material are fully indexed by Google (and other search engines); in the future it is also likely to be required for other Open Educational Resources (learning objects). However, site:http://repository-intralibrary.leedsmet.ac.uk/ currently returns just 4 results (in addition to the Login page itself) and it is a bit of a mystery how these 4 are actually being picked up when the majority of records are not.

In intraLibrary, for a given collection, the administrator may choose to:

• Allow published content in this collection to be searched by external systems

This effectively means SRU (Search/Retrieve via URL), a standard search protocol that uses CQL (the Contextual Query Language, originally called the Common Query Language).

• Allow published records in this collection to be harvested by external systems

This effectively means harvesting by OAI-PMH (the Open Archives Initiative Protocol for Metadata Harvesting).
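For illustration, the harvester’s side of OAI-PMH can be sketched in a few lines of Python: the first `ListRecords` request names a metadata format (`oai_dc` is the one every OAI-PMH repository must support), and later requests send back only the `resumptionToken` until an absent or empty token signals the end. The base URL and sample response are placeholders, not intraLibrary output.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}
# Placeholder: the post does not give intraLibrary's OAI-PMH base URL.
OAI_BASE = "http://repository-intralibrary.example/oai"


def list_records_url(resumption_token=None):
    """First page names the metadata format; later pages send only the token."""
    if resumption_token:
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": "oai_dc"}
    return OAI_BASE + "?" + urlencode(params)


def parse_list_records(xml_text):
    """Return (record identifiers, next resumption token or None)."""
    root = ET.fromstring(xml_text)
    ids = [el.text for el in
           root.findall(".//oai:record/oai:header/oai:identifier", OAI_NS)]
    token_el = root.find(".//oai:resumptionToken", OAI_NS)
    # An absent or empty resumptionToken means this was the last page.
    token = (token_el.text or "").strip() if token_el is not None else ""
    return ids, token or None


SAMPLE_PAGE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record><header><identifier>oai:example:1</identifier></header></record>
    <record><header><identifier>oai:example:2</identifier></header></record>
    <resumptionToken>page-2</resumptionToken>
  </ListRecords>
</OAI-PMH>"""

ids, token = parse_list_records(SAMPLE_PAGE)
print(ids, token, list_records_url(token))
```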

XML Sitemaps

Intrallect have suggested that it is necessary to implement an XML sitemap to ensure that content is properly crawled by Google. Until 2008, Google supported sitemaps using OAI-PMH, but it has since withdrawn this and now supports only the standard XML format. Intrallect have therefore developed a software tool that converts OAI-PMH output to an appropriate XML format. A sitemap has been generated and registered using Google’s webmaster tools, but it currently registers a series of errors indicating “This URL is not allowed for a Sitemap at this location”: nine sequential errors are listed, starting from the very first URL. It seems that the crawl goes no further, and none of the 100+ URLs in the sitemap have been successfully recognised. Two possible reasons have been suggested for this:

• All of the URLs in the sitemap are external; it may be that Google does not permit URLs outside the mapped domain.
• There is a problem with the XML itself

Sitemap here: http://repository-intralibrary.leedsmet.ac.uk/sitemap/Sitemap.xml
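One detail of the sitemaps.org protocol may bear on the first explanation: unless cross-submission is set up via robots.txt, every URL in a sitemap must use the same protocol and host as the sitemap file and sit under the directory that file lives in, so a sitemap at /sitemap/Sitemap.xml can only list URLs beginning with /sitemap/ on that host. The Python sketch below checks that rule and builds a minimal sitemap; it is an illustration, not Intrallect’s conversion tool, and the `(url, lastmod)` input is a hypothetical shape for data taken from OAI-PMH.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlsplit

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def allowed_in_sitemap(sitemap_url, page_url):
    """sitemaps.org location rule: same scheme and host as the sitemap file,
    and a path under the directory that contains it."""
    s, p = urlsplit(sitemap_url), urlsplit(page_url)
    sitemap_dir = s.path.rsplit("/", 1)[0] + "/"
    return (s.scheme, s.netloc) == (p.scheme, p.netloc) and p.path.startswith(sitemap_dir)


def build_sitemap(entries):
    """entries: (url, lastmod) pairs, lastmod as YYYY-MM-DD (hypothetical input,
    e.g. built from OAI-PMH identifiers and datestamps)."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for loc, lastmod in entries:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = loc  # ElementTree escapes & as &amp;
        ET.SubElement(url, "lastmod").text = lastmod
    return ET.tostring(urlset, encoding="unicode")


SITEMAP = "http://repository-intralibrary.leedsmet.ac.uk/sitemap/Sitemap.xml"
print(allowed_in_sitemap(SITEMAP, "http://repository.leedsmet.ac.uk/main/search.php?q=x"))
```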

Sitemaps using RSS

It is also possible to submit a sitemap based on RSS; however, this approach has been no more successful, as the Open URL/virtual file paths generated by intraLibrary are inaccessible to Google, resulting in the following warning:

URLs not followed
When we tested a sample of URLs from your Sitemap, we found that some URLs redirect to other locations. We recommend that your Sitemap contain URLs that point to the final destination (the redirect target) instead of redirecting to another URL.

Google and SRU

Though SRU does not facilitate indexing by Google per se, the integration of the SRU Open Search interface may provide a potential solution. site:http://repository.leedsmet.ac.uk/ currently returns 247 records; largely these appear to represent Googlebot following the various browse links (many of which themselves return no results where there is no content to find!) In addition, Googlebot appears to be following hyperlinked author names, publisher and subject(s) in the individual metadata records:

[Screenshot: Google results for site:http://repository.leedsmet.ac.uk/]

The third of these “The Repository search for Morton, Veronica” links to the two metadata records associated with that name as though it had simply been entered into http://repository.leedsmet.ac.uk/ as a search term:

http://repository.leedsmet.ac.uk/main/search.php?q=Morton%2C+Veronica+

Presumably these records were initially indexed via the appropriate links on the browse interface (http://repository.leedsmet.ac.uk/main/browse.php: Faculty of Health and R – Medicine) and then re-indexed via the hyperlinks embedded in the metadata records. It is interesting to note that, though Morton, Veronica only has two records associated with her name, this result appears relatively high (at the top of the second page), probably because there are so many other authors also associated with these papers; all of these names are hyperlinked, giving over 21 separate indexable links.

It seems that we might need to formalise the structure of the SRU interface to ensure it is optimised for Google, possibly with some sort of SRU sitemap. For example, we could generate a page that links to all the individual metadata records in the repository and optimise it to be crawled by search engine spiders (it doesn’t need to be human readable; it could be XML), which could then follow the links to the associated metadata.
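One way to build such a crawler-facing page is to page through the SRU result set with `startRecord` and `maximumRecords`, linking to each page of results. A minimal Python sketch follows; the SRU base URL is a placeholder, and `cql.allRecords = 1` is a CQL context-set query that not every SRU server supports, so both are assumptions.

```python
import math
from html import escape
from urllib.parse import urlencode

# Assumptions: placeholder SRU endpoint, and a server that supports cql.allRecords.
SRU_BASE = "http://repository-intralibrary.example/sru"
ALL_RECORDS = "cql.allRecords = 1"


def page_starts(number_of_records, page_size):
    """1-based SRU startRecord values needed to cover every record."""
    pages = math.ceil(number_of_records / page_size)
    return [1 + i * page_size for i in range(pages)]


def crawl_hub_page(number_of_records, page_size=50):
    """A plain HTML page linking to every page of SRU results."""
    items = []
    for start in page_starts(number_of_records, page_size):
        url = SRU_BASE + "?" + urlencode({
            "operation": "searchRetrieve",
            "version": "1.1",
            "query": ALL_RECORDS,
            "startRecord": start,
            "maximumRecords": page_size,
        })
        last = min(start + page_size - 1, number_of_records)
        items.append(f'<li><a href="{escape(url)}">Records {start}-{last}</a></li>')
    return "<html><body><ul>\n" + "\n".join(items) + "\n</ul></body></html>"


# numberOfRecords would come from an initial SRU response; 120 is illustrative.
print(crawl_hub_page(120))
```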

It also seems to me that Search Engine Optimisation will need to include appropriate customisation of the SRU interface; for example, we want to facilitate browse by author, which will in turn provide indexable links for Googlebot.
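Browse-by-author links could reuse the existing search.php?q= pattern shown above. The Python sketch below (the author list is hypothetical; it would need to come from the repository’s metadata) de-duplicates names so that each author gets one stable, crawlable URL. The URL quoted earlier ends in `+`, an encoded trailing space, which trimming names before encoding avoids.

```python
from html import escape
from urllib.parse import urlencode

# The search.php?q= pattern is the one quoted above; the author list passed in
# is hypothetical and would have to come from the repository's metadata.
SEARCH_BASE = "http://repository.leedsmet.ac.uk/main/search.php"


def author_link(name):
    """Stable search URL for one author name (whitespace trimmed first)."""
    return SEARCH_BASE + "?" + urlencode({"q": name.strip()})


def author_browse_page(authors):
    """One crawlable link per distinct author, in alphabetical order."""
    names = sorted({a.strip() for a in authors})
    items = [f'<li><a href="{escape(author_link(n))}">{escape(n)}</a></li>'
             for n in names]
    return "<ul>\n" + "\n".join(items) + "\n</ul>"


print(author_browse_page(["Morton, Veronica ", "Morton, Veronica", "Carr, Les"]))
```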

Full text indexing

There is also the issue of indexing full text. As already mentioned, Google does not follow the Open URL/virtual file paths generated by intraLibrary, and all the results from site:http://repository.leedsmet.ac.uk/ are search results. Potentially this is a benefit, inasmuch as people are less likely to bypass the metadata record and go directly to the PDF, but we do also want to facilitate full text indexing. We may have to wait for Intrallect on this; they have assured us they are looking into facilitating full text indexing, probably via intraLibrary itself rather than the SRU.

Posted in A new era, Open Access, Resource discovery | 5 Comments

Paying for open access publication charges – RIN guidelines

Posted by Nick on March 31, 2009

“The Research Information Network and Universities UK have produced a guide (March 2009) to provide advice on paying open access publication charges: that is, fees levied by some journals for the publication of scholarly articles so that they can be made available free of charge to readers, immediately upon publication. The guide also sets out recommendations for universities and other research institutions, publishers, research funders, and authors.”

http://www.rin.ac.uk/openaccess-payment-fees

Posted in Link, Open Access | 1 Comment