Repository News

Implementing an Institutional Repository for Leeds Metropolitan University

Posts Tagged ‘SRU’

Leeds Met Repository Open Search Version 2.0

Posted by Nick on November 9, 2009

This is a bit of a trailer for our shiny new interface, which I hope will go live in the next week or so, and a rundown of some of the new features.

It’s far from perfect and should still be seen as a beta – we very much need real users to start using it and I’m feeling a little nervous about how it will be received as I know how much work Mike, in particular, has put into it.

The interface has evolved from an SRU client developed by IRISS – http://www.iriss.org.uk/learnx – which is available under the GNU General Public Licence v.3 at http://code.google.com/p/sruopensearch/ (N.B. We still intend to release our modified code under a similar licence.) Learning Exchange Open Search is a great front end for searching intraLibrary but, with just a simple search box, it lacked the advanced search functionality that was essential for us. We also wanted to use intraLibrary to manage resources for teaching & learning as well as to facilitate Open Access to our research collection in accordance with the EPrints model.

The tabbed interface incorporates an “Advanced search” form that allows users to cross-reference multiple fields, specifying AND/OR. Users can also choose to search either “Research” or “Open Educational Resources”; this uses authentication tokens to return results from the appropriate collections in intraLibrary:

[Screenshot: the Advanced search form]
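Behind the form, each row of field, term and AND/OR choice has to become a single CQL query for intraLibrary's SRU service. The sketch below is a minimal illustration under stated assumptions, not Mike's implementation: it assumes the form submits (boolean, index, term) rows and that DC indexes such as dc.title and dc.creator are available; the function names are hypothetical.

```python
from typing import List, Tuple


def quote_term(term: str) -> str:
    """Wrap a search term in double quotes, escaping backslashes and quotes for CQL."""
    escaped = term.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def build_cql(rows: List[Tuple[str, str, str]]) -> str:
    """Combine (boolean, index, term) rows from an advanced search form into CQL.

    The boolean on the first row is ignored. Each later row is joined to
    everything before it, left to right; parentheses make that grouping explicit.
    """
    query = ""
    for position, (boolean, index, term) in enumerate(rows):
        clause = f"{index} = {quote_term(term)}"
        if position == 0:
            query = clause
            continue
        op = boolean.strip().lower()
        if op not in ("and", "or"):
            raise ValueError(f"unsupported boolean: {boolean!r}")
        if position == 1:
            query = f"{query} {op} {clause}"
        else:
            query = f"({query}) {op} {clause}"
    return query


if __name__ == "__main__":
    print(build_cql([
        ("", "dc.title", "open access"),
        ("AND", "dc.creator", "smith"),
        ("OR", "dc.subject", "repositories"),
    ]))
    # prints: (dc.title = "open access" and dc.creator = "smith") or dc.subject = "repositories"
```

CQL booleans are evaluated left to right anyway, so the parentheses change nothing for the server; they simply make the generated query easier to read when debugging.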

There are also big changes in the way that results are returned; Mike has been able to use a unique identifier to build individual pages for each record so that a search will return a set of results that indicates whether or not each individual record has the full text available:

[Screenshot: a list of search results]

These titles then link through to a static HTML page comprising all of the metadata associated with that record including a published URL and, where the full text is available, a link to the PDF in intraLibrary:

[Screenshot: a static record page]

This static page should be indexed more effectively than was the case before. There is, though, one small fly left in the ointment: the public URL that intraLibrary generates for downloading the full text is dynamic, which means it cannot be indexed by Google. I’m not sure whether Intrallect will be able to do anything about this, though they are aware of the need for full-text indexing and are looking into the problem.

Posted in Adapting intraLibrary, Open Search V2.0

Separate HTML pages for individual records

Posted by Nick on July 17, 2009

I’m returning here to an old theme that is still nagging away at the back of my mind and that I think needs exploring further as the functionality of the SRU interface develops, both through the work Mike and I are doing and through Intrallect’s ongoing development of the research repository aspect of intraLibrary.

Can we generate individual HTML pages for records, so that a search query could generate a list of hyperlinks pointing to those individual pages rather than to the location URL stored in intraLibrary, as is currently the case? This would more closely approximate the way that EPrints and DSpace work and potentially solve the Google problem by providing an easily indexable page of static HTML for search engine spiders to crawl. Could these pages also have nice, short, human-readable URLs instead of convoluted search strings or machine-generated public URLs from intraLibrary? Again, this would be more like EPrints/DSpace. Currently the only way I can give a link to an item is:

http://repository.leedsmet.ac.uk/main/search.php?q=promoting+open+access+to+research&x=22&y=26&exacttext=1

(The SRU search string that will provide the metadata)

Or

http://repository-intralibrary.leedsmet.ac.uk/IntraLibrary?command=open-preview&learning_object_key=i05n27905t

(The machine generated public URL for the actual PDF)
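One way to get EPrints-style links would be to key each record page on intraLibrary's learning object key and append a readable slug taken from the title. A minimal sketch: the /record/ path scheme is hypothetical, and pairing the key from the preview URL above with the title from the search string is purely for illustration (they may not be the same item).

```python
import re
import unicodedata

# Host taken from this post; the /record/ path scheme is hypothetical.
BASE_URL = "http://repository.leedsmet.ac.uk"


def slugify(title: str, max_words: int = 6) -> str:
    """Reduce a title to a short, lowercase, hyphen-separated ASCII slug."""
    # Decompose accented characters, then drop anything that is not ASCII.
    ascii_title = unicodedata.normalize("NFKD", title).encode("ascii", "ignore").decode("ascii")
    words = re.findall(r"[a-z0-9]+", ascii_title.lower())
    return "-".join(words[:max_words])


def record_url(object_key: str, title: str) -> str:
    """Build a short URL; the key identifies the record, the slug is for humans."""
    return f"{BASE_URL}/record/{object_key}/{slugify(title)}"


if __name__ == "__main__":
    print(record_url("i05n27905t", "Promoting open access to research"))
    # prints: http://repository.leedsmet.ac.uk/record/i05n27905t/promoting-open-access-to-research
```

If the page is looked up by the key alone, ignoring the slug, a later correction to a title would not break any existing links.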

I’ve recently been adding RSS feeds to http://repos-dev.leedsmet.ac.uk/main/browse.php and another issue (aside from the fact that the wrong field is exposed by RSS) is that these also point to the location URL stored in intraLibrary – the PDF in the case of full text but the published URL in instances where there is a citation only.  It would be much better if these feeds could point at a Leeds Met repository metadata record.
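Pointing the feeds at a metadata record is mostly a matter of which URL goes into each item's link element. A minimal RSS 2.0 sketch, assuming each record already has a record page URL; the record dictionary keys and the record URL pattern are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List


def build_rss(channel_title: str, channel_link: str, records: List[Dict[str, str]]) -> str:
    """Return an RSS 2.0 document whose items link to repository record pages.

    Each record dict is assumed to have 'title' and 'record_url' keys
    (hypothetical names); 'record_url' is the Leeds Met metadata page,
    not the PDF or published URL stored in intraLibrary.
    """
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    # title, link and description are the required channel elements in RSS 2.0.
    ET.SubElement(channel, "title").text = channel_title
    ET.SubElement(channel, "link").text = channel_link
    ET.SubElement(channel, "description").text = f"Latest additions: {channel_title}"
    for record in records:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = record["title"]
        ET.SubElement(item, "link").text = record["record_url"]
        # A permalink guid lets feed readers de-duplicate items.
        ET.SubElement(item, "guid", isPermaLink="true").text = record["record_url"]
    return ET.tostring(rss, encoding="unicode")
```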

I simply do not have the technical insight to know whether any of this is achievable at all and, if it is, how big a job it will be.

Posted in Adapting intraLibrary | 6 Comments

Open Educational Resources Programme start-up meeting: What I learned

Posted by Nick on June 11, 2009

I very much enjoyed the OER programme start-up meeting on Tuesday, in spite of the 05:30 alarm and having to hoof it across Manchester on account of ‘improvements’ to the Metrolink. I recognised several colleagues from other JISC programmes and was socially disorientated once more by the 21st-century experience of finally meeting f2f with real people with whom I’m already well acquainted in cyberspace – more so now than ever with fellow Twitterers.

Projects in the programme are divided into 3 discrete strands: subject, individual and institutional. In the institutional strand, UniCycle will aim to build a prototype mechanism for the import and export of OERs using our intraLibrary repository and the new JorumOpen service. Other projects in this strand are BERLiN, Open Exeter, OpenStaffs, Otter, Open Spires and Open Content Employability Project (link?).

The agenda for the day can be viewed at http://cloudworks.ac.uk/node/1725 along with aggregated tweets tagged #oerstartup. Cloudworks is an environment that I haven’t encountered before, but it looks very useful and I intend to explore it further; it was described to us as a way of making transient events more persistent and of bringing our fragmented online communications back together.

Like many on the day, I was looking forward to the presentation from Jorum to learn exactly how that service is evolving to facilitate the OER programme. I have a particular interest, of course, as we also use intraLibrary as our repository platform and UniCycle will aim to disseminate OERs via both our own repository and the national service. Jorum’s experience has been instructive for us: they have had problems persuading folk to sign their institution up to their extensive licence agreement, become registered users and deposit their learning resources in intraLibrary, from where they can only be discovered and reused by other registered users. I am also aware, first hand, of the training required to use intraLibrary, an undeniably powerful system in which flexibility can perhaps translate into complexity for the user. In short, I was keen to discover how they plan to tackle these issues with the introduction of their three-licence model and by facilitating easy deposit of, and (where appropriate) open access to, LOs.

Current Jorum model


In her presentation (available here), Nicola Siminson first gave an overview of Jorum and JorumOpen, of how the current model (illustrated above) is developing, and of the technical and policy initiatives that will underpin this development.

The 3 new licensing regimes are key:

  • JorumOpen – for content whose creators and owners are willing and able to share their materials for anyone to use via the web, under Creative Commons (CC) licences
  • JorumEducationUK – for content sharing where creators and owners need to restrict the availability of resources to members of UK Further and Higher Education institutions, authenticated via the Access Management Federation (this is most similar to the current licence)
  • JorumPlus – for sharing content with additional restrictions, for example where material licensed via JISC Collections or from third parties is involved; this will require institutional authorisation

Work on the platform is ongoing and we were promised that:

  • access will be open to anyone
  • materials will be more discoverable (e.g. via Google), as JorumOpen will be exposed to search engines
  • users will be able to search the whole Jorum repository via the website – no logging on to download

These are all issues that we have also been exploring, and I expect that Jorum will need to develop an SRU-based interface similar to the one developed by IRISS and to our own research interface. It would also be very useful if we could compare notes on facilitating effective Google search/SEO.

Then came the demonstration of the OER deposit tool – http://deposit.jorum.ac.uk – which:

  • allows the deposit of a simple item or a collection of items
  • allows the deposit of a link/URL to an open educational resource on a remote site
  • provides authenticated access with a simple one-off registration
  • supports UK Access Management Federation single sign-on at the home institution
  • lets users upload content, submit basic metadata and select a suitable Creative Commons licence
  • offers the option to add more metadata for greater discoverability…and will ultimately enable the sharing and finding of OER via JorumOpen!

It looks good, albeit in beta. Jorum are keen for the community to test it over the coming months and to submit any feedback via the website.

I asked whether the software/code will be made available so we may implement a similar tool as part of our repository infrastructure at Leeds Met; in addition, as Unicycle will use both our own repository and Jorum to disseminate OERs, I would also like to explore dual deposit from a web based interface so users may deposit into both repositories simultaneously.  As such I would also be interested in the workflow(s) and metadata templates that Jorum are using with the deposit tool. Will resources be published directly to the library, for example, or will they go into a user’s work area or into an administrative work area for metadata enrichment?

I was advised that the software will indeed be available to other projects though not in a neatly packaged format.

NB. I had assumed that the deposit tool was based on SWORD, which I know does facilitate deposit into multiple repositories; it appears, however, that it is actually based on MrCute, which does not use the SWORD protocol, so this will need further exploration.

Finally, delegates were urged to join the Jorum community – http://community.jorum.ac.uk/

Other useful presentations throughout the day included Project Management, the Evaluation and Synthesis project, OU-supported communities and the OER infokit.

(links to all presentations in one place at http://www.jisc.ac.uk/whatwedo/programmes/oer/startupmeeting090609.aspx)

And then, on the way back to Euston, I popped into the British Museum and admired bits of the Parthenon and some sarcophagi (sarcophaguses?)

Posted in Event, Open Educational Resources, UniCycle project | 2 Comments

Development of Research Repository Aspect of IntraLibrary

Posted by Nick on June 1, 2009

On Friday Mike and I visited colleagues at Keele University for a meeting with Charles Duncan from Intrallect to consider development priorities for intraLibrary to better serve our needs as a research repository.  Over 4 and a half hours we considered the basic issues that need addressing as well as looking forward to some more ambitious functionality and integration with the wider research infrastructure as we move towards the REF.

I was particularly interested to learn about how Keele are implementing Symplectic’s publications management system – http://www.symplectic.co.uk/ – which regularly trawls Web of Science and PubMed Central for information about Keele’s academic publications. Symplectic have clearly been thinking about integration with IRs, and there’s even a link to SHERPA/RoMEO. The system was used at Imperial College London for the RAE 2008 process and includes link functionality with DSpace, which is that institution’s IR platform – http://spiral.imperial.ac.uk/. Intrallect are currently liaising with Symplectic about integration with intraLibrary. I’m not certain precisely what form this would take, but in an ideal world it would be great if we could auto-populate as much metadata as possible (title/bibliographic info/abstract/author/copyright status according to RoMEO) and automatically nudge academics for full text where appropriate!

At Leeds Met we currently lack any form of research database, which is why I’ve been exploring what are essentially manual workflows to populate the repository with all research output. I’m not sure how expensive Symplectic is, and it may be difficult to justify given this institution’s relatively small research output; the repository may well have to be the research database, which is the assumption I’ve been working on. We will also want to explore the soon-to-be-released Web of Science API, which may, in any case, enable us to emulate some of this functionality ourselves.

The first item on our agenda was somewhat more prosaic and focussed on our immediate functional requirements: SRU searching and metadata. Mike has been working on incorporating advanced search into the SRU interface and has come up against a couple of issues when searching by author and date, which are essentially artifacts of having to query DC rather than LOM. In the LOM, creators and contributors are clearly differentiated; querying by DC, however, conflates the creator and author roles, which may (will) be different if resources are uploaded by someone other than the author.

  • Searching dc.creator will search for the creator and author roles
  • Searching dc.contributor will search for the content provider role

In addition:

  • Searching by dc.date only searches data that relates to the intraLibrary submission process (i.e. the deposit date, and perhaps modification dates if you added an author later on for example)
  • The only way to search journal dates is to use the default free text search that searches everything (or most fields anyway).
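The conflation is easy to see if the mapping described above is written out. A minimal sketch, assuming a record carries (role, name) pairs and using only the role-to-element mapping given in this post; the names are hypothetical, and intraLibrary's actual crosswalk may differ in detail.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Role-to-element mapping as described in this post, not an official crosswalk.
ROLE_TO_DC = {
    "creator": "dc.creator",
    "author": "dc.creator",
    "content provider": "dc.contributor",
}


def lom_roles_to_dc(contributions: List[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Flatten (LOM role, name) pairs into DC elements, losing the role distinction."""
    dc: Dict[str, List[str]] = defaultdict(list)
    for role, name in contributions:
        element = ROLE_TO_DC.get(role.lower())
        if element is not None:  # roles outside the mapping are simply dropped
            dc[element].append(name)
    return dict(dc)


if __name__ == "__main__":
    # Hypothetical record: written by one person, uploaded by another.
    record = [("author", "A. Researcher"), ("creator", "R. Depositor"), ("content provider", "Library")]
    print(lom_roles_to_dc(record))
    # prints: {'dc.creator': ['A. Researcher', 'R. Depositor'], 'dc.contributor': ['Library']}
```

Once flattened, a dc.creator search for the depositor matches the paper just as a search for its author does, and nothing in the DC output says which is which.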

The solution, of course, is to make it possible to query the LOM by SRU, and this is now Intrallect’s intention; indeed, they intend to make all LOM fields queryable, which would include user-generated tags, for example.

The next big question is the exposure of open content to search engines, and Charles gave us an overview of plans to develop an object “home page” with a static URL, which should help in this area. We also discussed sitemaps and what needs to be done external to intraLibrary. I’m still unclear on how we can improve the format of results returned by Google from the SRU interface. To repeat, Google IS indexing http://repository.leedsmet.ac.uk/, with site:http://repository.leedsmet.ac.uk/ currently returning over 500 records. However, this is fairly unstructured: Google is simply following links from http://repository.leedsmet.ac.uk/main/browse.php, and any subsequent links Googlebot encounters are also indexed and returned as “The Repository search for [link name]”; ideally I’d like results to be returned in a more structured and user-friendly form. Many of these queries actually return no results where there is no content to find (yet), though where there is content, Google is indexing all human-readable metadata. I’m also not certain whether Googlebot is finding its way into the full text via the Open URL/virtual file paths generated by intraLibrary. Full-text indexing within intraLibrary itself has also been promised.

In short, I’m really not sure how all of these factors may combine to be exploited by a next generation SRU interface!
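Part of what has to happen outside intraLibrary is a sitemap that lists record pages directly, so that Googlebot is not left to discover them by following browse links. A minimal sketch, assuming static record pages exist at hypothetical /record/&lt;key&gt; URLs; the namespace and the loc/lastmod elements are from the sitemaps.org protocol.

```python
import xml.etree.ElementTree as ET
from typing import Iterable, Optional, Tuple

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(pages: Iterable[Tuple[str, Optional[str]]]) -> str:
    """Return sitemap XML for (page URL, last-modified date or None) pairs."""
    # Written as a plain xmlns attribute so the output uses a default namespace.
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for loc, lastmod in pages:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = loc  # ElementTree escapes & and < here
        if lastmod:
            ET.SubElement(url, "lastmod").text = lastmod  # W3C date, e.g. 2009-11-09
    declaration = '<?xml version="1.0" encoding="UTF-8"?>\n'
    return declaration + ET.tostring(urlset, encoding="unicode")
```

The resulting file would normally be placed at the site root and submitted through Google's webmaster tools or referenced from robots.txt.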

We then touched upon self-archiving and (semi-)mediated workflows, potentially developing SWORD-based quick deposit from the desktop/web, ideally with automatic metadata generation.

The two other major issues we considered are:

  • Policy metadata – handling embargoes

This is pretty crucial to an OA archive of research as many publishers of academic journals specify an embargo period of 12 or 18 months from the date of publication before a paper can be made available in a repository.  We need to be able to add a paper to intraLibrary upon receipt but restrict access until the embargo has expired and for this to happen automatically.  On one level, this functionality should be fairly straightforward to achieve by having intraLibrary check today’s date against an embargo date specified in the metadata; it’s a little more complicated than that though as we would want the metadata to be visible before the embargo date, just not the full text.

  • Cover pages for PDF

It was suggested that intraLibrary should generate a coversheet on the fly. This would certainly be useful, as manually creating coversheets for each and every article is time-consuming to say the least; it would help with CLA materials too, as these also require a coversheet.
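The embargo rule discussed above, where metadata is visible at once but the full text only after 12 or 18 months, amounts to a date comparison applied at display time. A minimal sketch, assuming the record stores a publication date and an embargo length in months; these are hypothetical fields, as how intraLibrary will actually hold embargo metadata is not yet known.

```python
import calendar
from dataclasses import dataclass
from datetime import date
from typing import Optional


def add_months(start: date, months: int) -> date:
    """Add whole months, clamping to the last day of a shorter month."""
    month_index = start.month - 1 + months
    year = start.year + month_index // 12
    month = month_index % 12 + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(start.day, last_day))


@dataclass
class Paper:
    title: str
    published: date
    embargo_months: int = 0  # e.g. 12 or 18, per publisher policy
    full_text_url: Optional[str] = None


def public_record(paper: Paper, today: date) -> dict:
    """Metadata is always public; the full-text link appears once the embargo ends."""
    embargo_ends = add_months(paper.published, paper.embargo_months)
    view = {"title": paper.title, "embargo_ends": embargo_ends.isoformat()}
    if paper.full_text_url and today >= embargo_ends:
        view["full_text_url"] = paper.full_text_url
    return view
```

Because the check runs whenever the record is displayed, nobody has to remember to release the full text on the day the embargo expires.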

These developments will take some time to implement, and the next stage is to prioritise them by anonymous e-postal ballot; Intrallect hope we will start to see some of the major initiatives in a build towards the end of the year.

Thank you to our colleagues at Keele for making us welcome and for feeding us!

Posted in Adapting intraLibrary, Open Access | 3 Comments

Repository Steering Group meeting: 22nd July 2008

Posted by Nick on July 23, 2008

The staff development festival in September is a unique opportunity to promote the repository and our agenda for yesterday’s meeting aimed to get some much needed input from the steering group before the quiet month of August.

Item 1. Recap of previous meetings:

Documentation approved.

Item 2. Update on progress with intraLibrary

2a. Configuration:

Search interface (SRU):

Getting the search interface online is the first priority. My request for the server is still pending with IMTS, but I hope we can install the IRISS interface as is within the next few weeks (JohnG is installing it on a local server as we speak, from which it can be transferred to our Leeds Met domain when that is available), and I think it will be straightforward to switch the CSS to get a very rough Leeds Met branding.

Content structure:

This is also crucial and needs to be put in place ASAP. Several members of the group expressed the opinion that it should not be based on faculties, which tend not to be fixed entities within the university; it was also thought that such a schema would not reflect the institutional emphasis upon cross-disciplinary research. There was consensus that organisation at the top level should be by content type (i.e. Research/Learning Objects), but exactly what hierarchy should be employed beneath is still not clear (Library of Congress Subject Headings?). We also need to decide what other material types will be accommodated in the prototype (e.g. Dissertations and Theses).

Landing screen:

Technical challenges aside, the current conception of the landing screen is that it will essentially use the same template as the search interface i.e. it will be branded the same and share the same look and feel; it will also share some of the same functionality and link back ‘home’ to the search interface.

Given the close relationship between these configuration issues, a sub-group was identified that will liaise as necessary to develop the content structure, branding, look and feel and usability, and will also inform the technical development of the additional functionality.

2b. Policies:

The group was briefed on the types of policies that need to be developed (see last post), with emphasis on the fact that ‘standard’ institutional repository policies may be insufficient for our requirements given our wider remit (i.e. not just research outputs). A sub-group was identified that will liaise as necessary to develop suitable policies.

2c. URL:

The URL mooted – repository.leedsmet.ac.uk – was deemed suitable by the group.

Item 3. Content for the repository:

To discuss method of contacting researchers / research active staff and soliciting content

Review of draft correspondence for research active staff and discussion of when this would most usefully be disseminated; consensus that it would have the greatest impact some time after the staff development festival. Content was broadly approved though it was suggested that greater emphasis be placed on the benefits of OA to citation and the increased importance of citation under proposals for REF (to replace RAE).

Emphasis was placed on the need to identify and recruit interested parties within specific faculties/research groups to help drive the advocacy process to the wider community; liaison with University Research Office for appropriate contact lists.

(NB. This is an ongoing process that is already underway but will increase in profile with the implementation of the prototype system.)

The staff development festival was confirmed as a key opportunity.

There was discussion about whether content would be full text only or would also include citations of material that we do not have copyright permission to make available as full text (i.e. bibliographic reference only). Given that including such material will enable us to ‘hit the ground running’, and considering the increasing importance of citation data/bibliometrics for the RAE/REF, the consensus was that citations should be included at the outset.

Item 4. Authentication

It was emphasised to the group that we can be fully functional as a mediated repository without the need for authentication in the first instance.

A representative from IMTS was able to inform the discussion in the light of recent feedback from Intrallect and will continue to liaise as necessary.

Item 5. Integration with other Leeds Met systems

In light of the decision to include citations as well as full text, an important early integration will be with SFX such that citations in the repository can incorporate a link to Leeds Met holdings of subscribed material; hardly Open Access as it will only be available to authenticated staff and students but will offer another local route to that material and can also be used to generate data on OA friendly publishers and perhaps to raise awareness of OA.
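The SFX link for a citation-only record would be an OpenURL that describes the article and hands it to the resolver. A minimal sketch, assuming the repository holds article title, journal title, ISSN and year; the resolver address is a placeholder and the sample citation in the usage is made up. The key names (url_ver, rft_val_fmt, rft.atitle and so on) are from the OpenURL 1.0 journal KEV format.

```python
from urllib.parse import urlencode

# Placeholder resolver address; the real Leeds Met SFX base URL is not given here.
SFX_BASE = "https://sfx.example.org/leedsmet"


def openurl_for_article(atitle: str, jtitle: str, issn: str, year: str,
                        volume: str = "", spage: str = "") -> str:
    """Build an OpenURL 1.0 (Z39.88-2004) KEV link for a journal article citation."""
    params = [
        ("url_ver", "Z39.88-2004"),
        ("rft_val_fmt", "info:ofi/fmt:kev:mtx:journal"),
        ("rft.genre", "article"),
        ("rft.atitle", atitle),
        ("rft.jtitle", jtitle),
        ("rft.issn", issn),
        ("rft.date", year),
    ]
    # Optional citation details are only sent when the record has them.
    if volume:
        params.append(("rft.volume", volume))
    if spage:
        params.append(("rft.spage", spage))
    return f"{SFX_BASE}?{urlencode(params)}"
```

For example, `openurl_for_article("An example article", "Journal of Examples", "1234-5678", "2008", volume="12")` yields a link that the resolver can match against Leeds Met holdings.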

The PowerLink to X-stream should also be a priority such that it is operational at the earliest opportunity.

NB. Precise functionality of the PowerLink still needs to be determined.

Other systems flagged up for integration were iTunesU and the streaming server; pending investigation!

The next meeting of the steering group will take place after the staff development festival, probably late September/early October.

Posted in Steering group | 3 Comments

Search interface, URLs, taxonomy, policies and content…

Posted by Nick on July 21, 2008

It is now established that we will be using the SRU interface developed by IRISS as the public search interface for the repository. I hope to install the current incarnation of the interface on a Leeds Met server very soon and two of my more technically adept colleagues are looking at the recently released code in order to scope the extent of the development work that will be required to incorporate advanced search and browse functionality. As this page will effectively be the repository by proxy (the URL that I have requested is repository.leedsmet.ac.uk – intraLibrary itself will require a different URL) we also need to think about what other elements it might need to comprise; authenticated log-in to intraLibrary itself (yet to be determined if this will be the appropriate route for self-archiving; it will certainly be one route but we may also need an authenticated link to a SWORD interface for example); About this repository; FAQs; Operational policies; Contact etc. It is also likely that this page will form the basis of – or at least link to – the PERSoNA web-tool(s).

What about learning objects which will require their own taxonomy and a different workflow for deposit (via SWORD perhaps)? Should they be incorporated into the search interface at all or will users need to authenticate into intraLibrary to browse? This would seem to make sense given intraLibrary is a specialised LO repository and access to this type of content is more likely to be restricted to Leeds Met staff.

I’ve adapted my schematic recently posted on PERSoNA News to try to represent what the repository might now look like:

The customisation of the search interface is one of the issues that I am taking to the steering group meeting tomorrow afternoon.

Other decisions that need to be ratified by that group are:

  • The URL for the search interface
  • The URL for intraLibrary
  • The taxonomy system that we shall use within intraLibrary and that the search interface (SRU) will map directly on to (at least for research)

Other items on the agenda are:

  • Development of operational policies for the repository

I have so far drafted the following:

  1. Metadata policy
  2. Data policy
  3. Takedown policy
  4. Content policy
  5. Submission policy
  6. Preservation policy

These are all fairly standard in terms of Open Access repositories and, with the exception of 3. Takedown policy, were all generated using the OpenDOAR Policies Tool. Nevertheless, it may be necessary to identify specialised sub-groups to review these drafts to ensure they are appropriate for the Leeds Met repository; the issue is, of course, more complex because our repository incorporates Learning Objects as well as research.

  • Content for the repository

There needs to be a discussion about how best to contact researchers and research-active staff to ask them for appropriate material for the repository. In the first instance, in line with the project plan, this will be their own versions of published research articles that may be self-archived in an OA repository. I have begun to identify such material and have drafted correspondence for review at the meeting.

  • Authentication

With the implementation of the search interface (SRU) it will not be necessary to authenticate in order to browse for research content (essential for OA). It will, however, be necessary to generate authenticated accounts for Leeds Met staff that require access to intraLibrary itself and these will need to be integrated with LDAP. Though much will depend on the precise configuration of our integrated repository systems it is likely that, in time, all staff will require an authenticated account whether to deposit material, search for learning objects or access their internal workspace. There are also authentication issues pertaining to the potential use of SWORD/other external interfaces such that only authorised Leeds Met staff/students can deposit material/access federated content. I am still unsure of some of the issues involved and require input from Intrallect and IMTS.

  • Integration with other Leeds Met systems

This is an area where it is perhaps still too early to think much beyond priorities and broad timescales. Given that there is already a plug-in for X-stream and that this is functionality that can be used as a selling point to the university community it makes sense to focus on this integration first. Also, perhaps, library online and the portal.

Posted in Adapting intraLibrary, PERSoNA, Steering group | 2 Comments

Adapting intraLibrary

Posted by Nick on July 16, 2008

intraLibrary is designed as a learning object repository and it is only now becoming clear just what is involved so that the platform will also function as an Open Access repository of research.

Access to learning objects is generally federated. For example, in order to access resources in JORUM it is necessary to authenticate via Athens (soon to be Shibboleth) or a UK Access Management Federation log-in mechanism, and, so far as I know, it is not possible to search the repository externally via a search engine. As the very point of an Open Access repository is to make research discoverable and accessible on the public internet, this is obviously not desirable! It is, I think, relatively straightforward to expose metadata to search engines via OAI-PMH, but the majority of search engines no longer support the protocol, and we really need to allow the full text to be crawled by Googlebot and other search engine spiders, which, I suspect, will not be able to get past the authentication gateway (need more info on this). Moreover, if an external user does come to the repository via Google, they will not be able to search content without first authenticating into the system – not very open. Although about 80% of traffic comes to a repository via search engines (assuming they can index content in the first place), we obviously also want an accessible search interface.
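For comparison, exposure via OAI-PMH means a harvester issues a ListRecords request for oai_dc and reads Dublin Core out of the XML response. A minimal sketch of the harvesting side; the endpoint and the sample record are hypothetical, while the verb, parameter names and namespaces come from the OAI-PMH 2.0 specification.

```python
import xml.etree.ElementTree as ET
from typing import List, Optional, Tuple
from urllib.parse import urlencode

NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def list_records_url(base_url: str, from_date: Optional[str] = None) -> str:
    """Build an OAI-PMH ListRecords request for unqualified Dublin Core."""
    params = {"verb": "ListRecords", "metadataPrefix": "oai_dc"}
    if from_date:
        params["from"] = from_date  # incremental harvest, e.g. "2008-07-01"
    return f"{base_url}?{urlencode(params)}"


def titles_from_response(xml_bytes: bytes) -> List[Tuple[str, str]]:
    """Return (OAI identifier, dc:title) pairs from a ListRecords response."""
    root = ET.fromstring(xml_bytes)
    pairs = []
    for record in root.iterfind("oai:ListRecords/oai:record", NS):
        identifier = record.findtext("oai:header/oai:identifier", default="", namespaces=NS)
        title = record.findtext("oai:metadata/oai_dc:dc/dc:title", default="", namespaces=NS)
        pairs.append((identifier, title))
    return pairs


# A trimmed, made-up response for illustration.
SAMPLE = b"""<?xml version="1.0" encoding="UTF-8"?>
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example.org:1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>An example paper</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""
```

This suits specialist harvesters and aggregators, but as noted above it does nothing for a general web search engine that no longer speaks the protocol.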

The potential solution to these problems that I am currently investigating is to use a separate, web-based SRU interface which sits outside the repository and is accessible on the public internet.

As part of the CD-LOR project Intrallect have already developed a basic SRU interface which, in turn, has been substantially improved by a third party – IRISS interface here – who have made the code available under an open source licence. The IRISS interface is still fairly basic and does not incorporate all of the functionality that we require – it is essentially a search box only and, for example, would not facilitate browsing the research collection by faculty. It should be reasonably straightforward to customise the interface to incorporate the functionality that we require; we essentially need a series of hyperlinks that map onto the internal repository structure and that will return the appropriate queries. I also need to clarify if such an approach will enable Googlebot and other search engine spiders to crawl the full text thus making the content searchable on the open web.
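The series of hyperlinks could be generated from a table of browse categories, each mapped to a CQL query and turned into an SRU searchRetrieve URL. A minimal sketch, assuming SRU 1.1 and a "dc" record schema; the endpoint, the category labels and the use of dc.subject for browsing are all hypothetical stand-ins for the real repository structure.

```python
from html import escape
from typing import Dict
from urllib.parse import urlencode

# Hypothetical SRU endpoint; the real intraLibrary SRU address is not given here.
SRU_BASE = "http://sru.example.org/intralibrary"


def sru_search_url(cql: str, maximum_records: int = 20, start_record: int = 1) -> str:
    """Build an SRU 1.1 searchRetrieve URL for a CQL query."""
    params = {
        "operation": "searchRetrieve",
        "version": "1.1",
        "query": cql,
        "startRecord": str(start_record),
        "maximumRecords": str(maximum_records),
        "recordSchema": "dc",  # schema names are server-defined; "dc" is an assumption
    }
    return f"{SRU_BASE}?{urlencode(params)}"


def browse_list(categories: Dict[str, str]) -> str:
    """Render browse categories (label -> CQL) as an HTML list of SRU links."""
    items = [
        f'<li><a href="{escape(sru_search_url(cql))}">{escape(label)}</a></li>'
        for label, cql in categories.items()
    ]
    return "<ul>\n" + "\n".join(items) + "\n</ul>"


# Hypothetical categories standing in for the repository's internal structure.
BROWSE = {
    "Research: Health": 'dc.subject = "health"',
    "Research: Education": 'dc.subject = "education"',
}
```

Keeping the categories in one table means the browse page can be regenerated whenever the collection structure in intraLibrary changes.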

For each object, intraLibrary generates a public URL which can be linked to directly – on the open web and with no need for authentication. However, a further issue is that, due to the way that intraLibrary works, a query return (either from a search engine or the SRU interface) will link directly to the resource itself, i.e. a PDF of a research article will open immediately in the browser window. When facilitating Open Access to research this is undesirable for several reasons, and we require some sort of “landing screen” that can provide context and basic information (abstract, copyright info, whether the paper has been refereed); indeed, there will often be a legal requirement to provide copyright information, with many publishers stipulating that there must also be a link to the published version of the paper. Precisely how we will resolve this issue is yet to be determined; it might be possible to embed a link to the PDF into some sort of HTML template and have this template returned at the public URL?
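The HTML-template idea might look something like this: a landing page carrying the abstract, copyright statement, refereed status and a link to the published version, with the PDF linked from the page rather than served straight to the browser. A minimal sketch using Python's string.Template; the field names and page layout are hypothetical.

```python
from html import escape
from string import Template
from typing import Optional

PAGE = Template("""<!DOCTYPE html>
<html lang="en">
<head><meta charset="utf-8"><title>$title</title></head>
<body>
  <h1>$title</h1>
  <p>$abstract</p>
  <p>Peer reviewed: $refereed</p>
  <p>Copyright: $copyright</p>
  <p><a href="$published_url">Published version</a></p>
  $full_text
</body>
</html>""")


def landing_page(title: str, abstract: str, copyright_note: str,
                 published_url: str, refereed: bool,
                 pdf_url: Optional[str] = None) -> str:
    """Fill the template, HTML-escaping every value taken from the metadata."""
    if pdf_url:
        full_text = f'<p><a href="{escape(pdf_url)}">Full text (PDF)</a></p>'
    else:
        # Citation-only records still get a landing page, just without the PDF link.
        full_text = "<p>Full text not available from this repository.</p>"
    return PAGE.substitute(
        title=escape(title),
        abstract=escape(abstract),
        copyright=escape(copyright_note),
        published_url=escape(published_url),
        refereed="Yes" if refereed else "No",
        full_text=full_text,
    )
```

Escaping matters here because intraLibrary public URLs contain query strings with "&", which must appear as "&amp;" inside an HTML attribute.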

Watch this space…

By working closely with Intrallect and with a little ingenuity I am confident that these issues will be resolved and that we have, in intraLibrary, an excellent solution to our diverse needs.

Posted in Adapting intraLibrary | 2 Comments