What would you like to search for – research or OER?

Leeds Met Open Search – http://repository.leedsmet.ac.uk/main/index.php – now incorporates a “splash screen” that lets the user choose which collection to search, with links to separate interfaces tailored to each type of material:

Leeds Met Open Search splash page

Each tabbed interface provides an appropriate Advanced search form as well as relevant browse options; by LCSH or faculty for research and by HEA Subject Centres or JACS code for OERs:

Once again, massive thanks to Mike for his rapid response to the myriad requests I make of him on a daily basis!

Leeds Met Repository Open Search Version 2.0

This is a bit of a trailer for our shiny new interface, which I hope will go live in the next week or so, and a rundown of some of the new features.

It’s far from perfect and should still be seen as a beta – we very much need real users to start using it and I’m feeling a little nervous about how it will be received as I know how much work Mike, in particular, has put into it.

The interface has evolved from an SRU client developed by IRISS – http://www.iriss.org.uk/learnx – which is available under GNU General Public Licence v.3 at http://code.google.com/p/sruopensearch/ (N.B. We still intend to release our modified code under a similar licence.)  Learning Exchange Open Search is a great front end for searching intraLibrary but, with just a simple search box, it lacked the advanced search functionality that was essential for us.  We also wanted to use intraLibrary to manage resources for teaching & learning as well as facilitating Open Access to our research collection in accordance with the EPrints model.

The tabbed interface incorporates an “Advanced search” form that allows users to cross reference multiple fields, specifying AND/OR.  They can also search for either “Research” or “Open Educational Resources”; this uses authentication tokens to return results from the appropriate collections in intraLibrary:

advanced
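As a rough illustration of the AND/OR cross-referencing, here is a minimal Python sketch of how form rows might be joined into a single CQL query string for SRU. It is not the actual Open Search code: the function name `build_cql` and the index names `dc.title` and `dc.subject` are assumptions made for the example.

```python
def build_cql(clauses):
    """Join (index, term, boolean) rows from a search form into one CQL string.

    The boolean of the first row is ignored, since there is nothing
    before it to join to.
    """
    parts = []
    for position, (index, term, boolean) in enumerate(clauses):
        # Escape backslashes and double quotes so the term stays one quoted string.
        escaped = term.replace("\\", "\\\\").replace('"', '\\"')
        if position > 0:
            operator = boolean.lower()
            if operator not in ("and", "or"):
                raise ValueError(f"unsupported boolean: {boolean}")
            parts.append(operator)
        parts.append(f'{index} = "{escaped}"')
    return " ".join(parts)


# Hypothetical form input: two rows, the second joined with OR.
query = build_cql([
    ("dc.title", "open access", "and"),
    ("dc.subject", "repositories", "or"),
])
print(query)  # dc.title = "open access" or dc.subject = "repositories"
```

Quoting every term keeps multi-word phrases together, and rejecting anything other than AND/OR stops stray form values ending up in the query.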

There are also big changes in the way that results are returned; Mike has been able to use a unique identifier to build individual pages for each record so that a search will return a set of results that indicates whether or not each individual record has the full text available:

repository

These titles then link through to a static HTML page comprising all of the metadata associated with that record including a published URL and, where the full text is available, a link to the PDF in intraLibrary:

static

This static page should be indexed more effectively than was the case before, though there is one small fly left in the ointment: the public URL generated by intraLibrary that is used to download the full text is dynamic, which means it cannot be indexed by Google.  I’m not sure if it will be possible for Intrallect to do anything about this, though they are aware of the need for full text indexing and are looking into the problem.

Separate HTML pages for individual records

I’m returning here to an old theme that is still nagging away at the back of my mind and that I think still needs exploring further as the functionality of the SRU interface develops; both by Mike and me and by Intrallect in the context of their ongoing development of the research repository aspect of intraLibrary.

Can we generate individual HTML pages for records such that a search query could generate a list of hyperlinks that point to those individual pages rather than to the location URL stored in intraLibrary which is currently the case?  This would more closely approximate the way that EPrints and DSpace work and potentially solve the Google problem by providing an easily indexable page of static HTML for search engine spiders to crawl.  Could these pages also have nice, short, human readable URLs instead of convoluted search strings / machine-generated public URLs from intraLibrary?  Again, more like EPrints/DSpace.  Currently the only way I can give a link to an item is:

http://repository.leedsmet.ac.uk/main/search.php?q=promoting+open+access+to+research&x=22&y=26&exacttext=1

(The SRU search string that will provide the metadata)

Or

http://repository-intralibrary.leedsmet.ac.uk/IntraLibrary?command=open-preview&learning_object_key=i05n27905t

(The machine generated public URL for the actual PDF)
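To make the “short, human readable URL” idea concrete, here is a small Python sketch that derives a readable path from a record identifier and its title. The `/record/<id>/<slug>` layout and the function name `record_path` are purely hypothetical, and the identifier and title are borrowed from the two example URLs above only for illustration.

```python
import re
import unicodedata


def record_path(record_id, title, max_words=6):
    """Return a short, readable path of the form /record/<id>/<slug>."""
    # Fold accented characters to plain ASCII, then lower-case the title.
    ascii_title = (
        unicodedata.normalize("NFKD", title)
        .encode("ascii", "ignore")
        .decode("ascii")
    )
    # Keep only runs of letters and digits; everything else is a separator.
    words = re.findall(r"[a-z0-9]+", ascii_title.lower())
    slug = "-".join(words[:max_words])
    return f"/record/{record_id}/{slug}" if slug else f"/record/{record_id}"


print(record_path("i05n27905t", "Promoting Open Access to Research"))
# /record/i05n27905t/promoting-open-access-to-research
```

Keeping the identifier in the path means the page can still be found if the title is later corrected; the slug is only there for human readers and search engines.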

I’ve recently been adding RSS feeds to http://repos-dev.leedsmet.ac.uk/main/browse.php and another issue (aside from the fact that the wrong field is exposed by RSS) is that these also point to the location URL stored in intraLibrary – the PDF in the case of full text but the published URL in instances where there is a citation only.  It would be much better if these feeds could point at a Leeds Met repository metadata record.

I simply do not have the technical insight to know whether any of this is achievable at all and, if it is, how big a job it will be.

Spinning plates: A repository update

I feel at the moment that I’m trying to spin plates, rushing around like a guest act on the Paul Daniels magic show to impart sustained rotation – when I get to the last plate the first is beginning to wobble….and then someone throws me another, just to make it more exciting.

Oh and all the plates are different shapes and sizes (some of them I swear aren’t even symmetrical which gives them a lop-sided gyration that is really tricky to maintain…)

Currently research material collectively comprises one of the biggest and wonkiest sets of crockery:

  1. Peer reviewed journal articles
  2. Book items
  3. Conference proceedings
  4. Conference items (e.g. ppt presentations)
  5. Theses or dissertations
  6. Reports

All of these need to be displayed differently by the Open Search interface, which means Mike is currently hacking away at the code to ensure they are picked up by Educational properties: Type of resource and then formatted appropriately.  In turn this is having a knock-on effect for metadata entry and workflow and I’m still not sure how it will all tie together.

We are also relying on Mike to develop advanced search as part of the interface; work is progressing well and we are now able to cross reference (toggling AND/OR) Title; Subject; Publisher; Description (abstract); DOI; Type and Format  – the interface also has a free text box.  Mike is currently trying to implement probably the most important field – search by author – and has identified a potential problem due to limits on the way we can differentiate contributor roles in the metadata – currently we are only able to query Dublin Core rather than the full LOM record.  N.B. Recent testing has indicated that this may not actually be the problem we originally thought and it may be possible to query contributor role=author after all.  Then we’d also like to incorporate browse by author – though this is currently another plate yet to be balanced and spun….

Then there is the ongoing issue of differentiating content (i.e. research material vs. learning objects) and ensuring these are returned appropriately by an extended/alternative Open Search interface – this functionality is crucial to Unicycle of course to make OERs available.  We’re currently exploring using collection tokens which should allow us to submit a query incorporating an authenticationToken such that a given query only returns content from a particular collection.  Initial testing has gone well and we’re pretty confident we can implement this over the next week or so.
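A sketch of what a collection-restricted request might look like, in Python. The `operation`, `version`, `query` and `maximumRecords` parameters are standard SRU 1.1; everything else is an assumption for illustration, including the endpoint, the token values and the exact name and position of the `authenticationToken` parameter, which may well differ in intraLibrary.

```python
from urllib.parse import urlencode

# Hypothetical endpoint and tokens, for illustration only.
SRU_ENDPOINT = "http://repository.example.org/sru"
COLLECTION_TOKENS = {"research": "RESEARCH-TOKEN", "oer": "OER-TOKEN"}


def sru_url(cql_query, collection, maximum_records=10):
    """Build a searchRetrieve URL limited to one collection via its token."""
    params = {
        "operation": "searchRetrieve",   # standard SRU parameter
        "version": "1.1",                # standard SRU parameter
        "query": cql_query,              # CQL query string
        "maximumRecords": maximum_records,
        # Assumed parameter name: the real mechanism may differ.
        "authenticationToken": COLLECTION_TOKENS[collection],
    }
    return f"{SRU_ENDPOINT}?{urlencode(params)}"


url = sru_url('dc.title = "open access"', "oer")
print(url)
```

The same query can then be sent to either collection just by swapping the `collection` argument, which is roughly what a “Research” / “Open Educational Resources” toggle would need.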

(NB.  I need to start thinking about license models for Unicycle and some flavour of Creative Commons will need uploading to intraLibrary, another one for the to do list…)

Ideally we could do with a functioning PowerLink for the VLE by September – as well as being crucial for the PC3 project, it represents functionality that will really get people engaged and allow us to demonstrate the benefits of storing and sharing teaching and learning materials in the repository rather than within the modular, inaccessible silos of Blackboard-Vista.  Also important for PC3 is implementing LDAP authentication (which really should be happening soon!) thereby giving teaching-staff – and the first PC3 cohort – access to intraLibrary – NB.  PC3 doesn’t require Open Access in the same way as Unicycle.

I’ve also been working with Rachel on developing a workflow for CLA material and ensuring that we can generate suitable reports for the CLA – during this process, we uncovered a bug in the metadata editor which slowed us down a bit but with help from Intrallect, we’ve managed to implement a work-around pending the bug being fixed in a future build and Rachel has started using intraLibrary to store and disseminate CLA material on a pilot basis.

The most recent plate JISC have thrown our way, of course, is the Bibliosight project which will almost certainly have an impact on the developing infrastructure beyond the specific deliverables of that project – our first meeting is on Monday 13th July for which there is a draft agenda on the project blog.

I just hope that we can keep all the plates spinning and don’t end up with a Greek wedding scenario!

Development of Research Repository Aspect of IntraLibrary

On Friday Mike and I visited colleagues at Keele University for a meeting with Charles Duncan from Intrallect to consider development priorities for intraLibrary to better serve our needs as a research repository.  Over 4 and a half hours we considered the basic issues that need addressing as well as looking forward to some more ambitious functionality and integration with the wider research infrastructure as we move towards the REF.

I was particularly interested to learn about how Keele are implementing Symplectic’s publications management system – http://www.symplectic.co.uk/ – which regularly trawls Web of Science and PubMed Central for information about Keele’s academic publications.  Symplectic have clearly been thinking about integration with IRs and there’s even a link to SHERPA/RoMEO.  The system was used at Imperial College London for the RAE 2008 process and includes link functionality with DSpace which is that institution’s IR platform – http://spiral.imperial.ac.uk/.  Intrallect are currently liaising with Symplectic about integration with intraLibrary – I’m not certain precisely what form this would take but in an ideal world it would be great if we could auto populate as much metadata as possible (title/bibliographic info/abstract/author/copyright status according to RoMEO) and automatically nudge academics for full text where appropriate!

At Leeds Met we currently lack any form of research database, which is why I’ve been exploring what are essentially manual workflows to populate the repository with all research output.  I’m not sure how expensive Symplectic is and it may be difficult to justify given this institution’s relatively small research output, so the repository may well have to be the research database, which is the assumption I’ve been working on.  We will also want to explore the soon-to-be-released Web of Science API which may, in any case, enable us to emulate some of this functionality ourselves.

The first item on our agenda was somewhat more prosaic and focussed on our immediate functional requirements – SRU searching and metadata.  Mike has been working on incorporating advanced search into the SRU interface and has come up against a couple of issues when searching by author and date which are essentially artifacts of having to query DC rather than LOM.  In the LOM, creators and contributors are clearly differentiated; querying by DC, however, conflates creator and author roles which may (will) be different if resources are uploaded by someone other than the author.

  • Searching dc.creator will search for the creator and author roles
  • Searching dc.contributor will search for the content provider role

In addition:

  • Searching by dc.date only searches data that relates to the intraLibrary submission process (i.e. the deposit date, and perhaps modification dates if you added an author later on for example)
  • The only way to search journal dates is to use the default free text search that searches everything (or most fields anyway).
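The role conflation described above can be illustrated with a short Python sketch. The mapping follows the two role bullets (creator and author both end up in dc.creator, content provider in dc.contributor); the role spellings, names and the idea that the “creator” is the person who uploaded the item are assumptions for the example, not a description of how intraLibrary does it internally.

```python
# Role-to-field mapping as described in the bullets; role spellings are assumed.
LOM_ROLE_TO_DC = {
    "creator": "dc.creator",
    "author": "dc.creator",
    "content provider": "dc.contributor",
}


def flatten_to_dc(lom_contributions):
    """Collapse (role, name) pairs from a LOM record into Dublin Core fields."""
    dc = {"dc.creator": [], "dc.contributor": []}
    for role, name in lom_contributions:
        field = LOM_ROLE_TO_DC.get(role.lower())
        if field and name not in dc[field]:
            dc[field].append(name)
    return dc


# Hypothetical record deposited by someone other than the author.
record = [
    ("author", "A. Researcher"),
    ("creator", "R. Depositor"),
    ("content provider", "Example University"),
]
dc = flatten_to_dc(record)
print(dc["dc.creator"])      # ['A. Researcher', 'R. Depositor']
print(dc["dc.contributor"])  # ['Example University']
```

Once flattened, a dc.creator search cannot tell the author from the depositor, which is why querying the LOM directly is the cleaner fix.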

The solution, of course, is to make it possible to query the LOM by SRU and this is now Intrallect’s intention – indeed, to render all LOM fields query-able which would include user generated tags for example.

The next big question is exposure of open content to search engines and Charles gave us an overview of plans to develop an object “home page” with a static URL which should help in this area.  We also discussed sitemaps and what needs to be done external to intraLibrary.  I’m still unclear on how we can improve the format of results returned by Google from the SRU interface; to repeat, Google IS indexing http://repository.leedsmet.ac.uk/ with site: http://repository.leedsmet.ac.uk/ currently returning over 500 records.  However, this is fairly unstructured; Google is simply following links from http://repository.leedsmet.ac.uk/main/browse.php; any subsequent links Googlebot encounters are also indexed and returned as “The Repository search for [link name]” and ideally I’d like results to be returned in a more structured and user friendly form.  Many queries actually return no results where there is (yet) no content to find, though where there is content, Google is indexing all human readable metadata.  I’m also not certain whether Googlebot is finding its way into the full text via the Open URL/virtual file paths generated by intraLibrary.  Full text indexing within intraLibrary itself has also been promised.
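On the sitemap side, something like the following Python sketch could be run outside intraLibrary to list static record pages for crawlers. The namespace is the one defined by the sitemaps.org protocol; the page URLs and the function name `build_sitemap` are hypothetical, and how the list of record pages would actually be obtained is left open.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(page_urls):
    """Return a sitemap XML string with one <url><loc> entry per page."""
    # Serialise the sitemap namespace as the default (unprefixed) namespace.
    ET.register_namespace("", SITEMAP_NS)
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for page_url in page_urls:
        url = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        ET.SubElement(url, f"{{{SITEMAP_NS}}}loc").text = page_url
    return ET.tostring(urlset, encoding="unicode")


# Hypothetical static record pages.
sitemap = build_sitemap([
    "http://repository.example.org/record/1",
    "http://repository.example.org/record/2",
])
print(sitemap)
```

Pointing the sitemap at static record pages rather than search URLs is what would give Google one clean, stable entry per item.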

In short, I’m really not sure how all of these factors may combine to be exploited by a next generation SRU interface!

We then touched upon self-archiving and (semi) mediated workflows; potentially developing SWORD based quick deposit from desktop/web, ideally with automatic metadata generation.

The two other major issues we considered are:

  • Policy metadata – handling embargoes

This is pretty crucial to an OA archive of research as many publishers of academic journals specify an embargo period of 12 or 18 months from the date of publication before a paper can be made available in a repository.  We need to be able to add a paper to intraLibrary upon receipt but restrict access until the embargo has expired and for this to happen automatically.  On one level, this functionality should be fairly straightforward to achieve by having intraLibrary check today’s date against an embargo date specified in the metadata; it’s a little more complicated than that though as we would want the metadata to be visible before the embargo date, just not the full text.

  • Cover pages for PDF

It was suggested that a coversheet should be generated by intraLibrary on the fly which would certainly be useful as manually creating cover sheets for each and every article is time consuming to say the least; this would be useful functionality for CLA materials which also require a coversheet.
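Returning to the embargo check, here is a minimal Python sketch of the logic as described: compare today’s date with an embargo date held in the metadata, always show the metadata, and only expose the full text once the embargo has passed. The field names (`embargo_until`, `full_text_url`, `metadata`) and the sample record are assumptions for illustration, not intraLibrary’s data model.

```python
from datetime import date


def visible_fields(record, today=None):
    """Return what a public user may see: metadata always, full text after embargo."""
    today = today or date.today()
    embargo_until = record.get("embargo_until")  # None means no embargo
    full_text_open = embargo_until is None or today >= embargo_until
    return {
        "metadata": record["metadata"],
        "full_text_url": record["full_text_url"] if full_text_open else None,
    }


# Hypothetical paper with an embargo ending on 1 January 2010.
paper = {
    "metadata": {"title": "An example paper"},
    "full_text_url": "http://repository.example.org/files/example.pdf",
    "embargo_until": date(2010, 1, 1),
}

print(visible_fields(paper, today=date(2009, 7, 1))["full_text_url"])  # None
print(visible_fields(paper, today=date(2010, 1, 1))["full_text_url"])
# http://repository.example.org/files/example.pdf
```

Because the check runs at request time, nothing has to be done on the day the embargo expires; the full text link simply starts appearing.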

These developments will take some time to implement and the next stage is to prioritise them by anonymous e-postal ballot.  Intrallect hope we will start to see some of the major initiatives in a build towards the end of the year.

Thank you to our colleagues at Keele for making us welcome and for feeding us!
