What would you like to search for – research or OER?

Leeds Met Open Search – http://repository.leedsmet.ac.uk/main/index.php – now incorporates a “splash screen” that allows the user to choose which collection they wish to search with links that provide access to separate interfaces that are tailored to each type of material:

Leeds Met Open Search splash page

Each tabbed interface provides an appropriate Advanced search form as well as relevant browse options; by LCSH or faculty for research and by HEA Subject Centres or JACS code for OERs:

Once again, massive thanks to Mike for his rapid response to the myriad requests I make of him on a daily basis!

Leeds Met Repository Open Search Version 2.0

This is a bit of a trailer for our shiny new interface that I hope will go live in the next week or so and a run down of some of the new features.

It’s far from perfect and should still be seen as a beta – we very much need real users to start using it and I’m feeling a little nervous about how it will be received as I know how much work Mike, in particular, has put into it.

The interface has evolved from an SRU client developed by IRISS – http://www.iriss.org.uk/learnx – which is available under GNU General Public Licence v.3 at http://code.google.com/p/sruopensearch/ (N.B.  We still intend to release our modified code under a similar licence.)  Learning Exchange Open Search is a great front end for searching intraLibrary but, with just a simple search box, it lacked the advanced search functionality that was essential for us.  We also wanted to use intraLibrary to manage resources for teaching & learning as well as facilitating Open Access to our research collection in accordance with the EPrints model.

The tabbed interface incorporates an “Advanced search” form that allows users to cross reference multiple fields specifying AND/OR and they are also able to search for either “Research” or “Open Educational Resources” which uses authentication tokens to return results from the appropriate collections in intraLibrary:

advanced
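As a rough illustration of how an advanced search form like this might be turned into a single SRU request, here is a minimal Python sketch. It is not Mike's code: the base URL, the function names and the choice of Dublin Core index names are assumptions made for the example. Only the standard SRU `searchRetrieve` parameters and ordinary CQL `and`/`or` booleans are used.

```python
from urllib.parse import urlencode


def build_cql(clauses):
    """Join (index, term, operator) clauses into one CQL query string.

    The operator of the first clause is not used for joining; every later
    clause is joined to what precedes it with its own AND/OR operator.
    """
    parts = []
    for position, (index, term, operator) in enumerate(clauses):
        if operator.upper() not in ("AND", "OR"):
            raise ValueError(f"unsupported operator: {operator}")
        # Escape backslashes and double quotes inside the quoted CQL term.
        escaped = term.replace("\\", "\\\\").replace('"', '\\"')
        if position > 0:
            parts.append(operator.lower())
        parts.append(f'{index} = "{escaped}"')
    return " ".join(parts)


def build_sru_url(base_url, clauses, maximum_records=10):
    """Build an SRU searchRetrieve URL (base_url is a hypothetical endpoint)."""
    params = {
        "operation": "searchRetrieve",
        "version": "1.1",
        "query": build_cql(clauses),
        "maximumRecords": maximum_records,
    }
    return f"{base_url}?{urlencode(params)}"


if __name__ == "__main__":
    form_rows = [
        ("dc.title", "open access", "AND"),
        ("dc.subject", "repositories", "OR"),
    ]
    print(build_sru_url("http://repository.example.ac.uk/sru", form_rows))
```

Each row of the form becomes one clause, so adding another searchable field should only mean adding another row rather than changing the query logic.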

There are also big changes in the way that results are returned; Mike has been able to use a unique identifier to build individual pages for each record so that a search will return a set of results that indicates whether or not each individual record has the full text available:

repository

These titles then link through to a static HTML page comprising all of the metadata associated with that record including a published URL and, where the full text is available, a link to the PDF in intraLibrary:

static
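To give a feel for what building such a page involves, here is a minimal Python sketch that renders one record as static HTML. The record structure (a dict with `title`, `metadata`, `published_url` and an optional `full_text_url`) is an assumption invented for the example and is not the format returned by intraLibrary or used by our interface.

```python
from html import escape


def render_record_page(record):
    """Return a minimal static HTML page for one metadata record."""
    title = escape(record["title"])
    rows = "\n".join(
        f"  <dt>{escape(label)}</dt><dd>{escape(value)}</dd>"
        for label, value in record["metadata"].items()
    )
    links = [f'<a href="{escape(record["published_url"])}">Published version</a>']
    # Only link to the PDF when the full text is actually available.
    if record.get("full_text_url"):
        links.append(f'<a href="{escape(record["full_text_url"])}">Full text (PDF)</a>')
    link_items = "\n".join(f"  <li>{link}</li>" for link in links)
    return (
        "<!DOCTYPE html>\n"
        f"<html><head><title>{title}</title></head>\n"
        f"<body>\n<h1>{title}</h1>\n"
        f"<dl>\n{rows}\n</dl>\n"
        f"<ul>\n{link_items}\n</ul>\n"
        "</body></html>\n"
    )


if __name__ == "__main__":
    example = {
        "title": "Promoting Open Access to research",
        "metadata": {"Type": "Journal article", "Publisher": "Example Press"},
        "published_url": "http://publisher.example.org/article/1",
    }
    print(render_record_page(example))
```

Because the output is plain HTML with no query string, a page like this can be written to disk once per record and crawled like any other static file.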

This static page should be indexed more effectively than was the case before though there is one small fly left in the ointment in that the public URL generated by intraLibrary that is used to download the full text is dynamic which means it cannot be indexed by Google; I’m not sure if it will be possible for Intrallect to do anything about this though they are aware of the need for full text indexing and are looking into the problem.

Separate HTML pages for individual records

I’m returning here to an old theme that is still nagging away at the back of my mind and that I think still needs exploring further as the functionality of the SRU interface develops, both by Mike and me and by Intrallect in the context of their ongoing development of the research repository aspect of intraLibrary.

Can we generate individual HTML pages for records such that a search query could generate a list of hyperlinks that point to those individual pages rather than to the location URL stored in intraLibrary, which is currently the case?  This would more closely approximate the way that EPrints and DSpace work and potentially solve the Google problem by providing an easily indexable page of static HTML for search engine spiders to crawl.  Could these pages also have nice, short, human readable URLs instead of convoluted search strings / machine-generated public URLs from intraLibrary?  Again, more like EPrints/DSpace.  Currently the only way I can give a link to an item is:

http://repository.leedsmet.ac.uk/main/search.php?q=promoting+open+access+to+research&x=22&y=26&exacttext=1

(The SRU search string that will provide the metadata)

Or

http://repository-intralibrary.leedsmet.ac.uk/IntraLibrary?command=open-preview&learning_object_key=i05n27905t

(The machine generated public URL for the actual PDF)
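By way of contrast, a short, human-readable URL could in principle be derived from a record identifier and its title. The Python sketch below is only a guess at one way of doing it: the `/record/<id>/<slug>` pattern, the function names and the example base URL are all hypothetical, and nothing here reflects how intraLibrary itself generates URLs.

```python
import re
import unicodedata


def slugify(title, max_words=8):
    """Reduce a title to lowercase ASCII words joined by hyphens."""
    ascii_title = (
        unicodedata.normalize("NFKD", title)
        .encode("ascii", "ignore")
        .decode("ascii")
    )
    words = re.findall(r"[a-z0-9]+", ascii_title.lower())
    return "-".join(words[:max_words])


def record_url(base_url, record_id, title):
    """Build a stable, readable URL for a record page (hypothetical scheme)."""
    return f"{base_url.rstrip('/')}/record/{record_id}/{slugify(title)}"


if __name__ == "__main__":
    print(record_url("http://repository.example.ac.uk/", 42,
                     "Promoting Open Access to Research"))
```

Keeping the identifier in the path means the link would still resolve if a title were later corrected, with the slug there purely for readability.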

I’ve recently been adding RSS feeds to http://repos-dev.leedsmet.ac.uk/main/browse.php and another issue (aside from the fact that the wrong field is exposed by RSS) is that these also point to the location URL stored in intraLibrary – the PDF in the case of full text but the published URL in instances where there is a citation only.  It would be much better if these feeds could point at a Leeds Met repository metadata record.

I simply do not have the technical insight to know whether any of this is achievable at all and, if it is, how big a job it will be.

Spinning plates: A repository update

I feel at the moment that I’m trying to spin plates, rushing around like a guest act on the Paul Daniels magic show to impart sustained rotation – when I get to the last plate the first is beginning to wobble….and then someone throws me another, just to make it more exciting.

Oh and all the plates are different shapes and sizes (some of them I swear aren’t even symmetrical which gives them a lop-sided gyration that is really tricky to maintain…)

Currently research material collectively comprises one of the biggest and wonkiest sets of crockery:

  1. Peer reviewed journal articles
  2. Book items
  3. Conference proceedings
  4. Conference items (e.g. ppt presentations)
  5. Theses or dissertations
  6. Reports

All of these need to be displayed differently by the Open Search interface which means Mike is currently hacking away at the code to ensure they are picked up by Educational properties: Type of resource and then formatted appropriately.  In turn this is having a knock on effect for metadata entry and workflow and I’m still not sure how it will all tie together.

We are also relying on Mike to develop advanced search as part of the interface; work is progressing well and we are now able to cross reference (toggling AND/OR) Title; Subject; Publisher; Description (abstract); DOI; Type and Format  – the interface also has a free text box.  Mike is currently trying to implement probably the most important field – search by author – and has identified a potential problem due to limits on the way we can differentiate contributor roles in the metadata – currently we are only able to query Dublin Core rather than the full LOM record.  N.B. Recent testing has indicated that this may not actually be the problem we originally thought and it may be possible to query contributor role=author after all.  Then we’d also like to incorporate browse by author – though this is currently another plate yet to be balanced and spun….

Then there is the ongoing issue of differentiating content (i.e. research material vs. learning objects) and ensuring these are returned appropriately by an extended/alternative Open Search interface – this functionality is crucial to Unicycle of course to make OERs available.  We’re currently exploring using collection tokens which should allow us to submit a query incorporating an authenticationToken such that a given query only returns content from a particular collection.  Initial testing has gone well and we’re pretty confident we can implement this over the next week or so.
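A minimal Python sketch of the idea follows. It assumes the token is passed as an ordinary URL parameter called `authenticationToken` alongside the standard SRU parameters; the token values, the base URL and the function name are placeholders made up for the example.

```python
from urllib.parse import urlencode

# Hypothetical tokens; real values would be issued by intraLibrary.
COLLECTION_TOKENS = {
    "research": "RESEARCH-TOKEN-PLACEHOLDER",
    "oer": "OER-TOKEN-PLACEHOLDER",
}


def collection_search_url(base_url, collection, query):
    """Build an SRU URL whose results are limited to one collection."""
    try:
        token = COLLECTION_TOKENS[collection]
    except KeyError:
        raise ValueError(f"unknown collection: {collection}") from None
    params = {
        "operation": "searchRetrieve",
        "version": "1.1",
        "query": query,
        # Assumption: the collection token travels as a URL parameter.
        "authenticationToken": token,
    }
    return f"{base_url}?{urlencode(params)}"


if __name__ == "__main__":
    print(collection_search_url(
        "http://repository.example.ac.uk/sru", "oer", "podcasting"))
```

The same query string can then be sent twice with different tokens, once for research and once for OERs, without the search form itself needing to know which collection it is talking to.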

(NB.  I need to start thinking about license models for Unicycle and some flavour of Creative Commons will need uploading to intraLibrary, another one for the to do list…)

Ideally we could do with a functioning PowerLink for the VLE by September – as well as being crucial for the PC3 project, it represents functionality that will really get people engaged and allow us to demonstrate the benefits of storing and sharing teaching and learning materials in the repository rather than within the modular, inaccessible silos of Blackboard-Vista.  Also important for PC3 is implementing LDAP authentication (which really should be happening soon!) thereby giving teaching-staff – and the first PC3 cohort – access to intraLibrary – NB.  PC3 doesn’t require Open Access in the same way as Unicycle.

I’ve also been working with Rachel on developing a workflow for CLA material and ensuring that we can generate suitable reports for the CLA – during this process, we uncovered a bug in the metadata editor which slowed us down a bit but with help from Intrallect, we’ve managed to implement a work-around pending the bug being fixed in a future build and Rachel has started using intraLibrary to store and disseminate CLA material on a pilot basis.

The most recent plate JISC have thrown our way, of course, is the Bibliosight project which will almost certainly have an impact on the developing infrastructure beyond the specific deliverables of that project – our first meeting is on Monday 13th July for which there is a draft agenda on the project blog.

I just hope that we can keep all the plates spinning and don’t end up with a Greek wedding scenario!

Development of Research Repository Aspect of IntraLibrary

On Friday Mike and I visited colleagues at Keele University for a meeting with Charles Duncan from Intrallect to consider development priorities for intraLibrary to better serve our needs as a research repository.  Over 4 and a half hours we considered the basic issues that need addressing as well as looking forward to some more ambitious functionality and integration with the wider research infrastructure as we move towards the REF.

I was particularly interested to learn about how Keele are implementing Symplectic’s publications management system – http://www.symplectic.co.uk/ – which regularly trawls Web of Science and PubMed Central for information about Keele’s academic publications.  Symplectic have clearly been thinking about integration with IRs and there’s even a link to SHERPA/RoMEO.  The system was used at Imperial College London for the RAE 2008 process and includes link functionality with DSpace which is that institution’s IR platform – http://spiral.imperial.ac.uk/.  Intrallect are currently liaising with Symplectic about integration with intraLibrary – I’m not certain precisely what form this would take but in an ideal world it would be great if we could auto-populate as much metadata as possible (title/bibliographic info/abstract/author/copyright status according to RoMEO) and automatically nudge academics for full text where appropriate!

At Leeds Met we currently lack any form of research database, which is why I’ve been exploring what are essentially manual workflows to populate the repository with all research output.  I’m not sure how expensive Symplectic is, and it may be difficult to justify given this institution’s relatively small research output; the repository may well have to be the research database, which is the assumption I’ve been working on.  We will also want to explore the soon-to-be-released Web of Science API which may, in any case, enable us to emulate some of this functionality ourselves.

The first item on our agenda was somewhat more prosaic and focussed on our immediate functional requirements – SRU searching and metadata.  Mike has been working on incorporating advanced search into the SRU interface and has come up against a couple of issues when searching by author and date which are essentially artifacts of having to query DC rather than LOM.  In the LOM, creators and contributors are clearly differentiated; however, querying by DC conflates creator and author roles which may (will) be different if resources are uploaded by someone other than the author.

  • Searching dc.creator will search for the creator and author roles
  • Searching dc.contributor will search for the content provider role

In addition:

  • Searching by dc.date only searches data that relates to the intraLibrary submission process (i.e. the deposit date, and perhaps modification dates if you added an author later on for example)
  • The only way to search journal dates is to use the default free text search that searches everything (or most fields anyway).

The solution, of course, is to make it possible to query the LOM by SRU and this is now Intrallect’s intention – indeed, to render all LOM fields query-able which would include user generated tags for example.
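To make the creator/author conflation concrete, here is a small Python sketch. The role-to-index mapping is simply my reading of the bullet points above, and the names in the example record are invented; it is a toy model of the problem, not a description of how intraLibrary performs the crosswalk internally.

```python
# Assumed mapping, taken from the bullet points in this post:
# the LOM "creator" and "author" roles are both searched via dc.creator,
# and the "content provider" role via dc.contributor.
LOM_ROLE_TO_DC_INDEX = {
    "author": "dc.creator",
    "creator": "dc.creator",
    "content provider": "dc.contributor",
}


def dc_view(lom_contributions):
    """Collapse (role, name) pairs from a LOM record into DC indexes."""
    dc = {}
    for role, name in lom_contributions:
        index = LOM_ROLE_TO_DC_INDEX.get(role)
        if index is not None:
            dc.setdefault(index, []).append(name)
    return dc


if __name__ == "__main__":
    # Hypothetical record: written by one person, uploaded by another.
    record = [
        ("author", "J. Smith"),
        ("creator", "A. Librarian"),
        ("content provider", "Leeds Met"),
    ]
    print(dc_view(record))
```

In this toy model a dc.creator search for the person who uploaded the record matches just as well as a search for the person who wrote it, which is exactly the distinction that querying the LOM directly would preserve.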

The next big question is exposure of open content to search engines and Charles gave us an overview of plans to develop an object “home page” with a static URL which should help in this area.  We also discussed sitemaps and what needs to be done external to intraLibrary.  I’m still unclear on how we can improve the format of results returned by Google from the SRU interface; to repeat, Google IS indexing http://repository.leedsmet.ac.uk/ with site: http://repository.leedsmet.ac.uk/ currently returning over 500 records.  However this is fairly unstructured; Google is simply following links from http://repository.leedsmet.ac.uk/main/browse.php; any subsequent links Googlebot encounters are also indexed and returned as “The Repository search for [link name]” and ideally I’d like results to be returned in a more structured and user friendly form.   Many queries actually return no results where there is (yet) no content to find though where there is content, Google is indexing all human readable metadata.  I’m also not certain whether Googlebot is finding its way into the full text via the Open URL/virtual file paths generated by intraLibrary.  Full text indexing within intraLibrary itself has also been promised.
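On the sitemap side, the work that sits outside intraLibrary could be as small as writing one XML file listing the static record pages. The Python sketch below uses the standard sitemaps.org namespace and element names; the page URLs and dates are invented, and how the list of pages would actually be obtained is left open.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(pages):
    """Return sitemap XML for an iterable of (url, last_modified) pairs.

    last_modified is expected as an ISO date string such as "2009-07-01".
    """
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for loc, lastmod in pages:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = loc
        ET.SubElement(url, "lastmod").text = lastmod
    return ET.tostring(urlset, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical record pages.
    print(build_sitemap([
        ("http://repository.example.ac.uk/record/1", "2009-07-01"),
        ("http://repository.example.ac.uk/record/2", "2009-07-03"),
    ]))
```

A file like this would give a crawler a flat list of record pages to visit instead of leaving it to discover them by following browse links.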

In short, I’m really not sure how all of these factors may combine to be exploited by a next generation SRU interface!

We then touched upon self-archiving and (semi) mediated workflows; potentially developing SWORD based quick deposit from desktop/web, ideally with automatic metadata generation.

The two other major issues we considered are:

  • Policy metadata – handling embargoes

This is pretty crucial to an OA archive of research as many publishers of academic journals specify an embargo period of 12 or 18 months from the date of publication before a paper can be made available in a repository.  We need to be able to add a paper to intraLibrary upon receipt but restrict access until the embargo has expired and for this to happen automatically.  On one level, this functionality should be fairly straightforward to achieve by having intraLibrary check today’s date against an embargo date specified in the metadata; it’s a little more complicated than that though as we would want the metadata to be visible before the embargo date, just not the full text.
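The date check itself can be sketched in a few lines of Python. The field name `embargo_until` and the shape of the returned dict are assumptions for the example; the point is only that the metadata is always visible while the full text is switched on automatically once the embargo date has been reached.

```python
from datetime import date


def access_for(record, today=None):
    """Decide what may be shown for a record on a given day.

    record["embargo_until"] is a hypothetical field holding the date on
    which the full text may be released; it is absent when no embargo applies.
    """
    if today is None:
        today = date.today()
    embargo_until = record.get("embargo_until")
    full_text_open = embargo_until is None or today >= embargo_until
    return {"metadata_visible": True, "full_text_available": full_text_open}


if __name__ == "__main__":
    paper = {"title": "Example paper", "embargo_until": date(2010, 1, 1)}
    print(access_for(paper, today=date(2009, 7, 1)))
    print(access_for(paper, today=date(2010, 1, 1)))
```

Because the decision is made at request time rather than stored, nobody would have to remember to release a paper when its embargo runs out.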

  • Cover pages for PDF

It was suggested that a coversheet should be generated by intraLibrary on the fly which would certainly be useful as manually creating cover sheets for each and every article is time consuming to say the least; this would be useful functionality for CLA materials which also require a coversheet.

These developments will take some time to implement and the next stage is to prioritise – by anonymous e-postal ballot – Intrallect hope we will start to see some of the major initiatives in a build towards the end of the year.

Thank you to our colleagues at Keele for making us welcome and for feeding us!

Repositories for research and teaching/learning material: The debate continues at #rpmeet

reprog

Last week I attended the JISC Repository and Preservation end of programme meeting in Birmingham. I recall being very nervous at my first JISC event in November 2007 but feel much more at ease now and enjoyed the event immensely; the programme has certainly been successful in fostering a sense of community though it’s an unusual social experience to meet people face to face, often for the very first time, when you feel you already know them from reading their blog and following them on Twitter.

During one of the breakout sessions on the first day I made a bee-line for a discussion about repositories for learning and teaching materials – as opposed to OA research repositories. I use the word “opposed” advisedly as there is certainly some strong sentiment around the issue, particularly with respect to using a common software platform. As a representative of a project that is adapting a learning object repository to also serve as an effective Open Access research repository I’m finding it a little difficult to understand the vehemence of some of this opposition, though I would be the first to acknowledge a steep learning curve and recognise that we have required extensive development, not of intraLibrary itself perhaps, but of an appropriate web infrastructure surrounding it. And yes, we would certainly have been able to implement a functioning OA research repository more quickly using EPrints or DSpace; however, from the outset, it was vital that our repository had the capacity to fulfil its broader potential – in the words of Clifford Lynch “[A] mature and fully realised institutional repository will contain the intellectual works of faculty and students – both research and teaching materials – and also documentation of the activities of the institution itself in the form of records of events and performance and of the ongoing intellectual life of the institution.”  [Lynch, Clifford A. “Institutional Repositories: Essential Infrastructure for Scholarship in the Digital Age.” ARL Bimonthly Report 226 (2003).]

It’s also important to be pragmatic.  Historically, Leeds Metropolitan University is a polytechnic that gained chartered university status in 1992; its heritage is very much in teaching and learning rather than research with, arguably, a more vocational than academic flavour.  In recent years, the research profile has steadily increased, culminating in unprecedented success in the 2008 RAE and the university is naturally keen to capitalise on this success, enhance its research profile further whilst also continuing to emphasise its student focussed teaching and learning credentials. The implementation of an integrated repository to support both research outputs and learning objects reflects this dual focus.  Clifford Lynch’s article suggests that the concept of a central system to manage disparate resources in this way has been implicit within the sector for some years, however, the technology has tended to focus on Open Access to research, with the two most widely used software platforms being EPrints, developed at the University of Southampton in 2000, and DSpace, developed at MIT in 2002; early versions of both platforms were primarily designed to manage text based resources (though subsequent versions of EPrints and DSpace can manage a wide range of digital file formats.)  

NB.  In an extended discussion on this issue on JISC-REPOSITORIES (archive here), RepositoryMan Les Carr of EPrints refers to the fact that he still comes across the firmly held (and spurious) belief that because EPrints is used for Open Access it can’t be used for multimedia files or scientific data.

The session was chaired by Amber Thomas of JISC and I asked a somewhat blunt, perhaps naive, question about JISC’s perspectives on combined repositories of research and teaching materials.  Amber suggested that JISC have been deliberately neutral on the issue which is also perhaps emphasised by the diagrammatic representation of the programme structure reproduced above.

Some of the commentators last Wednesday were adamant that though it may well be possible to manage different types of resources with a single system it was far from desirable with one colleague making the pithy analogous observation that you can write letters in Excel but that doesn’t make it right.  Phil Barker of CETIS was also at the discussion and in a recent blog post on the “question of whether research outputs and learning materials should stored in the same repository” is “inclined to think the answer is no, the purpose of the repository is different, a learning material isn’t an output, sharing means something different for the two resource types.”  Phil goes on to say that ” If you think a repository is a database and a bit-store then you may come to a different conclusion, but I think a repository is a service offered to people and your choice of starting point in offering that service will affect how easy your journey is.”  (Full post here)

I’d certainly concede that our journey hasn’t been an easy one and I also agree that a repository is a service offered to people; with our repository start-up, and also Streamline and PERSoNA, that is certainly the approach we have tried to take.  With intraLibrary and the SRU interface we now have an incipient infrastructure to manage both research material and learning objects.  The discrete types of material can be managed entirely separately; however, there is also potential for the ongoing development of a holistic approach to the management of the full range of digital resources produced by a modern university.  As we develop our infrastructure further I hope we can utilise appropriate web-technology around a central management system (intraLibrary) to achieve decentralised resource discovery through appropriate interfaces, widgets and environments – the VLE for example.

JISc-meeting09-poster

Then of course there is the small matter of persuading academics to part with their resources, not to mention IPR, copyright and quality control issues…

Open Access to research is an evolving paradigm and represents a considerable shift in the established academic publishing process; Open Access to a broader range of educational resources still more so. Any paradigm shift is likely to take time to evolve and Open Access, to research and other materials, is no exception, especially given that academia, perhaps, tends to subscribe rather strongly to established tradition!

JISC’s current OER programme should go some way to addressing many of these issues but infrastructure is the foundation. The perfect system almost certainly doesn’t exist and it’s surely important to be pragmatic when implementing and developing an appropriate system. Here’s to ongoing discussion, debate and development.

Archive and Special Collections

A colleague has recently set up a blog for Archive and Special Collections at Leeds Met:

http://archivepost.wordpress.com/

One particularly interesting area, from my point of view, is work on developing machine-readable lists of the various collections/cataloguing and finding aids.   Keith and I have already discussed how we might use the repository and I intend to explore further just as soon as I have time!  Then perhaps we can also think about digitising material and making it available to view on line.

intraLibrary is just for Learning Objects, isn’t it?

The issues around adapting intraLibrary to adequately function as an Open Access repository of research are agonisingly documented on this blog (see https://repositorynews.wordpress.com/category/adapting-intralibrary/); there was an interesting and necessary discussion on the JISC-REPOSITORIES mailing list yesterday (though it’s still rumbling on) about the differences, if any, between ‘general repositories’ like EPrints and DSpace and specialised Learning Object repositories and the suitability of various platforms to fulfil a variety of institutional needs (Open Access to research material; Reusable Learning Objects/Content Packaging; other multimedia and complex digital objects).

The hardest line was that it would be highly impractical to use the likes of EPrints and DSpace to “store, catalogue and serve e-learning resources” or, conversely, to use a specialised LO repository like intraLibrary for research.  My own view is that there is scope for complementary technology and that LO repositories can benefit from the culture of openness and sharing exemplified by OA archives of research as the zeitgeist shifts towards Open Access to a wider range of educational resources.

I would be the first to recognise that intraLibrary isn’t ideally suited to be used as an Open Access repository; however, with some “customisation” it can do the job perfectly well.  I expect the same is also true, from the other direction perhaps, of DSpace and EPrints – Soton, in fact, is currently developing EdSpace based on its famous open source software.

Institutions increasingly expect their repositories to manage a wide range of digital material; at a recent RSP focus group it was clear that repository administrators running a range of platforms are increasingly being expected to manage everything and the kitchen sink.  Moreover, institutions, especially smaller ones, simply don’t have the resources to implement the ideal software solution(s) that will satisfy multiple stakeholders.

(Disclaimer:  Some of these perspectives are paraphrased from my colleagues on JISC-REPOSITORIES)

For Les Carr’s perspective on the discussion* see his blog post on repositoryman.

* or argument!