July 16, 2008
intraLibrary was designed as a learning object repository, and it is only now becoming clear just what is involved in making the platform also function as an Open Access repository of research.
Access to learning objects is generally federated. For example, in order to access resources in JORUM it is necessary to authenticate via Athens (soon to be Shibboleth) or via a UK Access Management Federation log-in mechanism and, so far as I know, it is not possible to search the repository externally via a search engine. As the very point of an Open Access repository is to make research discoverable and accessible on the public internet, this is obviously not desirable! It is, I think, relatively straightforward to expose metadata to search engines via the OAI-PMH, but the majority of search engines no longer support the protocol. What we really need is for the full text to be crawled by Googlebot and other search engine spiders which, I suspect, will not be able to get past the authentication gateway (need more info on this). Moreover, if an external user does come to the repository via Google, it will not be possible for them to search content without first authenticating into the system – not very open. Even though about 80% of traffic comes to a repository via search engines (assuming they can index content in the first place), we obviously want an accessible search interface as well.
The potential solution to these problems that I am currently investigating is to use a separate, web-based SRU interface which sits outside the repository and is accessible on the public internet.
As part of the CD-LOR project Intrallect have already developed a basic SRU interface which, in turn, has been substantially improved by a third party – IRISS interface here – who have made the code available under an open source licence. The IRISS interface is still fairly basic and does not incorporate all of the functionality that we require – it is essentially a search box only and would not, for example, facilitate browsing the research collection by faculty. It should be reasonably straightforward to customise the interface accordingly; we essentially need a series of hyperlinks that map onto the internal repository structure, each of which runs the appropriate query. I also need to clarify whether such an approach will enable Googlebot and other search engine spiders to crawl the full text, thus making the content searchable on the open web.
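To make the "series of hyperlinks" idea a little more concrete, here is a minimal sketch in Python of how browse links might be turned into SRU searchRetrieve requests. The request parameters (operation, version, query, startRecord, maximumRecords) come from the SRU standard; everything else is an assumption for illustration: the base URL, the faculty names and the `collection` index in the CQL query are all hypothetical and would need to be replaced with whatever the intraLibrary SRU endpoint actually exposes.

```python
import html
from urllib.parse import urlencode

# Hypothetical endpoint and faculty names, for illustration only.
SRU_BASE = "https://repository.example.ac.uk/sru"
FACULTIES = ["Arts and Society", "Health", "Business and Law"]


def sru_search_url(cql_query, start=1, maximum=10):
    """Build an SRU searchRetrieve URL for a CQL query."""
    params = {
        "operation": "searchRetrieve",
        "version": "1.1",
        "query": cql_query,
        "startRecord": start,
        "maximumRecords": maximum,
    }
    return SRU_BASE + "?" + urlencode(params)


def faculty_links(faculties):
    """Return one HTML hyperlink per faculty, each running a fixed query."""
    links = []
    for name in faculties:
        # 'collection' is an assumed index name, not a documented one.
        query = 'collection = "{}"'.format(name)
        url = sru_search_url(query)
        links.append('<a href="{}">{}</a>'.format(
            html.escape(url, quote=True), html.escape(name)))
    return links


if __name__ == "__main__":
    for link in faculty_links(FACULTIES):
        print(link)
```

Because each link is a plain GET request with the query in the URL, a page of such links is something a crawler could in principle follow, although whether spiders actually reach the full text this way still needs to be confirmed.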
For each object, intraLibrary generates a public URL which can be linked to directly – on the open web and with no need for authentication. However, a further issue is that, due to the way that intraLibrary works, a query result (either from a search engine or the SRU interface) will link directly to the resource itself – i.e. a PDF of a research article will open immediately in the browser window. When facilitating Open Access to research this is undesirable for several reasons, and we require some sort of “landing screen” that can provide context and basic information (abstract, copyright info, whether the paper has been refereed); indeed, there will often be a legal requirement to provide copyright information, with many publishers also stipulating that there must be a link to the published version of the paper. Precisely how we will resolve this issue is yet to be determined; it might be possible to embed a link to the PDF into some sort of HTML template and have this template returned at the public URL?
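As a rough illustration of the HTML template idea, the sketch below renders a simple landing page from a record and links through to the PDF and the published version. It is not how intraLibrary works: the record structure, field names and URLs are all hypothetical, and the real metadata would have to come from the repository itself.

```python
import html

# Hypothetical record; the field names are illustrative, not intraLibrary's schema.
EXAMPLE_RECORD = {
    "title": "An Example Research Paper",
    "abstract": "A short abstract describing the paper.",
    "copyright": "Copyright the publisher; deposited with permission.",
    "refereed": True,
    "pdf_url": "https://repository.example.ac.uk/open/paper-123.pdf",
    "published_url": "https://publisher.example.com/article/123",
}

TEMPLATE = (
    "<!DOCTYPE html>\n"
    "<html><head><title>{title}</title></head><body>\n"
    "<h1>{title}</h1>\n"
    "<p>{abstract}</p>\n"
    "<p>Status: {status}</p>\n"
    "<p>{copyright}</p>\n"
    '<p><a href="{pdf_url}">Full text (PDF)</a> | '
    '<a href="{published_url}">Published version</a></p>\n'
    "</body></html>\n"
)


def landing_page(record):
    """Render a landing page with context, copyright info and links."""
    # Escape every value so metadata cannot break the markup.
    escaped = {key: html.escape(str(value), quote=True)
               for key, value in record.items()}
    status = "Refereed" if record["refereed"] else "Not refereed"
    return TEMPLATE.format(status=status, **escaped)


if __name__ == "__main__":
    print(landing_page(EXAMPLE_RECORD))
```

If a page like this were returned at the public URL, search engines and the SRU interface would lead users to the context first, with the PDF one click away.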
Watch this space…
By working closely with Intrallect and with a little ingenuity I am confident that these issues will be resolved and that we have, in intraLibrary, an excellent solution to our diverse needs.