April 18, 2011
This might be messy – just need to brain dump to try and figure this out. Thanks again to the Godfather of Mashed Libraries @ostephens for his help!
intraLibrary RSS feeds point to the resource “in the wild” rather than the record on Open Search. Like this:
Owen helped me put together a simple pipe that took this feed and used a regex to extract an identifier from the record and redirect to the Open Search URL (which is built from this identifier). Like this:
So far so good…however, when I originally defined my Application Profile for research in intraLibrary I used multiple instances of <lom:description>, with the first holding the ISSN (frustratingly missing from intraLibrary’s Bib extensions) and the Abstract held in a second instance of <lom:description>, meaning it isn’t exposed in an RSS feed.
So…I thought that if I used an SRU query instead as a pipe input there would be a lot more data to play with in Pipes and hopefully I would be able to build a better RSS feed – including author and abstract.
After some initial problems with Pipes taking an SRU input, Owen responded to my plea for help with this pipe that extracts title and abstract from the SRU by defining the path through the XML to the relevant fields and mapping them to title and description:
However, I still need to figure out how to link the title to the respective record on Open Search. There is a link in the XML but this is no good as, once again, it points to the resource in the wild rather than the record on Open Search…somehow I need to use the identifier to build a link to the respective record on Open Search.
And frankly now I’m a bit stumped again – the regex function from the first pipe presumably needs to be in there somewhere…first vague attempt (doesn’t actually return any output – but this is a brain dump!):
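Outside Pipes, the same idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the sample SRU response below is made up, the location of the identifier (`lom:identifier/lom:entry`) is a guess at where it might sit in the record, and the function name `sru_to_items` is hypothetical. The one thing taken from the setup described above is that the abstract lives in the second <lom:description>.

```python
import xml.etree.ElementTree as ET

# Namespace URIs for SRW and IEEE LOM; the prefixes are just local shorthand.
NS = {
    "srw": "http://www.loc.gov/zing/srw/",
    "lom": "http://ltsc.ieee.org/xsd/LOM",
}

# Open Search record URL; the SearchGroup value is an assumption for research records.
RECORD_URL = (
    "http://repository.leedsmet.ac.uk/main/view_record.php"
    "?identifier={id}&SearchGroup=research"
)

# Hypothetical SRU response: the real intraLibrary structure may well differ.
SAMPLE_SRU = """
<srw:searchRetrieveResponse xmlns:srw="http://www.loc.gov/zing/srw/"
                            xmlns:lom="http://ltsc.ieee.org/xsd/LOM">
  <srw:records>
    <srw:record>
      <srw:recordData>
        <lom:lom>
          <lom:general>
            <lom:identifier><lom:entry>1234</lom:entry></lom:identifier>
            <lom:title><lom:string>Example paper</lom:string></lom:title>
            <lom:description><lom:string>ISSN 0000-0000</lom:string></lom:description>
            <lom:description><lom:string>Example abstract.</lom:string></lom:description>
          </lom:general>
        </lom:lom>
      </srw:recordData>
    </srw:record>
  </srw:records>
</srw:searchRetrieveResponse>
"""


def sru_to_items(sru_xml):
    """Return one dict per SRU record with title, description and link."""
    root = ET.fromstring(sru_xml.strip())
    items = []
    for general in root.iterfind(".//srw:recordData/lom:lom/lom:general", NS):
        identifier = general.findtext(
            "lom:identifier/lom:entry", default="", namespaces=NS
        )
        title = general.findtext("lom:title/lom:string", default="", namespaces=NS)
        descriptions = [
            d.findtext("lom:string", default="", namespaces=NS)
            for d in general.findall("lom:description", NS)
        ]
        # The abstract is held in the second description; the first holds the ISSN.
        abstract = descriptions[1] if len(descriptions) > 1 else ""
        items.append(
            {
                "title": title,
                "description": abstract,
                "link": RECORD_URL.format(id=identifier.strip()),
            }
        )
    return items


for item in sru_to_items(SAMPLE_SRU):
    print(item["title"], "->", item["link"])
```

Each dict could then be written out as an RSS <item>, which would give a feed carrying title, abstract and a link to the Open Search record.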
In theory RSS is simple, Really Simple, but the way that the technology is implemented by our underlying repository software (intraLibrary) and issues around how we have needed to integrate that software within our repository infrastructure in order to ensure appropriate Open Access has meant that, in reality, it has been anything but.
Broadly speaking, the issues are twofold:
- The fields exposed by the intraLibrary RSS feed are limited to “Title” and “Description”.
- The URL exposed by the feed points to the public URL generated by intraLibrary (which would simply be the resource itself, either a file or URL i.e. without the context of the metadata record) whereas I need it to point to the Open Search metadata page.
I have been aware of these issues for some time but finding a resolution has been elevated in priority recently due to two separate, though similar, use-cases being explored by JorumOpen and the Xpert project at Nottingham University, which effectively seek to extend RSS from simply being a notification system to potentially also being used to harvest repository content (some are of the view that this is not an appropriate use of the technology and that there are established technologies more suitable, specifically OAI-PMH and SWORD – see Intrallect’s Charles Duncan’s contribution to the discussion on Lorna Campbell’s blog). The full discussion on Lorna’s post is certainly worth reading and also includes several contributions from Xpert’s Julian Tenney and Pat Lockley.
By now I have had an extended correspondence with Pat, initially prompted when I submitted the generic intraLibrary feed to be harvested by Xpert. Preliminary feedback was that when Xpert tried to harvest our feed, it appeared that a randomly generated “key” was added to each URL, meaning they were seeing these URLs as new resources, whereas they were actually duplicates (this was a consequence of the way intraLibrary generates a publicURL with a new machine-generated “key” each time a record is returned).
As I described in a recent post, a little bit of Twitter serendipity, and specifically input from Owen Stephens, subsequently led me to use Yahoo Pipes to redirect to the Open Search metadata page instead of the publicURL; Yahoo Pipes also allows a pipe to be rendered as RSS and it occurred to me that this new feed could be resubmitted to Xpert for harvest. Sure enough Pat confirmed that this new feed rendered from the pipe could indeed be harvested and that each URL was definitely unique (generated, of course, from resource unique IDs by http://repository.leedsmet.ac.uk/main/index.php – see https://repositorynews.wordpress.com/2009/11/09/leeds-met-repository-open-search-version-2-0/ for more info). However, Pat also emphasised that it would be nice to find a generic way of harvesting without the pipe but for the time being I’m not sure we are in a position to implement such a solution; the work that we have done at Leeds Met on the Open Search interface and my personal obsession with redirecting RSS feeds is quite distinct from Intrallect’s primary commercial interest and so unlikely to be *officially* supported by the company (Note: As always Intrallect have been very supportive to me and certainly have proactively supported the development of our infrastructure throughout, it’s just that this isn’t a priority for them in the same way.)
As Owen has pointed out, all the Pipe does is take the ID which is part of the item’s GUID in the original RSS, and construct a link to the metadata page which is put back into the RSS feed. You could obviously do this programmatically (which might be a solution to Pat’s requirement of harvesting without the Pipe?) – it was just easier to throw together something quickly using Pipes and we simply don’t have the resources to explore another (programmatic) solution in detail.
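For what it is worth, a programmatic version need not be large. The following is a minimal sketch in Python, not something we have deployed: the sample feed is invented (including the `key=abc123` public URL), and the function name `rewrite_links` is hypothetical. It assumes a plain RSS 2.0 feed whose <guid> contains the `oai:com.intralibrary.leedsmet:` identifier.

```python
import re
import xml.etree.ElementTree as ET

# The ID is whatever follows the OAI-style prefix in the item's GUID.
GUID_PATTERN = re.compile(r"oai:com\.intralibrary\.leedsmet:(.*)")

RECORD_URL = (
    "http://repository.leedsmet.ac.uk/main/view_record.php"
    "?identifier={id}&SearchGroup={group}"
)

# Hypothetical one-item feed standing in for the intraLibrary RSS output.
SAMPLE_FEED = """<rss version="2.0">
  <channel>
    <title>Sample feed</title>
    <item>
      <title>Example resource</title>
      <link>http://example.org/public/resource?key=abc123</link>
      <guid>oai:com.intralibrary.leedsmet:1673</guid>
    </item>
  </channel>
</rss>"""


def rewrite_links(feed_xml, group="Open+Educational+Resources"):
    """Point each item's <link> at the Open Search record built from its GUID."""
    root = ET.fromstring(feed_xml)
    for item in root.iter("item"):
        guid = item.findtext("guid", default="")
        match = GUID_PATTERN.search(guid)
        link = item.find("link")
        if match is None or link is None:
            continue  # leave items we cannot interpret untouched
        link.text = RECORD_URL.format(id=match.group(1).strip(), group=group)
    return ET.tostring(root, encoding="unicode")


print(rewrite_links(SAMPLE_FEED))
```

Unlike the Pipe, this sketch discards the original public URL rather than keeping it under another name; keeping it would just mean adding one more element to each item before overwriting the link.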
Owen has also suggested that, as intraLibrary supports OAI-PMH, this would be the ‘supported’ mechanism for harvesting which really brings us full circle back to Charles’ argument referred to earlier in this post.
Note: On Lorna’s blog, Julian argues back at Charles that “OAI-PMH/SWORD etc are big technical barriers for many people who have resources to expose and that anyone can make a feed”; though in a subsequent comment, Pat does acknowledge that “it might be logical that if we are making a second RSS for harvesting we might use some other technology instead.”
All of which doesn’t get me very much further with my other RSS issue, which is the limited metadata, with just <title> and <description> exposed by RSS. It was this that seemed to be more of an issue for harvest by JorumOpen, with feedback from Gareth Waller confirming that the feed from intraLibrary “does not represent the metadata for the individual items (or the top level feed) in DC format (except of course date)” and that, in order for our feed to be processed with the code Gareth has implemented in JorumOpen, the feed would “need to contain DC metadata for each of the items and, more importantly, licence information”. The feed from the Pipe, of course, still comprises just <title> and <description> and so would not be suitable for registering our OER content in JorumOpen. (N.B. The limited metadata, of course, will surely also affect the quality of metadata harvested and searchable via Xpert?)
What would really make my life easier, I think, is for JorumOpen and Xpert to harvest my metadata using OAI-PMH (go on, you know you want to!) but for the time being I am just happy to have found a method to generate RSS feeds that point to my static URLs at http://repository.leedsmet.ac.uk/main/index.php!
Using Yahoo Pipes to redirect to Open Search metadata page instead of intraLibrary public URL (and the power of Twitter)
First of all, a big thank you to Owen Stephens – @ostephens – who responded to my musing tweets on RSS by assembling a Pipe that “Rewrites Intralibrary RSS feed to use ‘link’ to metadata rather than object”; a great example of the power of Twitter for anyone who still thinks it’s an exercise in pointless self-revelation, full of trivial noise. As Amber Thomas – @ambrouk – put it recently, and as I also tend towards, Owen exemplifies “whole person Tweeting”: not restricting our interaction on Twitter to our professional sphere but filling it with more personal and sociable “noise” – the closest thing to a virtual office you will find. I’ve never met Owen in real life but I shall certainly buy him a pint if our paths ever do cross!
As anyone who has passed by these parts before will know, we have been wrestling with intraLibrary for about two years now to develop a blended repository of Leeds Met’s research output (both Open Access full text and citation only) and Teaching & Learning material (both Open Educational Resources/material for federated access only) and we have spent a lot of time developing the IRISS SRU interface as a front end to provide appropriately differentiated Open Access to the different types of resources.
One of the simplest ways for a repository to alert users to new content is via RSS and it is very easy to generate a feed for pretty much any criteria in intraLibrary; I have generated several feeds for both research collections (by faculty) and for OER. There are, however, two main issues with these feeds:
- The first problem is associated with the metadata template I have implemented for research and the lack of flexibility to customise which fields are exposed via RSS – I haven’t yet got a solution to this issue.
- The second problem, however, arises because the URL exposed by the feed points to the public URL generated by intraLibrary whereas I need it to point to the Open Search metadata page, and this is where Yahoo Pipes can come in.
I don’t have much experience with Pipes but it is billed by Yahoo as “a free online service that lets you remix popular feed types and create data mashups using a visual editor. You can use Pipes to run your own web projects, or publish and share your own web services without ever having to write a line of code.”
Owen’s pipe has three components:
- “Fetch feed” which is simply the intraLibrary generated RSS feed
- “Regex” which applies a regular expression to an item attribute. In this case it takes the components of the public URL in item.guid.content - oai:com\.intralibrary\.leedsmet:(.*) – and replaces it with the components to build the URL of the Open Search metadata page – http://repository.leedsmet.ac.uk/main/view_record.php?identifier=$1&SearchGroup=Open+Educational+Resources
- “Rename” which does what it says on the tin and simply renames or copies item attributes – in this case item.link becomes objectlink and item.guid.content becomes link
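The “Regex” step is easy to try out on its own. Here is a small sketch in Python using the same pattern and replacement as the Pipe; note that Pipes writes the captured group as `$1` whereas Python’s `re` module uses `\1`. The function name `guid_to_record_url` is just for illustration.

```python
import re

# Pattern copied from the "Regex" module: capture everything after the prefix.
GUID_PATTERN = re.compile(r"oai:com\.intralibrary\.leedsmet:(.*)")

# Replacement from the same module, with $1 written as \1 for Python.
RECORD_URL = (
    "http://repository.leedsmet.ac.uk/main/view_record.php"
    r"?identifier=\1&SearchGroup=Open+Educational+Resources"
)


def guid_to_record_url(guid):
    """Turn an intraLibrary GUID into the Open Search metadata page URL."""
    return GUID_PATTERN.sub(RECORD_URL, guid)


print(guid_to_record_url("oai:com.intralibrary.leedsmet:1673"))
```

A GUID that does not match the pattern is returned unchanged, which at least makes odd records easy to spot in the output.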
The Pipe Output can be subscribed to as RSS which gives us a feed that does indeed link to the Open Search metadata page rather than the intraLibrary public URL:
Simple when you know how!
N.B. Some of these links actually DON’T work and I haven’t yet been able to figure out quite why. As far as I can tell the affected resources were all uploaded as part of the Reproduce project and the links are to non-existent unique identifiers e.g. http://repository.leedsmet.ac.uk/main/view_record.php?identifier=1432&SearchGroup=Open+Educational+Resources . I think it may be because some of these records seem to have been ascribed 2 unique identifiers – this is an automatic process in intraLibrary and configured as uneditable so I’m not sure how it has happened. However, they were originally uploaded by another user before the user profiles and metadata template for ukoer were fully configured and I may need to delete and re-upload as I have done already with http://repository.leedsmet.ac.uk/main/view_record.php?identifier=1673&SearchGroup=Open+Educational+Resources – I’m not sure how/when this will be updated in the feed and currently it’s still linking to the non-existent UID; it may take a little time to update.
Anyway, I’ve copied @ostephens Pipe – hope you don’t mind Owen, I couldn’t find any rights information(!) – and replaced the feed with one of my research feeds – Carnegie Faculty of Sport and Education – and modified the “Regex” module to redirect appropriately by replacing oai:com\.intralibrary\.leedsmet:(.*) with http://repository.leedsmet.ac.uk/main/view_record.php?identifier=$1&&SearchGroup=research.
N.B. As mentioned, there is an additional problem associated with the research metadata template and the lack of flexibility to customise which fields are exposed via RSS. The only way I have been able to accommodate all of the information required for research (using EPrints software as a template) with the intraLibrary metadata schema (UK LOM) is by using multiple description fields as extensions. Ideally I would like the abstract to be exposed via RSS but this is in the second description field, whereas it is the bottommost description field that is exposed via RSS, which generally contains whether the resource is refereed or not. This is not terribly relevant so I’ve added a Mapping to the “Rename” module to remove it, which means the feed now exposes just the title, which does indeed link to the Open Search metadata page:
(Disclaimer: I am not a qualified librarian or cataloguer.)
Thus far I’ve only really considered, in detail, the minutiae of research specific metadata (a consequence of the ongoing project to repurpose an LO repository as an OA archive of research). However, this morning I uploaded my first learning object to intraLibrary that has been specifically designated as an OER for the Unicycle project. The resource itself isn’t terribly exciting in that it is just a Word .doc but it makes sense to start with a relatively simple object rather than a more exotic file type or package.
The metadata template I’m using is taken directly from JORUM without any modification and is consequently (currently at least) based on IEEE LOM though it’s worth restating that we’ve recently learned that JORUM will be using DSpace to serve OER rather than intraLibrary so that service’s template is almost certain to change considerably in the very near future.
Payment Required (yes/no)
Subject to Copyright (yes/no)
Statement of Copyright and Restriction
IMS LRM Metadata Identifier
Role of Metadata Contributor
Date of Metadata Contribution
Language of Metadata Record
Location of Resource
Language of Intended User
Type of Resource
Intended for Use In
Level of Difficulty
Description of Contribution Date
IMS LRM Metadata Identifier
Description of Metadata Contribution Date
Format of Metadata Record
Size Resource in Bytes
Type of Delivery System
Name of Delivery System
Minimum Version of Delivery System
Maximum Version of Delivery System
How to Install This Resource
Special Requirements for Use
Duration of Media Resource: Playing Time
Type of Interactivity
Educational Purpose Description
Language of Intended User
Level of Interactivity
Intended for Use By
CLA Reporting Data
In addition, the resource is classified against JACS (Joint Academic Coding System) – again emulating the current (intraLibrary) JORUM configuration and can be ascribed one of the six Creative Commons attribution licences.
So off I went.
Title is straightforward, and Description – though this field does require some thought to make it useful – but what about Keyword(s)? My first thought is that we need some sort of controlled vocabulary – for research material I am using Library of Congress Authorities which is well established practice for this type of material but perhaps less appropriate for learning objects; one could rely on author produced keywords but these are perhaps not easily generated as formal metadata and will tend to lack consistency (social tagging is perhaps a slightly different issue and intraLibrary does have a facility for ascribing user-generated tags after the resource has been published.)
I posted my idle musings to Twitter and was immediately called to account by @philbarker and @LornaMCampbell who both asked what specific aspects of the learning object the controlled vocab would need to describe. @ambrouk acknowledged it was a big question and suggested two separate criteria: a) subject classification b) everything else author might tell you about / user says they want?
Input also came from @lynncorrigan and @KavuBob who pointed me to the HILT (High Level Thesaurus) project at Strathclyde – http://hilt.cdlr.strath.ac.uk/hilt4/demonstrators.html
I haven’t yet looked closely at this project but from the Twitter response it’s clearly an area that needs further research so this is a quick post to further the discussion…I’ll come back when I’ve explored the issue in more detail and all perspectives in the meantime gratefully received.
Some final unstructured thoughts at this stage:
- The need for a lightweight metadata template – which I guess the mandatory subset is though even this requires Keyword(s) AND classification against JACS…AND social tags can also be applied. To what extent the recommended/optional fields need to be completed depends to some extent on the type of resource though I suspect that some of these fields will actually need to be mandatory for certain resource types – Duration of Media Resource: Playing time for example
- How the metadata relates to resource discovery on the modern web – metadata is ultimately to aid resource discovery (and nail down copyright and licencing). Where will our OERs be primarily discovered from? A Leeds Met interface/a JORUM interface/Google/All of the above/somewhere else?
- Workflows – who uploads resources and applies metadata? The creator/repository administrator? Does there perhaps need to be a multiple stage workflow utilising something like the recently released JORUM quick deposit tool (ideally based on SWORD) whereby metadata is subsequently enriched by trained staff?
Twitter hash-tag: #oerstartup
(See http://cloudworks.ac.uk/node/1725 for aggregation)