Really (not so) Simple Syndication
January 7, 2010
In theory RSS is simple (Really Simple, in fact), but the way the technology is implemented by our underlying repository software (intraLibrary), together with how we have needed to integrate that software within our repository infrastructure to ensure appropriate Open Access, has meant that in reality it has been anything but.
Broadly speaking, the issues are twofold:
- The fields exposed by the intraLibrary RSS feed are limited to “Title” and “Description”.
- The URL exposed by the feed points to the public URL generated by intraLibrary (which is simply the resource itself, either a file or a URL, i.e. without the context of the metadata record), whereas I need it to point to the Open Search metadata page.
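To illustrate the two issues together (the element contents here are made up, not an actual intraLibrary record), an item in the raw feed looks roughly like this: just a title and description, with a link to the machine-generated publicURL rather than the metadata page.

```xml
<item>
  <title>Example resource</title>
  <description>An example item</description>
  <!-- points at the raw file/URL via a machine-generated publicURL,
       not at the Open Search metadata record -->
  <link>http://example.org/publicURL?key=abc123</link>
</item>
```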
I have been aware of these issues for some time, but finding a resolution has risen in priority recently due to two separate, though similar, use-cases being explored by JorumOpen and the Xpert project at Nottingham University, which effectively seek to extend RSS from simply being a notification system to potentially also being used to harvest repository content. (Some are of the view that this is not an appropriate use of the technology and that there are established technologies more suitable, specifically OAI-PMH and SWORD – see Intrallect’s Charles Duncan’s contribution to the discussion on Lorna Campbell’s blog.) The full discussion on Lorna’s post is certainly worth reading and also includes several contributions from Xpert’s Julian Tenney and Pat Lockley.
I have since had an extended correspondence with Pat, initially prompted when I submitted the generic intraLibrary feed to be harvested by Xpert. Preliminary feedback was that when Xpert tried to harvest our feed, a randomly generated “key” appeared to be added to the URL, meaning they were seeing these URLs as new resources when they were actually duplicates. (This was a consequence of the manner in which intraLibrary generates a publicURL each time a record is returned, with a new machine-generated “key” each time.)
As I described in a recent post, a little bit of Twitter serendipity, and specifically input from Owen Stevens, subsequently led me to use Yahoo Pipes to redirect to the Open Search metadata page instead of the publicURL. Yahoo Pipes also allows a pipe to be rendered as RSS, and it occurred to me that this new feed could be resubmitted to Xpert for harvest. Sure enough, Pat confirmed that this new feed rendered from the pipe could indeed be harvested and that each URL was definitely unique (generated, of course, from resource unique IDs by http://repository.leedsmet.ac.uk/main/index.php – see http://repositorynews.wordpress.com/2009/11/09/leeds-met-repository-open-search-version-2-0/ for more info). However, Pat also emphasised that it would be nice to find a generic way of harvesting without the pipe, but for the time being I’m not sure we are in a position to implement such a solution; the work that we have done at Leeds Met on the Open Search interface, and my personal obsession with redirecting RSS feeds, is quite distinct from Intrallect’s primary commercial interest and so unlikely to be *officially* supported by the company. (Note: as always, Intrallect have been very supportive and have proactively supported the development of our infrastructure throughout; it’s just that this isn’t a priority for them in the same way.)
As Owen has pointed out, all the Pipe does is take the ID, which is part of the item’s GUID in the original RSS, and construct a link to the metadata page, which is put back into the RSS feed. You could obviously do this programmatically (which might be a solution to Pat’s requirement of harvesting without the Pipe?) – it was just easier to throw something together quickly using Pipes, and we simply don’t have the resources to explore another (programmatic) solution in detail.
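For what it’s worth, the Pipe’s logic could be sketched programmatically along these lines. This is only an illustration: the sample feed, the assumption that the ID is the trailing digits of the GUID, and the `?view=` query parameter on the metadata page are all my inventions, not intraLibrary’s actual formats.

```python
# Hedged sketch of the Pipe's logic: pull the numeric ID out of each
# item's <guid> and rewrite <link> to point at the Open Search metadata
# page instead of the machine-generated publicURL.
import re
import xml.etree.ElementTree as ET

METADATA_BASE = "http://repository.leedsmet.ac.uk/main/index.php"

# Hypothetical feed item, not real intraLibrary output
SAMPLE_RSS = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Repository feed</title>
    <item>
      <title>Example resource</title>
      <description>An example item</description>
      <guid isPermaLink="false">oai:intralibrary:12345</guid>
      <link>http://example.org/publicURL?key=abc123</link>
    </item>
  </channel>
</rss>"""

def rewrite_links(rss_text):
    root = ET.fromstring(rss_text)
    for item in root.iter("item"):
        guid = item.find("guid")
        link = item.find("link")
        if guid is None or link is None:
            continue
        # Assumption: the resource's unique ID is the trailing digits
        match = re.search(r"(\d+)$", guid.text)
        if match:
            # "?view=" is a made-up parameter name for illustration
            link.text = "%s?view=%s" % (METADATA_BASE, match.group(1))
    return ET.tostring(root, encoding="unicode")

new_rss = rewrite_links(SAMPLE_RSS)
```

Unlike the Pipe, a script like this could be run on a schedule server-side, which is roughly what a “generic way of harvesting without the pipe” would require.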
Owen has also suggested that, as intraLibrary supports OAI-PMH, this would be the ‘supported’ mechanism for harvesting which really brings us full circle back to Charles’ argument referred to earlier in this post.
Note: On Lorna’s blog, Julian argues back at Charles that “OAI-PMH/SWORD etc are big technical barriers for many people who have resources to expose and that anyone can make a feed”; though in a subsequent comment, Pat does acknowledge that “it might be logical that if we are making a second RSS for harvesting we might use some other technology instead.”
All of which doesn’t get me very much further with my other RSS issue: the limited metadata, with just <title> and <description> exposed by RSS. It was this that seemed to be more of an issue for harvest by JorumOpen, with feedback from Gareth Waller confirming that the feed from intraLibrary “does not represent the metadata for the individual items (or the top level feed) in DC format (except of course date)”, and that in order for our feed to be processed with the code Gareth has implemented in JorumOpen, the feed would “need to contain DC metadata for each of the items and, more importantly, licence information”. The feed from the Pipe, of course, still comprises just <title> and <description> and so would not be suitable for registering our OER content in JorumOpen. (N.B. The limited metadata will surely also affect the quality of metadata harvested and searchable via Xpert?)
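As a rough sketch of what a harvest-ready item might look like, here is how Dublin Core fields and a licence statement could be attached to a feed item via the dc: namespace. The field values, and the choice of dc:rights to carry the licence, are my assumptions rather than JorumOpen’s documented requirements.

```python
# Hedged sketch: building an RSS item that carries DC metadata and
# licence information alongside the usual <title> and <description>.
# All values below are placeholders, not real repository data.
import xml.etree.ElementTree as ET

# Register the Dublin Core namespace so it serialises with a dc: prefix
ET.register_namespace("dc", "http://purl.org/dc/elements/1.1/")
DC = "{http://purl.org/dc/elements/1.1/}"

item = ET.Element("item")
ET.SubElement(item, "title").text = "Example resource"
ET.SubElement(item, "description").text = "An example item"
# Dublin Core fields the bare intraLibrary feed is missing
ET.SubElement(item, DC + "creator").text = "A. N. Author"
ET.SubElement(item, DC + "date").text = "2010-01-07"
# Licence information, which Gareth flagged as essential for harvest
ET.SubElement(item, DC + "rights").text = (
    "http://creativecommons.org/licenses/by-nc-sa/2.0/uk/")

xml_out = ET.tostring(item, encoding="unicode")
```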
What would really make my life easier, I think, is for JorumOpen and Xpert to harvest my metadata using OAI-PMH (go on, you know you want to!), but for the time being I am just happy to have found a method of generating RSS feeds that point to my static URLs at http://repository.leedsmet.ac.uk/main/index.php!