OER project: Unicycle

As mentioned in a previous post, colleagues at Leeds Met have been successful in the recent JISC call for the Open Educational Resources programme. Simon Thomson, the project manager for Unicycle, has given me an overview of the project and how the repository will contribute to project deliverables. In essence we need to make 360 credits’ worth of content locally and publicly available – both via our own repository and JORUM Open. This will equate to approximately 3600 hours of material and Simon already has some ideas of where this will come from – CETL workshops, for example. Unicycle will explicitly repurpose / share existing material; it will not create new material.

Simon hopes to assemble an “editorial” team comprising an academic representative and a learning technologist from each of the 6 Leeds Met faculties; Simon and I will also be members of this team, which will convene every month to assess / quality check potential content. My job will be to ensure material is in a suitable format for ingest and appropriately tagged with metadata; to get stuff IN, ensure that it is discoverable and can be got back OUT! In the first instance I anticipate cataloguing resources against the JACS system and using the JORUM metadata template already in place; this would seem sensible given that the same resources will also be stored in JORUM Open, and it will certainly be desirable to liaise with that service throughout the project.

N.B.  Rather than dual deposit in this way, might JORUM explore harvesting open content from our repository / other repositories of Open Educational Resources?

Another crucial area, of course, is the licensing issue; both Simon and I anticipate using some flavour of Creative Commons and again this is an area that will benefit from liaison with JORUM – especially in view of their evolving 3 licence model.

On a more technical note I will also be very interested to see how JORUM will be facilitating open search functionality. Currently there is a series of RSS subject feeds at http://www.jorum.ac.uk/support/rssfeeds.html#subjectfeeds but these still need authentication to access the resources; presumably they will need to implement some sort of portal based on OAI-PMH or SRU. Might they also look at searching other repositories (like ours!) using OAI-PMH, for example?

Lack of incentive for sharing is recognised as a problem in the context of reusable learning objects, and another crucial element of the project will be to identify / implement reward and recognition policies. Cultural change with respect to OERs will, though, no doubt be a long term process both institutionally and within HE as a whole.

Google indexing and SEO

It is crucial that both the Open Access full text research content of the repository and metadata records of citation material are fully indexed by Google (and other search engines); in the future it is also likely to be required for other Open Educational Resources (learning objects). However, site:http://repository-intralibrary.leedsmet.ac.uk/ currently returns just 4 results (in addition to the Login page itself) and it is a bit of a mystery how these 4 are actually being picked up when the majority of records are not.

In intraLibrary, for a given collection, the administrator may choose to:

• Allow published content in this collection to be searched by external systems

This effectively means SRU (Search/Retrieve via URL), a standard search protocol utilizing CQL (Contextual Query Language, formerly Common Query Language).

• Allow published records in this collection to be harvested by external systems

This effectively means harvesting via OAI-PMH (the Open Archives Initiative Protocol for Metadata Harvesting).
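To make the two options concrete, here is a minimal sketch of the kind of request an external system would send in each case. The parameter names (operation, query, maximumRecords for SRU; verb, metadataPrefix for OAI-PMH) come from the respective standards, but the base URLs and the CQL index in the usage note are hypothetical, as I am not quoting the actual intraLibrary endpoints here.

```python
from urllib.parse import urlencode


def sru_search_url(base_url, cql_query, max_records=10):
    """Build an SRU searchRetrieve request for a CQL query."""
    params = {
        "operation": "searchRetrieve",
        "version": "1.1",
        "query": cql_query,            # the CQL expression, URL-encoded below
        "maximumRecords": max_records,
    }
    return base_url + "?" + urlencode(params)


def oai_list_records_url(base_url, metadata_prefix="oai_dc"):
    """Build an OAI-PMH ListRecords request (oai_dc = unqualified Dublin Core)."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return base_url + "?" + urlencode(params)
```

For example, sru_search_url("http://repository.example.org/sru", 'dc.creator = "Morton"') would search by author, while oai_list_records_url("http://repository.example.org/oai") would ask for every harvestable record; both hosts and paths are placeholders.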

XML Sitemaps

Intrallect have suggested that it is necessary to implement an XML sitemap to ensure that content is properly crawled by Google. Until 2008, Google did support sitemaps using OAI-PMH but have since withdrawn this and now support only the standard XML format. Intrallect have therefore developed a software tool that converts OAI-PMH output to an appropriate XML format. A sitemap has been generated and registered using Google’s webmaster tools but it is currently registering a series of errors stating “This URL is not allowed for a Sitemap at this location”. Nine errors are listed, running sequentially from the very first URL; it seems that the crawl does not go any further and none of the 100+ URLs in the sitemap have been successfully recognised. Two possible reasons have been suggested for this:

• All of the URLs in the sitemap are external; it may be that Google does not permit URLs outside the mapped domain.
• There is a problem with the XML itself

Sitemap here: http://repository-intralibrary.leedsmet.ac.uk/sitemap/Sitemap.xml
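The first of these possibilities can be checked offline. As I understand the sitemaps.org protocol, a sitemap may only list URLs on its own host and at or below the directory in which the sitemap file itself sits, so a file served from /sitemap/ would only be allowed to cover URLs under /sitemap/. The sketch below lists the entries that break that rule; the function name is my own and it assumes the sitemap uses the standard sitemaps.org namespace.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlparse

# Namespace used by standard XML sitemaps (assumed for the file being checked)
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def disallowed_urls(sitemap_xml, sitemap_location):
    """Return the <loc> URLs that fall outside the sitemap's own host/directory."""
    loc = urlparse(sitemap_location)
    # Directory holding the sitemap, e.g. "/sitemap/" for "/sitemap/Sitemap.xml"
    base_dir = loc.path.rsplit("/", 1)[0] + "/"
    root = ET.fromstring(sitemap_xml)
    bad = []
    for element in root.iter(SITEMAP_NS + "loc"):
        url = (element.text or "").strip()
        parts = urlparse(url)
        same_site = (parts.scheme, parts.netloc) == (loc.scheme, loc.netloc)
        if not (same_site and parts.path.startswith(base_dir)):
            bad.append(url)
    return bad
```

If every URL comes back as disallowed, moving the sitemap to the root of the host that actually serves the records would be the first thing to try; if none do, the problem is more likely in the XML itself.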

Sitemaps using RSS

It is also possible to submit a sitemap based on RSS. However, this approach has not been any more successful, as the Open URL/virtual file paths generated by intraLibrary are inaccessible to Google, resulting in the following warning:

URLs not followed
When we tested a sample of URLs from your Sitemap, we found that some URLs redirect to other locations. We recommend that your Sitemap contain URLs that point to the final destination (the redirect target) instead of redirecting to another URL.

Google and SRU

Though SRU does not facilitate indexing by Google per se, the integration of the SRU Open Search interface may provide a potential solution. site:http://repository.leedsmet.ac.uk/ currently returns 247 records; largely these appear to represent Googlebot following the various browse links (many of which themselves return no results where there is no content to find!). In addition, Googlebot appears to be following the hyperlinked author names, publisher and subject(s) in the individual metadata records:

[Screenshot: Google search results]

The third of these “The Repository search for Morton, Veronica” links to the two metadata records associated with that name as though it had simply been entered into http://repository.leedsmet.ac.uk/ as a search term:

http://repository.leedsmet.ac.uk/main/search.php?q=Morton%2C+Veronica+

Presumably these records were initially indexed via the appropriate links on the browse interface at http://repository.leedsmet.ac.uk/main/browse.php (Faculty of Health and R – Medicine) and then re-indexed via the hyperlinks embedded in the metadata records. It is interesting to note that, though Morton, Veronica only has two records associated with her name, this result appears relatively high – at the top of the second page – and this is probably because there are so many other authors also associated with these papers; all of these names are hyperlinked, giving over 21 separate indexable links.

It seems that we might need to formalise the structure of the SRU interface to ensure it is optimised for Google, possibly with some sort of SRU sitemap. For example, we could generate a page that links to all the individual metadata records in the repository and optimise this page to be crawled by search engine spiders (it doesn’t need to be human readable; it could be XML), which could then follow the links to the associated metadata.
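As a rough illustration of that idea, the sketch below turns a list of metadata record URLs into a standard XML sitemap. It is only an assumption about how such a page might be produced: the function name is mine, the record URLs in the usage note are invented, and in practice the list would have to come from the repository itself (via SRU or OAI-PMH, say).

```python
import xml.etree.ElementTree as ET

# Namespace required by the sitemaps.org XML format
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(record_urls):
    """Return a sitemap document with one <url><loc> entry per record URL."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in record_urls:
        entry = ET.SubElement(urlset, "url")
        # ElementTree escapes characters such as "&" when serialising
        ET.SubElement(entry, "loc").text = url
    return ET.tostring(urlset, encoding="unicode")
```

Calling build_sitemap(["http://repository.example.org/record?id=1", "http://repository.example.org/record?id=2"]) gives a single urlset element that could be saved at the root of the site and submitted through the webmaster tools. The links listed would need to be final destinations rather than redirects, given the warning Google raised about the RSS-based sitemap.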

It also seems to me that Search Engine Optimisation will need to comprise appropriate customisation of the SRU interface; for example, we want to facilitate browse by author which, in turn, will provide indexable links for Googlebot.

Full text indexing

There is also the issue of indexing full text. As already mentioned, Google does not follow the Open URL/virtual file paths generated by intraLibrary and all the results from site:http://repository.leedsmet.ac.uk/ are search results. Potentially this is a benefit in as much as people are less likely to bypass the metadata record and go directly to the PDF, but we do also want to facilitate full text indexing. We may have to wait for Intrallect on this; they have assured us they are looking into facilitating full text indexing – probably via intraLibrary itself rather than the SRU interface.

A new era

In my last post I suggested that Repository News would be mothballed now that the final report has been submitted (still not published but soon!). However, our JISC funding was for a start-up project and we are still very much nurturing our neonate repository which, like a human infant, still has a lot of growing up to do.  I enjoy blogging and it seems rather artificial to start again so, like an infuriating parent sending a round-robin at Christmas, here is my first update of the new era.

All in all the little fella is doing very well, though we were very disappointed to miss out on enhancement funding from uncle JISC – bid feedback was positive and stressed just how competitive the call had been. All is not lost, however, and we’ve just learned of institutional success in the recent JISC call for the Open Educational Resources programme; Unicycle will be underway very soon and will necessarily use our intraLibrary repository, which should put us in a very good position with respect to JORUM – also based on intraLibrary, of course – and I’ve already implemented the JORUM metadata template (with permission). The repository will also be an integral component of the PC3 project, funded under the e-Learning Capital programme, which is already underway. In addition, we intend to submit a bid for the rapid innovation call – the #jiscri projects are relatively small scale, timetabled for just 6 months, but it would be very nice to get one; the deadline is Wednesday so it’s fingers crossed (again!). Finally, there may also be a project in the pipeline with the NHS looking at deposit into multiple repositories using SWORD.

The difficulty is knowing where to start!  In terms of OA research the search interface still needs a lot of work to integrate advanced search; we also need to ensure that we are properly indexed by Google and I’m ashamed to say that I am yet to register with the Open Archives Initiative. Then there is the small matter of advocacy and full text content.  We also need to integrate with SFX, our URL resolver.

I want to look at using the repository for the CLA digitisation service, emulating what Keele are doing with intraLibrary. A functioning PowerLink to the VLE would also be nice (something like MrCute2), as would more work around the conceptual PERSoNA outputs.

We have a meeting next week to discuss priorities and project management activities over the coming months, which promises to be a headache-inducing affair.

The final wordle

Now that the final report has been submitted, this blog will be archived, though it will remain available online indefinitely.

I’ve just generated a Wordle at http://www.wordle.net/ which says it all really:

Wordle: repo