Still baffled by Google…

Just reproducing an email to ukcorr-discuss here in case any technically minded folk not on the list might pass by these parts…

To revisit the whole Google Scholar / full-text indexing “thing”: I was just looking at results in GS for a particular academic who has raised a query about his full-text not being visible in Google Scholar. He has 6 full-text items in the repository, but a site: search of GS only appears to return 2:

http://scholar.google.co.uk/scholar?hl=en&q=site%3Ahttp%3A%2F%2Frepository-intralibrary.leedsmet.ac.uk+%22x.+font%22&btnG=Search&as_sdt=0%2C5&as_ylo=&as_vis=0

Initially I thought it might be an artefact of when the full-text was added. The records were all added at the same time (24th May 2011), but full-text was only added at that time for one of the GS results (plus one not indexed at all – see below); for all the others (including the other GS result) it was added in October 2011…and that’s still a good 6 months, which you would think would be long enough to be indexed. Wouldn’t you?

Normal Google, by contrast, returns 4 full-text records:

https://www.google.co.uk/search?hl=en&as_q=&as_epq=xavier+font&as_oq=&as_eq=&as_nlo=&as_nhi=&lr=&cr=&as_qdr=all&as_sitesearch=http%3A%2F%2Frepository-intralibrary.leedsmet.ac.uk%2F&as_occt=any&safe=images&tbs=&as_filetype=pdf&as_rights=
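For anyone wanting to compare the two searches like for like, here is a minimal Python sketch that simply decodes the two URLs above into their non-empty parameters. The helper name non_empty_params is my own; nothing here queries Google, it only unpicks the query strings.

```python
from urllib.parse import urlsplit, parse_qs

SCHOLAR_URL = (
    "http://scholar.google.co.uk/scholar?hl=en"
    "&q=site%3Ahttp%3A%2F%2Frepository-intralibrary.leedsmet.ac.uk+%22x.+font%22"
    "&btnG=Search&as_sdt=0%2C5&as_ylo=&as_vis=0"
)
GOOGLE_URL = (
    "https://www.google.co.uk/search?hl=en&as_q=&as_epq=xavier+font&as_oq=&as_eq="
    "&as_nlo=&as_nhi=&lr=&cr=&as_qdr=all"
    "&as_sitesearch=http%3A%2F%2Frepository-intralibrary.leedsmet.ac.uk%2F"
    "&as_occt=any&safe=images&tbs=&as_filetype=pdf&as_rights="
)


def non_empty_params(url):
    """Return only the query parameters that actually carry a value."""
    query = urlsplit(url).query
    # parse_qs decodes %XX escapes and '+', and drops blank values by default.
    return {key: values[0] for key, values in parse_qs(query).items()}


if __name__ == "__main__":
    for label, url in (("Scholar", SCHOLAR_URL), ("Google", GOOGLE_URL)):
        print(label)
        for key, value in sorted(non_empty_params(url).items()):
            print("   ", key, "=", value)
```

Decoded, the Scholar search is a site: restriction plus the phrase "x. font", whereas the Google search is the exact phrase "xavier font" restricted to PDFs on the same host, so the two result counts are not a strict like-for-like comparison.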

The missing results are http://repository.leedsmet.ac.uk/main/view_record.php?identifier=4881&SearchGroup=Research (full-text added 24th May 2011) / http://repository.leedsmet.ac.uk/main/view_record.php?identifier=4893&SearchGroup=Research (full-text added 10th October 2011).

The only other difference I can spot is that several of those not indexed in GS don’t have metadata in the PDF (which is why they have just been picked up in normal Google as “Leeds Metropolitan University Repository” from the coversheet)…

As a caveat, there is a technical peculiarity in that we effectively have a two-server set-up: our Open Search interface sits on an institutional server and queries intraLibrary by SRU, while the software itself is hosted for us in a server-farm somewhere, which might explain idiosyncratic behaviour to some extent…
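To make the SRU part of that set-up a little more concrete, here is a rough Python sketch of how a front end might build a searchRetrieve request. The parameter names (operation, version, query, startRecord, maximumRecords) come from the SRU standard; the base URL, the SRU version and the CQL index dc.creator are assumptions for illustration only, not our actual configuration.

```python
from urllib.parse import urlencode

# Hypothetical endpoint: stands in for wherever the repository exposes SRU.
SRU_BASE_URL = "http://repository.example.ac.uk/sru"


def sru_search_url(cql_query, base_url=SRU_BASE_URL, start_record=1, max_records=10):
    """Build an SRU searchRetrieve URL for a CQL query."""
    params = {
        "operation": "searchRetrieve",
        "version": "1.1",  # assumption: the server may expect a different version
        "query": cql_query,
        "startRecord": start_record,
        "maximumRecords": max_records,
    }
    return base_url + "?" + urlencode(params)


if __name__ == "__main__":
    # The index name dc.creator is an assumption; servers advertise their
    # supported indexes in the SRU explain response.
    print(sru_search_url('dc.creator = "Font"', max_records=5))
```

The point for the indexing question is that a crawler only ever sees the pages the institutional server renders from responses like this, never the hosted software directly.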

Am I missing anything else?!

An institutional tangram – musings on developing an integrated research management system

“The tangram (Chinese: 七巧板; pinyin: qī qiǎo bǎn; literally “seven boards of skill”) is a dissection puzzle consisting of seven flat shapes, called tans, which are put together to form shapes. The objective of the puzzle is to form a specific shape (given only an outline or silhouette) using all seven pieces, which may not overlap.”

http://en.wikipedia.org/wiki/Tangram

Having implemented an institutional repository at Leeds Metropolitan and learning by experience some of the difficulties associated with advocacy around the use of that repository (both for OA research and OER) I have become all too aware “that repositories are ‘lonely and isolated’; still very much under-used and not sufficiently linked to other university systems”. So said JISC’s Andy McGregor at an event called “Learning How to Play Nicely: Repositories and CRIS” in May 2010 at Leeds Metropolitan (see my report for Ariadne here). This quote is still relevant, though perhaps a little less so than when I heard it nearly 2 years ago, thanks to the ongoing work of JISC and particularly the RSP. In any case, the event was a revelation for me and I have coveted a so-called Current Research Information System (or CRIS for short) ever since!

And now, in Symplectic Elements, I have one…or at least the components of one.

The finished tangram? (click on image for full size)

It’s a puzzle though. A tangram if you will…one with considerably more than seven pieces:

intraLibrary, Symplectic, institutional website, University Research Office (URO), faculty research administrators, The Research Excellence Framework (REF), academic staff, web-developers, bibliographic information, research outputs, Open Educational Resources (OER)…

In fact, this may well not be all the pieces…pretty sure a few have been pushed down the back of the settee. I’ll look for them later.

Anyway, tortured metaphors aside, I have become increasingly aware that working in a large institution, in a role that encompasses technology and institutional policy (though I’m not, by any means, a policy maker…or indeed a real techie) is largely about communication and getting the right people, with the right skills, in the right place at the right time! Absorb policy and technical requirements from senior stakeholders and communicate those requirements to the proper techies – while also trying to ensure any motivating passions of one’s own don’t get lost along the way – Open Access to research and Open Education in my case.

For various reasons, individual user accounts have never been implemented for our repository and historically it has been administered centrally from the Library. In Symplectic we now have a system that is populated with central HR data; all staff will have an account they can access with their standard user name and password from where they can manage their own research profile including uploading full-text outputs directly to the repository*. In addition, administration by the University Research Office and faculty research administrators will be more easily centralised (particularly for the REF).

* In actual fact this functionality is not yet available, pending development work from Intrallect to capture the Atom feed from Symplectic and transform it with XSLT to a suitable format for intraLibrary. I think.
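As I understand it, the real work will be done with XSLT; purely to illustrate the shape of the job, here is a Python sketch that walks an Atom feed with the standard library's ElementTree (which has no XSLT support) and pulls out the bits a repository record would need. The sample feed is invented, and treating a link with rel="enclosure" as the full-text file is my assumption, not something I have confirmed about the Symplectic feed.

```python
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"

# Invented sample data: the real Symplectic feed will differ.
SAMPLE_FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Example publications feed</title>
  <entry>
    <id>urn:example:publication:1</id>
    <title>An example article</title>
    <updated>2012-03-01T12:00:00Z</updated>
    <author><name>A. Researcher</name></author>
    <link rel="enclosure" href="http://example.org/files/article.pdf"/>
  </entry>
</feed>"""


def atom_entries_to_records(feed_xml):
    """Turn each Atom entry into a plain dict ready for further mapping."""
    root = ET.fromstring(feed_xml)
    records = []
    for entry in root.findall(ATOM + "entry"):
        # Assumption: attached files are exposed as rel="enclosure" links.
        files = [
            link.get("href")
            for link in entry.findall(ATOM + "link")
            if link.get("rel") == "enclosure"
        ]
        records.append({
            "identifier": entry.findtext(ATOM + "id", default=""),
            "title": entry.findtext(ATOM + "title", default=""),
            "authors": [
                author.findtext(ATOM + "name", default="")
                for author in entry.findall(ATOM + "author")
            ],
            "files": files,
        })
    return records


if __name__ == "__main__":
    for record in atom_entries_to_records(SAMPLE_FEED):
        print(record)
```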

One of the clever bits of functionality used to sell the software is automatic retrieval of bibliographic data from online citation databases – we are currently running against various APIs: Web of Science (lite), PubMed and arXiv – but I think this may actually be a bit of a red herring for an institution like Leeds Metropolitan, at least until more (preferably free) data sources are available (JournalTOCs API please!). Early testing has shown that, at best, it will only retrieve a subset of (the types of) outputs that we will need to record. It will be necessary to manually import existing records (e.g. from EndNote) as well as implementing other administrative procedures at faculty level to capture information at the point of publication, especially for book-items, monographs, conference material, reports and grey literature.

More important, I think, to ensure that academic staff actually engage with the software rather than just seeing it as a tool for administrators, is to re-use the data to generate a list of research outputs – a dynamic bibliography – on a personal web-profile which has the potential to dramatically increase the visibility of research including Open Access to full-text.

Developing staff profiles of this type has been something of an obsession of mine for a while; we explored doing so from the repository (using SRU and email address as a Unique Identifier) and did develop a working prototype. Symplectic, however, integrated with central HR data and with its more sophisticated API, should make it much easier, at least from a technical perspective, and we are currently liaising with the central web-team to develop something similar to this example from Keele University – http://www.keele.ac.uk/chemistry/staff/mormerod/ (like us, Keele run Symplectic alongside intraLibrary.)

N.B. From the Symplectic interface, a user is able to “favourite” a research record and a flag comes out in the xml from the API which I understand is used on this page to display “Selected Publications”. DOI is also available from the API to link to the published version and if a user uploads full-text to the repository from Symplectic, this link is also in the xml – the first two records on this page include links to the full-text in Keele’s intraLibrary repository.
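To sketch how a profile page might use those three pieces of information (the favourite flag, the DOI and the repository link), here is a small Python example. The XML structure, element names and attribute name are entirely hypothetical stand-ins; I have not reproduced the real Symplectic API output here.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML: element and attribute names are invented for illustration.
SAMPLE_XML = """<publications>
  <publication favourite="true">
    <title>First example paper</title>
    <doi>10.1000/example.1</doi>
    <repository-url>http://repository.example.ac.uk/record/1</repository-url>
  </publication>
  <publication favourite="false">
    <title>Second example paper</title>
    <doi>10.1000/example.2</doi>
  </publication>
</publications>"""


def build_profile(xml_text):
    """Split publications into 'selected' (favourited) and the full list."""
    selected, everything = [], []
    for pub in ET.fromstring(xml_text).findall("publication"):
        doi = pub.findtext("doi")
        item = {
            "title": pub.findtext("title", default=""),
            # Link to the published version via the DOI resolver, if a DOI exists.
            "published_version": "https://doi.org/" + doi if doi else None,
            # Link to the open access full-text, if one was uploaded.
            "full_text": pub.findtext("repository-url"),
        }
        everything.append(item)
        if pub.get("favourite") == "true":
            selected.append(item)
    return {"selected": selected, "all": everything}


if __name__ == "__main__":
    profile = build_profile(SAMPLE_XML)
    print("Selected Publications:")
    for item in profile["selected"]:
        print(" -", item["title"], item["published_version"], item["full_text"])
```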

Our own Library web-dev Mike Taylor has been looking at the Symplectic API in detail and has put together a couple of prototype pages on a development server and after a meeting this week with a representative of the central web-team I’m reasonably confident we can move forward with this work fairly quickly…though there’s still a bit of a chicken & egg situation in populating the Symplectic database to then be re-surfaced via the API in this way.

There is also the question of whether we might alter our repository policy to become full-text only. One limitation of repositories across UK HE, measured against an original conception (in the arXiv mould) of holding, disseminating and preserving full-text research outputs, is that they have in effect become “diluted” by metadata records for which it has not (yet) been possible to procure full-text or where copyright does not permit deposit; “hybrid” repositories like ours, of full-text and metadata, typically contain more metadata records than full-text (see figures from the RSP survey here). As I have argued on the UKCoRR blog, I think it makes sense to separate a bibliographic database (in Symplectic) from full-text only in a repository.

N.B. As Symplectic does not have the same search functionality as the repository, this approach has the potential disadvantage that it makes it more difficult to search across the entire corpus of research records, though one potential solution may be along the lines of that implemented by City Research Online which, in my view, is rapidly becoming an exemplar of a research management system (Symplectic) + full-text repository (EPrints). Another good example is St Andrews (PURE + DSpace) who presented a case study at “Learning How to Play Nicely: Repositories and CRIS” (video here).

And what of OER? Along with our EasyDeposit SWORD interface, using OER to resource the refocus of the undergraduate curriculum and the soon to be released intraLibrary 3.5 that will enable us to harvest OER from other repositories…for now I think they may be the bits down the back of the settee…

Turning a Resource into an Open Educational Resource (OER)

As this is the inaugural Open Education Week (whaddya mean you didn’t know?!) here’s a great 5 minute animation from OER IPR support giving an overview of IPR and licensing issues you need to be aware of when creating and repurposing Open Educational Resources.

Uploaded to the Leeds Met repository under the terms of CC-BY-SA 😉

Turning a Resource into an Open Educational Resource (OER) – Leeds Met Repository Open Search.

Infrastructure schematic (1st draft)

There are several significant developments that will impact on our repository / research management / OER dissemination and discovery over the next 12 months or so…briefly these are:

This is a quick schematic of how the developing infrastructure might look (a bit big to fit in my WordPress theme so click on image for full size):

Plugged-in for OER

As mentioned in this recent post I’ve been experimenting with WordPress for presenting OER and have been testing a pre-release version of a WordPress plug-in, developed by the Triton project at the University of Oxford to facilitate a dynamic collection of OER in a WordPress blog.

Developer @patlockley describes the overall functionality of the plug-in here and also covers some of the limitations posed by the broader OER infrastructure here emphasising that “no standard API exists across repositories so as to facilitate a single approach to aggregation for an aggregation creator” – as well as a separate post here considering limitations of the WordPress platform itself used in this context and associated technical considerations.

In summary the plug-in searches Xpert, Merlot and OER Commons (via their API) as well as Wikipedia, Wikibooks and Wikiversity for openly licensed material; Mendeley for journals and with options to add RSS feeds for blogs and podcasts.

Here I’ll briefly describe my experiences of using the plug-in – fairly candid in the hope that it will be useful feedback to Pat and Triton albeit with the initial caveat that any issues I’ve encountered are just as likely to be associated with my limited experience of WordPress and my shambrarian status (I simply haven’t had time to hone the search terms as carefully as I would like) as with the plug-in itself (which of course is pre-release.)

Once installed, famously straightforward in WordPress even prior to release (via FTP), you get a new “Dynamic Collection” tab in the dashboard where I can add a new collection…pretty much at random, I chose an undergraduate course from Leeds Met – Civil Engineering – around which to build my dynamic collection – it’s then just a matter of adding title and search terms, updating the feeds from the three source repositories and publishing:

This admittedly unsophisticated search returned 9 results:

Obviously the plug-in is only as effective as the keyword data / API / source repository(ies) that it is using and the fifth link here actually points at an entirely different resource (in Jorum) with no relevance to Civil Engineering, presumably due to an error at some point along its, er, conjugation – as the plug-in does not search Jorum directly this must have come via Xpert which does harvest Jorum. While experimenting with the plug-in I’ve also had instances where links have returned 404s or been otherwise broken so one requirement I think would be the option to remove links from the collection that are incorrect, broken…or simply less relevant; to allow the WordPress administrator fuller control of the collection.
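To show the sort of control I have in mind, here is a minimal Python sketch (not how the plug-in works, which is PHP inside WordPress) that drops links an administrator has excluded by hand, plus any that are unreachable or return an error status. The function names and the item structure are my own invention.

```python
import urllib.error
import urllib.request


def http_status(url, timeout=10):
    """Return the HTTP status of a HEAD request, or None if unreachable."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        return error.code  # e.g. 404
    except (urllib.error.URLError, OSError, ValueError):
        return None


def prune_collection(items, excluded_urls=(), status_of=http_status):
    """Keep items that are not manually excluded and still resolve."""
    kept = []
    for item in items:
        if item["url"] in excluded_urls:
            continue  # removed by the administrator as incorrect or irrelevant
        status = status_of(item["url"])
        if status is None or status >= 400:
            continue  # broken link
        kept.append(item)
    return kept
```

Passing the status check in as a parameter means the pruning logic can be tried out without touching the network; some servers reject HEAD requests, so a real checker might need to fall back to GET.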

In order to add a blog or podcast under the Settings tab, the plug-in has installed several new tabs (I don’t think the Feed management / Collection statistics / Collection tabs are yet fully functional in the version I am testing):

Under the Dynamic Collection Options there are fields to add RSS feeds from blogs or podcasts:

I’ve experienced a few teething troubles adding blogs not least because I don’t know much about Civil Engineering! As I understand, it should search blog title and description for the dynamic collection keywords…I added a feed from http://www.civilengineering.co.uk/feed/ which returned this single (most recent) post – http://www.civilengineering.co.uk/2010/09/civil-engineering-issues/ (the blog, in fact, only appears to comprise 2 posts so presumably would update should any new posts be added?)
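My understanding of the matching is rough, so treat the following Python sketch as a guess at the general idea rather than a description of the plug-in: it checks each post's title and description in an RSS 2.0 feed for the collection keywords, case-insensitively. The sample feed is invented.

```python
import xml.etree.ElementTree as ET

# Invented RSS 2.0 sample for illustration.
SAMPLE_RSS = """<rss version="2.0">
  <channel>
    <title>Example blog</title>
    <item>
      <title>Civil engineering issues</title>
      <description>Notes on bridges and tunnels.</description>
      <link>http://example.org/1</link>
    </item>
    <item>
      <title>Holiday photos</title>
      <description>Nothing technical here.</description>
      <link>http://example.org/2</link>
    </item>
  </channel>
</rss>"""


def matching_items(rss_xml, keywords):
    """Return (title, link) for items whose title or description mentions a keyword."""
    wanted = [keyword.lower() for keyword in keywords]
    matches = []
    for item in ET.fromstring(rss_xml).iter("item"):
        title = item.findtext("title", default="")
        description = item.findtext("description", default="")
        haystack = (title + " " + description).lower()
        if any(keyword in haystack for keyword in wanted):
            matches.append((title, item.findtext("link", default="")))
    return matches


if __name__ == "__main__":
    print(matching_items(SAMPLE_RSS, ["civil engineering"]))
```

If matching really does work this way, a feed with only a couple of posts will only ever surface those posts that happen to contain the keywords.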

I’m very optimistic about the potential of this approach to allow WordPressing course leaders, perhaps with support from learning technologists, to quickly and easily assemble a dynamic collection of OER for their students and look forward to the formal release of the finished product* – in the meantime, in true Blue Peter stylee, here are a number of collections that Pat made earlier to give a sense of what should be possible:

http://politicsinspires.org/dynamic_collection/political-theory/

http://politicsinspires.org/dynamic_collection/comparative-government/

http://politicsinspires.org/dynamic_collection/international-relations/

http://politicsinspires.org/dynamic_collection/european-politics-and-society/

* The only caveat from my perspective is that my own institution does not formally support the use of WordPress. Nevertheless, there is certainly a requirement, explicitly identified by senior stakeholders, to develop tools to cross-search Open Educational Resources and, in this context, I think we can learn a lot from the Triton project.

N.B. Such a mechanism, however implemented via the proliferation of OER repositories and their APIs, also put me in mind of this post from Suzanne Hardy (@glittrgirl) of MEDEV and the PORSCHE project – Branding, repositories, OER and awareness raising: some thoughts on embedding OERs

See also: Delores OER – WordPress for hosting and describing learning resources (University of Bath and Heriot-Watt)

WordPressure

Motivated by this post on the OpenSpires blog from @patlockley I’ve been experimenting with WordPress with a view, ultimately, to providing a one-stop OER environment for my institution. Pat has written a plug-in that allows the WordPress admin to specify search terms to create (a) dynamic collection(s) from Xpert, Merlot and OER Commons via their APIs (also searches Wikipedia, Wikibooks and Wikiversity for Openly licensed materials, openly licensed blogs on politics and Mendeley for journals as well as political podcasts from OpenSpires.) For examples of the plug-in in action see http://politicsinspires.org/oer/political-theory/

The plug-in isn’t yet publicly available – I’m hoping that I can have a go fairly soon *waves at Pat*…I’m no WordPress developer and am just finding my way round a test install of the platform, experimenting by pulling in different feeds from various sources (our own repository, Jorum, HumBox) using a plug-in called FeedWordPress – http://feedwordpress.radgeek.com/. It’s dead easy to syndicate one (or multiple) feeds to a designated posts page but what I can’t figure out is how I might push different feeds to different pages so I could, say, have one page that auto-publishes from the Leeds Met repository, one from Jorum, one from HumBox etc.

Below: Syndicated posts from Jorum (HE – Architecture, Building and Planning) to a “Jorum” page…but how can I push separate HumBox and Leeds Met feeds to the respective pages?

OERtest

Linking from a research paper to associated OER and thoughts on extending the CRIS model to OER

With our “blended” repository comprising research and UKOER, I still feel very much like I have a foot in two camps, a feeling that, ironically, is reinforced by my role as Technical Officer for UKCoRR – the UK Council of Research Repositories!

I think I’m right in saying that it’s still atypical to manage both types of resource with a single repository platform and there are certainly considerations why it may not necessarily be desirable – both from a technical and political perspective.

The main repositories that have been developed as part of the ukoer programme are modifications to DSpace (Jorum) and EPrints (HumBox, EdShare), the two main open source repository software platforms that were both initially developed to manage research. In contrast, we have worked with intraLibrary, a commercial learning object repository, to manage both OER and research and while this certainly hasn’t been without its problems, I’m naturally interested in potential benefits from this approach both in terms of “reward & recognition” for OER by something analogous to peer-review perhaps (a theme that was explored as part of the Unicycle project) and also in terms of work-flow, possibly mediated via a CRIS-type system such as Symplectic Elements or Atira Pure…

intraLibrary has a workflow to link related resources which I can easily use to link a research paper with associated OER, so in the example below I can link…

Coates, C., Smith, S. (2010) Promoting the concept of competency maps to enhance the student learning experience. Assessment, Teaching and Learning Journal (Leeds Met), 10 (Winter), pp.21-25.

…to the three ALPS Common Competency Maps in the OER collection (see Linked Resources at the bottom of the record):

ALPS Common Competency Map – Communication

ALPS Common Competency Map – Ethical Practice

ALPS Common Competency Map – Team Working

These records, in turn, comprise links back to the research paper (and associated conference paper):

With such an approach, is there perhaps an opportunity to tie research and OER more closely together at an institutional level (if this isn’t politically naive!) and contribute to research led teaching?

The next stage might be to develop a common workflow for research and OER…

Workflow, in fact, has long been a bug-bear of mine and, for both types of resource, essentially remains fully mediated by me and administrative colleagues. In all likelihood, however, like many institutions, we will soon be implementing a CRIS that will make it easier to collate institutional research outputs by harvesting research data from external bibliometric sources, as well as allowing records to be added manually, and integrating with the repository such that academic staff are able to attach an appropriate full-text to a record and upload it along with metadata into the repository directly from a “user-friendly interface” (TM).

At a recent demo of one of these types of system I confirmed that it could transfer a range of file-types to a repository (utilising SWORD) as well as allowing various licences to be configured including (I think) Creative Commons so there seems no fundamental reason why such a system could not be used to support the workflow for both OA research and OER.

Of course I will need to get my hands on one of these systems before I can properly investigate exactly what is achievable…watch this space.

Repository deposit from the desktop

Thinking about repository workflows for staff – put a deposit client where their resources live, on their desktop…

What I have:

A (slightly unwieldy) set of files comprising:

Quick drop file set

How it works:

The VB script was written by Boyd Duffy at Keele University and, as a non-developer, I know only that I need to edit sword_deposit.vbs with my SWORD DEPOSIT_TARGET. It’s then simply* a matter of dragging and dropping a file (or multiple files) onto the VBS icon for them to be uploaded into the repository (workflow can obviously be configured in the repository itself, to be published immediately**, for example, or, more likely, go into a workflow where metadata can be added according to a particular Application Profile).

** I think Keele use it as a quick and dirty method for image files to be transferred from desktop to repository from where they can be immediately accessed via a VLE PowerLink.

Here is a screen capture that I did a while ago: http://www.leedsmet.ac.uk/inn/repository/video/SWORD_drop_from_desktop/

* Re simple – I can, in fact, only make it work from a Leeds Met IP!  Perhaps something to do with PROXY_HOST / wireless?
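I can't reproduce Boyd's VBScript here, but for the curious, this is roughly what a SWORD deposit looks like on the wire, sketched in Python rather than VBScript. It builds (but does not send) an HTTP POST of the file to the deposit target. The target URL is a placeholder, authentication is left out, and the packaging identifier in the usage note is an assumption that would need checking against the repository's SWORD service document.

```python
import mimetypes
import urllib.request

# Placeholder: replace with the collection URL from the SWORD service document.
DEPOSIT_TARGET = "http://repository.example.ac.uk/sword/deposit/collection"


def build_sword_deposit(filename, payload, deposit_target=DEPOSIT_TARGET, packaging=None):
    """Build a POST request that deposits one file (bytes) at the target."""
    content_type = mimetypes.guess_type(filename)[0] or "application/octet-stream"
    headers = {
        "Content-Type": content_type,
        # SWORD 1.3 passes the original file name in Content-Disposition.
        "Content-Disposition": "filename=" + filename,
    }
    if packaging:
        # Only needed when the payload is a package (e.g. a zip with a manifest).
        headers["X-Packaging"] = packaging
    return urllib.request.Request(
        deposit_target, data=payload, headers=headers, method="POST"
    )
```

To use it, read the dropped file with open(path, "rb").read(), build the request, and pass it to urllib.request.urlopen once credentials (typically HTTP Basic) have been added. For a zipped METS package the packaging value would be something like "http://purl.org/net/sword-types/METSDSpaceSIP", though that is from memory and should be confirmed.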

What I need:

METADATA of course!

The current tool is of limited use as it just pushes a file into the repository. In fact, it will quite happily push a Content Package – a Zip comprising a file and some metadata as XML – either an IMSMANIFEST (which I would need for intraLibrary) or METS for DSpace (i.e. Jorum.)

Though I don’t have the skills myself, I’m hoping someone can tell me how we might develop a desktop app to integrate a way of capturing the metadata associated with a resource, converting it into an IMSMANIFEST and/or METS, zipping the whole lot up and pushing it to a repository (or multiple repositories) via SWORD …

If we were to use our current ukoer AP we would need to capture:

  • Title
  • Description
  • (Uncontrolled) Keyword(s)
  • Author / owner / contributor
  • Date
  • Type of resource
  • Technical format
  • Licence information
  • Subject classification (HEA and JACS)
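As a very rough Python sketch of the "capture metadata, build a manifest, zip it up" step, using only the standard library: the manifest skeleton (manifest, metadata, organizations, resources) follows IMS Content Packaging, but the metadata elements are placeholders in an invented namespace. A real package for intraLibrary would need proper LOM metadata matching our Application Profile, so this shows the shape of the process and nothing more.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

IMSCP_NS = "http://www.imsglobal.org/xsd/imscp_v1p1"
# Invented namespace for placeholder metadata; NOT a real application profile.
AP_NS = "http://example.org/ukoer-ap"


def build_manifest(metadata, resource_filename):
    """Return imsmanifest.xml as bytes for a single-file package."""
    ET.register_namespace("", IMSCP_NS)
    ET.register_namespace("ap", AP_NS)
    manifest = ET.Element("{%s}manifest" % IMSCP_NS, {"identifier": "MANIFEST-1"})
    meta_el = ET.SubElement(manifest, "{%s}metadata" % IMSCP_NS)
    for field, value in metadata.items():
        # Keys must be valid XML names, e.g. "title", "licence".
        ET.SubElement(meta_el, "{%s}%s" % (AP_NS, field)).text = value
    ET.SubElement(manifest, "{%s}organizations" % IMSCP_NS)
    resources = ET.SubElement(manifest, "{%s}resources" % IMSCP_NS)
    resource = ET.SubElement(
        resources,
        "{%s}resource" % IMSCP_NS,
        {"identifier": "RES-1", "type": "webcontent", "href": resource_filename},
    )
    ET.SubElement(resource, "{%s}file" % IMSCP_NS, {"href": resource_filename})
    body = ET.tostring(manifest, encoding="utf-8")
    return b'<?xml version="1.0" encoding="UTF-8"?>\n' + body


def build_package(metadata, resource_filename, resource_bytes):
    """Zip the manifest and the resource together, returning the zip as bytes."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as package:
        package.writestr("imsmanifest.xml", build_manifest(metadata, resource_filename))
        package.writestr(resource_filename, resource_bytes)
    return buffer.getvalue()
```

The resulting bytes could then be pushed to one or more repositories via SWORD; a METS variant for DSpace would swap the manifest builder for one that writes mets.xml.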

Click link below for an example IMSCP:

http://repository-intralibrary.leedsmet.ac.uk/IntraLibrary?command=open-package-download&learning_object_key=i3605n162666t.zip

Or link below for METS (with cut-down metadata); this package has been successfully deposited in Jorum (dev) via SWORD:

http://repository-intralibrary.leedsmet.ac.uk/IntraLibrary?command=open-preview&learning_object_key=i3128n92902t

N.B. A practical issue with this approach might be including such an application on an institutional staff build and I have heard rumours that it might be possible to achieve similar drag and drop functionality with a web-based app using HTML5 – browser support still inconsistent though I think.