Web of Science | Repository News

UKCoRR Meeting Leicester

February 22, 2010 by Nick

On Friday Wendy and I trekked across the frozen wastes of Northern Britain to Leicester (which I have now looked up on a map and discovered is pretty much slap bang in the middle of Albion). I had missed the last UKCoRR meeting back in August ’09 due to gestational issues and was looking forward to catching up with repository-managing colleagues, some of whom were familiar faces, some of whom I recognised only by name and some of whose acquaintance was entirely novel – indicative of the growing repository community in the UK, with the membership of UKCoRR now getting on for 200 souls. The meeting was well attended, which is especially exciting given that the Council is unfunded and operates only by the dedication of its Committee and the goodwill of its members, a paragon of which was our host Gareth Johnson, who kept a live blog of proceedings at http://uollibraryblog.wordpress.com/2010/02/19/ukcorr-meeting-university-of-leicester/.

I was due on at 11:50, hopefully to demonstrate the Bibliosight prototype, and Gareth and I had been liaising with Thomson Reuters in order to have IP authentication to access the service. I was there early and, with trepidation, tested the prototype on the presentation PC and… it didn’t work. Nor could I gain access to the wireless network to pore over emails in order to resolve the problem, so I decided, instead, to have a cup of tea and rely on my Blue Peter style contingency – here’s one I prepared earlier.

The running order is below – presentations from the event will appear on the UKCoRR blog in due course so I’ll link to them from here as and when:

Chair’s Activities – Jenny Delasalle, UKCoRR Chair
Aberystwyth University, CADAIR and Me – Nicky Cashman, UKCoRR Secretary
The Bibliosight Project: Querying Web of Science from the Desktop – Nick Sheppard, Leeds Metropolitan University (presentation below.)
The Repository Scene at Leicester – Gareth Johnson, University of Leicester
Web & Publicity Update – Dominic Tate, UKCoRR Web & Publicity Officer
Welsh Repository Network: Services and Support – Hannah Payne, Aberystwyth University*
Advanced RoMEO – Jane Smith & Peter Millington, University of Nottingham
Copyright and Repositories Workshop – Nicky Cashman, Aberystwyth University

* Hannah also showed us some learning objects produced by WRN and there are plans to create several more.

Bibliosight (UKCoRR presentation):

Gareth has done a pretty good summary over at The UoL Library Blog so here I really just want to follow up after my own presentation, with reference to the R4R project and JournalTOCsAPI, and also perhaps note one or two particular points that caught my interest through the day.

There is a screencast of the Bibliosight prototype at http://www.leedsmet.ac.uk/inn/repository/bibliosight/video/. As I mentioned, the R4R project has also developed a prototype to perform a similar task, and Les Carr posted to the list last week emphasising that the major issue from their point of view is duplicate avoidance / reconciliation, and that he and the R4R team would be looking for guidance from repository managers on how we already tackle it. We didn’t really have time to discuss these issues on Friday, so I’ll just reiterate Les’ on-list invitation to post any initial feedback there, which will also help the project team to focus their questions.
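As a starting point for that discussion, here is a minimal sketch of one way duplicate avoidance might work when a batch of downloaded records is compared with what a repository already holds. It is not how Bibliosight or R4R actually do it: the record fields (`doi`, `title`, `year`) and the function names are my own assumptions for illustration. Records are matched on DOI where one is present, and otherwise on a normalised title plus year.

```python
import re
import unicodedata


def normalise_title(title):
    """Lower-case, strip accents and punctuation, collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", title)
    ascii_only = decomposed.encode("ascii", "ignore").decode("ascii")
    return " ".join(re.findall(r"[a-z0-9]+", ascii_only.lower()))


def match_keys(record):
    """Return every key a record can be matched on (field names are hypothetical)."""
    keys = {("title-year", normalise_title(record.get("title", "")), str(record.get("year", "")))}
    doi = (record.get("doi") or "").strip().lower()
    if doi:
        keys.add(("doi", doi))
    return keys


def split_duplicates(existing, incoming):
    """Split an incoming batch into (new_records, probable_duplicates)."""
    seen = set()
    for record in existing:
        seen |= match_keys(record)

    new_records, duplicates = [], []
    for record in incoming:
        keys = match_keys(record)
        if keys & seen:
            duplicates.append(record)  # leave these for a human to reconcile
        else:
            new_records.append(record)
            seen |= keys  # also catches repeats within the same batch
    return new_records, duplicates
```

Exact matching like this will miss retitled or mis-keyed records, so in practice the "probable duplicates" pile would still need a repository manager to look it over, which is rather the point of Les' question.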

The other questions and issues I asked folk to consider were:

Different workflows relevant to:
- Backfilling a repository with a one-off download
- Ongoing use to populate repository

Other uses for records downloaded from WoS?

Other datastreams to populate a repository:
- UK PubMedCentral, arXiv
- sources that better serve the arts, humanities and social sciences

WoS records become available some time after publication*

Duplicate records/ambiguous relations with existing records?

Implications for a repository’s mission/reputation if balance of content changed by large number of WoS-derived records

* One of our use cases was to use Web Services as an alerting service, in a manner similar to that proposed by JournalTOCsAPI. The advantage in that case, as Jenny emphasised when she mentioned her involvement with the project at the top of the day, is that the data is available via the API as soon as it is published whereas, in the case of Web Services, there will be a delay while data is harvested/re-keyed by Web of Science.

We also briefly considered another question that had already arisen on the list from Hannah Payne of the Welsh Repository Network around acknowledging the source of WoS data in repository records with the vague consensus that acknowledgement should go in the rights field; it was pointed out that, in the UK, metadata does not carry any copyright restrictions but I think the issue is purely about acknowledging WoS as the source of the data – there are some issues around this to clarify, however, as we are given to understand that abstracts are not made available via Web Services as WoS are not able to grant a license for reuse even though it seems that a license may not actually be necessary (see last post).

Anyway, I hope these issues are taken up in more detail either here or on the UKCoRR discussion list.

Other interesting issues that came up during the day and that deserve fuller posts of their own in the fullness of time were:

CRIS (Current Research Information Systems)
Institutional mandates for e-theses (relates to developing EThOS service)
Forthcoming report on the economics of Open Access from Alma Swan
Integrating repositories with the REF (WRN is planning a repository and CRIS event right here in sunny Leeds using the Rose Bowl, our world-class conference facility – TBA)
Advanced RoMEO and ongoing developments to the service

I really enjoyed the day and thank you to everyone involved. Looking forward to the next meeting!

Filed under BiblioSight, Event Tagged with API, JournalTOCsAPI, Leicester, metadata, R4R, Repository, SHERPA RoMEO, UKCoRR, Web of Science, Welsh Repository Network, WRN

BiblioSight project recommended for funding

June 3, 2009 by Nick

We’ve just learned that we’ve been successful in our most recent funding bid to JISC’s Rapid Innovation call.

Outline project description:

“The project will aim to exploit the Web of Science Web Services API that uses standard transport protocols, such as HTTP, and message formats, such as SOAP and XML, to facilitate the exchange of data between Web of Knowledge and a custom application. It will build on work undertaken by the JISC funded SUE project, Implementing an Institutional Repository for Leeds Metropolitan University to integrate bibliographic information from Web of Science into the Leeds Met Open Access repository of research; this will facilitate automatic update when a published article appears in Web of Science. The aim is to integrate the technology into an efficient workflow to populate the repository with citation information / full text; we will also build on work undertaken by the JISC funded PERSoNA project and aim to develop a ‘widget’ that can easily be added to a personal environment like iGoogle or personal/communal environment like netvibes and that will extract bibliographic information – and potentially also bibliometrics – for authenticated Leeds Met staff in line with Web of Science licensing.”

Filed under A new era, BiblioSight Tagged with #jiscri, API, Web of Science

Development of Research Repository Aspect of IntraLibrary

June 1, 2009 by Nick

On Friday Mike and I visited colleagues at Keele University for a meeting with Charles Duncan from Intrallect to consider development priorities for intraLibrary to better serve our needs as a research repository. Over four and a half hours we considered the basic issues that need addressing as well as looking forward to some more ambitious functionality and integration with the wider research infrastructure as we move towards the REF.

I was particularly interested to learn about how Keele are implementing Symplectic’s publications management system – http://www.symplectic.co.uk/ – which regularly trawls Web of Science and PubMed Central for information about Keele’s academic publications. Symplectic have clearly been thinking about integration with IRs and there’s even a link to SHERPA/RoMEO. The system was used at Imperial College London for the RAE 2008 process and includes link functionality with DSpace, which is that institution’s IR platform – http://spiral.imperial.ac.uk/. Intrallect are currently liaising with Symplectic about integration with intraLibrary – I’m not certain precisely what form this would take but in an ideal world it would be great if we could auto-populate as much metadata as possible (title/bibliographic info/abstract/author/copyright status according to RoMEO) and automatically nudge academics for full text where appropriate!

At Leeds Met we currently lack any form of research database, which is why I’ve been exploring what are essentially manual workflows to populate the repository with all research output. I’m not sure how expensive Symplectic is, and it may be difficult to justify given this institution’s relatively small research output, so the repository may well have to be the research database, which is the assumption I’ve been working on. We will also want to explore the soon-to-be-released Web of Science API which may, in any case, enable us to emulate some of this functionality ourselves.

The first item on our agenda was somewhat more prosaic and focussed on our immediate functional requirements – SRU searching and metadata. Mike has been working on incorporating advanced search into the SRU interface and come up against a couple of issues when searching by author and date which are essentially artifacts of having to query DC rather than LOM; in the LOM, creators and contributors are clearly differentiated, however, querying by DC conflates creator and author roles which may (will) be different if resources are uploaded by someone other than the author.

Searching dc.creator will search for the creator and author roles
Searching dc.contributor will search for the content provider role

In addition:

Searching by dc.date only searches data that relates to the intraLibrary submission process (i.e. the deposit date, and perhaps modification dates if you added an author later on for example)
The only way to search journal dates is to use the default free text search that searches everything (or most fields anyway).

The solution, of course, is to make it possible to query the LOM by SRU and this is now Intrallect’s intention – indeed, to render all LOM fields query-able which would include user generated tags for example.
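To make the DC-based searches described above a little more concrete, here is a rough sketch of how an SRU `searchRetrieve` request against one of those indexes might be put together. The `operation`, `version`, `query` and `maximumRecords` parameters come from the SRU standard and the query is plain CQL; the endpoint address is a made-up placeholder rather than our actual intraLibrary URL, and the function name is mine.

```python
from urllib.parse import urlencode

# Hypothetical endpoint, for illustration only.
SRU_ENDPOINT = "http://repository.example.ac.uk/sru"


def build_sru_url(index, term, max_records=10, endpoint=SRU_ENDPOINT):
    """Build an SRU 1.1 searchRetrieve URL for a single-index CQL query."""
    # Escape backslashes and double quotes so the term is a valid CQL string.
    escaped = term.replace("\\", "\\\\").replace('"', '\\"')
    params = {
        "operation": "searchRetrieve",
        "version": "1.1",
        "query": f'{index} = "{escaped}"',
        "maximumRecords": str(max_records),
    }
    return f"{endpoint}?{urlencode(params)}"


# dc.creator matches the creator and author roles;
# dc.contributor matches the content provider role.
author_search = build_sru_url("dc.creator", "Sheppard")
uploader_search = build_sru_url("dc.contributor", "Sheppard")
```

Once the LOM fields are query-able, the only change on the client side should be the index name passed in, though what those index names will be is for Intrallect to decide.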

The next big question is exposure of open content to search engines and Charles gave us an overview of plans to develop an object “home page” with a static URL which should help in this area. We also discussed sitemaps and what needs to be done external to intraLibrary. I’m still unclear on how we can improve the format of results returned by Google from the SRU interface; to repeat, Google IS indexing http://repository.leedsmet.ac.uk/ with site: http://repository.leedsmet.ac.uk/ currently returning over 500 records. However, this is fairly unstructured; Google is simply following links from http://repository.leedsmet.ac.uk/main/browse.php; any subsequent links Googlebot encounters are also indexed and returned as “The Repository search for [link name]” and ideally I’d like results to be returned in a more structured and user-friendly form. Many queries actually return no results where there is (yet) no content to find, though where there is content, Google is indexing all human-readable metadata. I’m also not certain whether Googlebot is finding its way into the full text via the Open URL/virtual file paths generated by intraLibrary. Full text indexing within intraLibrary itself has also been promised.

In short, I’m really not sure how all of these factors may combine to be exploited by a next generation SRU interface!
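On the sitemap side at least, the part that sits outside intraLibrary is fairly easy to picture. The sketch below builds a sitemap in the standard sitemaps.org format from a list of object "home page" URLs. The URLs are invented for illustration, since the static home pages don't exist yet, and the function name is my own.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(entries):
    """Build sitemap XML from (url, last_modified) pairs; last_modified may be None."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for loc, lastmod in entries:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = loc
        if lastmod:
            ET.SubElement(url, "lastmod").text = lastmod  # W3C date, e.g. 2009-06-01
    return ET.tostring(urlset, encoding="unicode")


# Hypothetical static object pages.
sitemap_xml = build_sitemap([
    ("http://repository.example.ac.uk/object/1", "2009-06-01"),
    ("http://repository.example.ac.uk/object/2", None),
])
```

Pointing crawlers at one stable page per object, rather than leaving them to follow browse and search links, is what I'd hope would give the more structured results.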

We then touched upon self-archiving and (semi) mediated workflows; potentially developing SWORD based quick deposit from desktop/web, ideally with automatic metadata generation.

The two other major issues we considered are:

Policy metadata – handling embargoes

This is pretty crucial to an OA archive of research as many publishers of academic journals specify an embargo period of 12 or 18 months from the date of publication before a paper can be made available in a repository. We need to be able to add a paper to intraLibrary upon receipt but restrict access until the embargo has expired and for this to happen automatically. On one level, this functionality should be fairly straightforward to achieve by having intraLibrary check today’s date against an embargo date specified in the metadata; it’s a little more complicated than that though as we would want the metadata to be visible before the embargo date, just not the full text.
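The logic I have in mind can be sketched in a few lines. This is only an illustration of the date check, not anything intraLibrary does today: the record fields (`metadata`, `full_text`, `embargo_until`) and function names are assumptions of mine. The metadata is always returned, while the full text is withheld until the embargo date is reached.

```python
import calendar
from datetime import date


def add_months(start, months):
    """Return the date a number of months after start, clamping to month end."""
    month_index = start.month - 1 + months
    year = start.year + month_index // 12
    month = month_index % 12 + 1
    day = min(start.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def access_view(record, today=None):
    """Return what a visitor may see; metadata is always visible."""
    today = today or date.today()
    embargo_until = record.get("embargo_until")  # hypothetical metadata field
    under_embargo = embargo_until is not None and today < embargo_until
    return {
        "metadata": record["metadata"],
        "full_text": None if under_embargo else record.get("full_text"),
        "embargoed": under_embargo,
    }


# A paper published on 1 June 2009 with a 12 month publisher embargo.
paper = {
    "metadata": {"title": "An example paper"},
    "full_text": "paper.pdf",
    "embargo_until": add_months(date(2009, 6, 1), 12),
}
```

Because the check runs at the moment of access rather than as a scheduled job, nobody has to remember to release the paper when the embargo expires.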

Cover pages for PDF

It was suggested that a coversheet should be generated by intraLibrary on the fly which would certainly be useful as manually creating cover sheets for each and every article is time consuming to say the least; this would be useful functionality for CLA materials which also require a coversheet.

These developments will take some time to implement and the next stage is to prioritise them by anonymous e-postal ballot. Intrallect hope we will start to see some of the major initiatives in a build towards the end of the year.

Thank you to our colleagues at Keele for making us welcome and for feeding us!

Filed under Adapting intraLibrary, Open Access Tagged with API, coversheet, DSpace, embargo, Google indexing, Keele, PubMed Central, SRU, Symplectic, Web of Science
