Bibliosight – querying Web of Science from the desktop
February 10, 2010
The Bibliosight project, as part of JISCRI, officially completed at the end of November 2009. However, for reasons beyond our control, the final deliverables were not available at that time: Thomson Reuters’ Web Services were not fully released until October 2009 and so could not be used within project timescales.
I am pleased to report that the project has now produced a desktop client that uses Thomson Reuters’ “Web Services Lite” (WSLite) to query Web of Science directly from the desktop. The code is available to download from http://code.google.com/p/bibliosight/ and you should see http://bibliosightnews.wordpress.com/2009/12/23/final-progress-post/ for more information. Note that this is code only, not a product distribution (which would require access to WSLite anyway), though it includes some very basic information on what you’d need to get it running.
As Bibliosight is now officially complete I am no longer contributing to Bibliosight News, but am posting here to explore practical uses of the client and the limitations of WSLite. I am prompted by my own experience: as a novice user of EndNote who has recently been exploring how to export from Web of Science into EndNote, I am not convinced that the client gives us anything beyond what could already be achieved with EndNote alone. This in no way denigrates the fantastic work that Mike has done developing the client, and I’m sure there are plenty of practical uses of WSLite in general and our Bibliosight client in particular. It is just that I am thinking very much of an integrated workflow for research management and populating the repository and, at Leeds Met, EndNote is firmly established in the research administration process. I would also gently question the limitations that Thomson Reuters has placed on WSLite (which is free) given that, as subscribed users of WoS, we are already able to retrieve more data from WoS by export than via this free API.
The primary use case that evolved through Bibliosight was as follows:
- Retrospectively download all Leeds Met records from WoS
- Run the query on a regular basis to retrieve new Leeds Met records in WoS
It was decided that the easiest way to achieve this was via a client that could query WoS from the desktop and return records as XML; this XML could then be converted by XSLT into an appropriate format for ingest into intraLibrary and/or other repository platforms and/or EndNote.
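To make the conversion step a little more concrete, here is a minimal sketch in Python. It is not the Bibliosight client's code and it does not use XSLT: it swaps in Python's standard xml.etree.ElementTree module to turn records into RIS, a tagged text format that EndNote can import. The XML layout and element names (records, record, ut, author, source_title and so on) are hypothetical, invented for illustration, and do not reflect the real WSLite response. Hard-coding the reference type as JOUR and putting UT in the AN tag are also assumptions.

```python
import xml.etree.ElementTree as ET

# Hypothetical record layout, invented for this sketch.
# The real WSLite response will be structured differently.
SAMPLE_XML = """
<records>
  <record>
    <ut>000000000000001</ut>
    <author>Smith, J</author>
    <author>Jones, A</author>
    <source_title>Journal of Examples</source_title>
    <volume>12</volume>
    <year>2009</year>
    <keyword>repositories</keyword>
  </record>
</records>
"""

# (hypothetical element name, RIS tag) pairs, applied in this order.
FIELD_MAP = [
    ("author", "AU"),
    ("source_title", "JO"),
    ("volume", "VL"),
    ("year", "PY"),
    ("keyword", "KW"),
    ("ut", "AN"),  # assumption: store UT as the accession number
]


def record_to_ris(record):
    """Convert one <record> element into a block of RIS lines."""
    # Assumption: every record is treated as a journal article.
    lines = ["TY  - JOUR"]
    for element_name, tag in FIELD_MAP:
        for node in record.findall(element_name):
            text = (node.text or "").strip()
            if text:
                lines.append(f"{tag}  - {text}")
    lines.append("ER  - ")
    return "\n".join(lines)


def xml_to_ris(xml_text):
    """Convert every <record> in the XML document to RIS text."""
    root = ET.fromstring(xml_text.strip())
    return "\n\n".join(record_to_ris(r) for r in root.findall("record"))


ris_output = xml_to_ris(SAMPLE_XML)
print(ris_output)
```

An XSLT stylesheet would express the same element-to-tag mapping declaratively; the Python version is used here only because it runs without any extra libraries.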
However, the data elements that are returned by WSLite are limited to:
- Authors — All authors, book authors, and corporate authors
- Source — Includes the source title, subtitle, book series and subtitle, volume, issue, special issue, pages, article number, supplement number, and publication date
- Keywords — all author supplied keywords
- UT — A unique article identifier provided by Thomson Reuters
In addition, a single query is limited to just 100 results; additional queries can be submitted in succession but this is inconvenient with the current application.
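The successive queries could in principle be automated. The sketch below, in Python, shows one way to split a large result set into windows of at most 100 records and collect them in turn. It is not how the Bibliosight client works. The fetch_page callable is a hypothetical stand-in for whatever actually sends the query to WSLite, and the assumption that records are addressed by a 1-based first-record number plus a count is mine.

```python
from typing import Callable, Iterator, List, Tuple

PAGE_LIMIT = 100  # the per-query cap on results described above


def page_windows(total: int, page_size: int = PAGE_LIMIT) -> Iterator[Tuple[int, int]]:
    """Yield (first_record, count) pairs that together cover `total` records.

    Assumption: records are numbered from 1.
    """
    first = 1
    while first <= total:
        count = min(page_size, total - first + 1)
        yield first, count
        first += count


def fetch_all(fetch_page: Callable[[int, int], List[dict]], total: int) -> List[dict]:
    """Call the (hypothetical) fetch_page once per window and merge the results."""
    records: List[dict] = []
    for first, count in page_windows(total):
        records.extend(fetch_page(first, count))
    return records


def fake_fetch_page(first: int, count: int) -> List[dict]:
    """Stand-in for a real query; returns dummy records for the window."""
    return [{"position": n} for n in range(first, first + count)]


windows = list(page_windows(250))
all_records = fetch_all(fake_fetch_page, 250)
print(windows)
print(len(all_records))
```

With 250 matching records this issues three queries rather than one, which is the kind of repetition that is awkward to do by hand in the current application.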
As a subscribed user, I am able to log in to Web of Science, perform a query and export directly to EndNote (a maximum of 500 records). This export includes most of the data available from WSLite, plus an abstract and <ref-type>, an EndNote-specific numerical value for which I don’t think we can get an equivalent from WSLite.
N.B. It is possible to submit a query on Source Publication (SO), but this matches the title of a specific publication so doesn’t help to identify <ref-type>.
The issue of abstracts is an interesting one and I recently posted a naive question to the UKCORR discussion list:
“The T&C for Web Services explicitly disallows including the abstract – which I can’t get anyway (!) – but are WoS abstracts not simply author-produced abstracts harvested from publishers’ websites in which case shouldn’t I be able to use them?”
I got a couple of helpful responses from Alison Sutton, Repository Manager at the University of Reading and Leslie Carr of Southampton:
Alison said that they explicitly asked Thomson Reuters if they could use their abstracts in their repository and were told they could not because publishers don’t give Thomson the right to distribute them, which is why they are not included in WSLite. Les, however (while at pains to emphasise that he is not in fact a lawyer!) suggested that there is no copyright on journal article abstracts in the UK and although Thomson cannot grant a license to use them, you do not actually need one.
I suspect we need a real legal eagle to establish whether or not there is any legal reason why we could not use an abstract procured from WoS which (in most cases) is exactly the same – right down to the minutiae of the ASCII code – as the abstract from the publisher’s website which I believe is actually supplied by the author in the first place.
The only data element that IS returned by WSLite but doesn’t appear to be returned by the EndNote export is the unique identifier UT, which we will need for AMR to return citation counts (though UT is returned when exporting in HTML, for example).
The long-term value of WSLite, via Bibliosight or some other implementation, would be in a more intuitive, integrated process for the end user. Though Bibliosight is perhaps not there yet, the project output will still provide value for the community and also, I think, as a case study for Thomson Reuters. While they certainly have their commercial imperatives, when we met with them back in September (as I blogged at the time) I was given the impression that the company has been practising something of a balancing act, weighing its commercial interests against providing appropriate value-added services to its subscribers under existing licensing agreements.