Still baffled by Google…

Just reproducing an email to ukcorr-discuss here in case any technically minded folk not on the list might pass by these parts…

To revisit the whole Google Scholar / full-text indexing “thing”: I was just looking at results in GS for a particular academic who has raised a query about his full text not being visible in Google Scholar. He has 6 full-text items in the repository, but a site: search of GS appears to return only 2:

http://scholar.google.co.uk/scholar?hl=en&q=site%3Ahttp%3A%2F%2Frepository-intralibrary.leedsmet.ac.uk+%22x.+font%22&btnG=Search&as_sdt=0%2C5&as_ylo=&as_vis=0

Initially I thought it might be an artefact of when the full text was added. The records were all added at the same time (24th May 2011), but full text was attached at that point to only one of the two GS results (plus one item that is not indexed at all – see below); all the others, including the other GS result, had full text added in October 2011…and that’s still a good 6 months, which you would think would be long enough to be indexed. Wouldn’t you?

Normal Google, by contrast, returns 4 full-text records:

https://www.google.co.uk/search?hl=en&as_q=&as_epq=xavier+font&as_oq=&as_eq=&as_nlo=&as_nhi=&lr=&cr=&as_qdr=all&as_sitesearch=http%3A%2F%2Frepository-intralibrary.leedsmet.ac.uk%2F&as_occt=any&safe=images&tbs=&as_filetype=pdf&as_rights=
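For anyone wanting to repeat the comparison for a different author, here is a minimal Python sketch that rebuilds the two searches above. It only uses parameters that appear in those URLs (`hl`, `q`, `as_epq`, `as_sitesearch`, `as_filetype`) and drops the empty ones, on the assumption that they don't affect the results; the function names are my own invention for illustration.

```python
from urllib.parse import urlencode

# Host name taken from the searches above.
REPO_HOST = "repository-intralibrary.leedsmet.ac.uk"


def scholar_site_search(host, phrase):
    """Google Scholar search limited to one host with the site: operator."""
    query = f'site:http://{host} "{phrase}"'
    return "http://scholar.google.co.uk/scholar?" + urlencode({"hl": "en", "q": query})


def google_pdf_search(host, phrase):
    """Normal Google advanced search: exact phrase, one site, PDFs only."""
    params = {
        "hl": "en",
        "as_epq": phrase,                     # exact phrase
        "as_sitesearch": f"http://{host}/",   # restrict to the repository
        "as_filetype": "pdf",                 # full text only
    }
    return "https://www.google.co.uk/search?" + urlencode(params)


scholar_url = scholar_site_search(REPO_HOST, "x. font")
google_url = google_pdf_search(REPO_HOST, "xavier font")
print(scholar_url)
print(google_url)
```

Pasting the two printed URLs into a browser should give the same kind of side-by-side comparison, though of course the result counts will change as the indexes do.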

The missing results are http://repository.leedsmet.ac.uk/main/view_record.php?identifier=4881&SearchGroup=Research (full-text added 24th May 2011) / http://repository.leedsmet.ac.uk/main/view_record.php?identifier=4893&SearchGroup=Research (full-text added 10th October 2011).

The only other difference I can spot is that several of those not indexed in GS don’t have metadata in the PDF (which is why they have just been picked up in normal Google as “Leeds Metropolitan University Repository” from the coversheet)…

As a caveat, there is a technical peculiarity in that we effectively have a two-server setup: our Open Search interface sits on an institutional server and queries intraLibrary by SRU, while the software itself is hosted for us in a server farm somewhere, which might explain idiosyncratic behaviour to some extent…
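To make the two-server arrangement a little more concrete, this is roughly what an SRU request from the search interface to intraLibrary might look like. It is a sketch only: the endpoint address below is a made-up placeholder (the real one isn't given here), the SRU version is an assumption, and the function name is hypothetical. The parameter names themselves (`operation`, `version`, `query`, `startRecord`, `maximumRecords`) are the standard ones for an SRU searchRetrieve request.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Hypothetical endpoint: a placeholder, not the real intraLibrary address.
SRU_BASE = "http://intralibrary.example.org/sru"


def sru_search_url(cql_query, start=1, maximum=10, base=SRU_BASE):
    """Build an SRU searchRetrieve URL for a CQL query."""
    params = {
        "operation": "searchRetrieve",
        "version": "1.1",           # assumption: the server may speak another version
        "query": cql_query,         # CQL, e.g. a quoted phrase
        "startRecord": start,       # SRU record positions start at 1
        "maximumRecords": maximum,
    }
    return base + "?" + urlencode(params)


url = sru_search_url('"xavier font"')
print(url)

# Round-trip check: decode the query string to confirm what was sent.
sent = parse_qs(urlsplit(url).query)
print(sent["query"][0])
```

The point is that a crawler only ever sees the pages served by the institutional server; the SRU traffic behind it is invisible to Google, so any oddity in how those pages link to the hosted full text could plausibly affect indexing.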

Am I missing anything else?!

Resource discovery at Leeds Met Library

Just a quick plug for a new Leeds Met blog investigating Resource Discovery and Federated Searching systems for Leeds Met Library:

http://leedsmetlibrary.wordpress.com/

It is, er, blogged by my colleague @DebbieMN but will also include contributions from other library staff; recently, for example, our graduate trainee posted about his first impressions of Serials Solutions’ Summon, which is billed as a “web-scale discovery service” that “allows the researcher to quickly search, discover and access reliable and credible library content.” (http://www.serialssolutions.com/summon/)

I also attended the Summon demo last week and was pretty impressed by the Google-style simplicity of the search interface, which I suspect will be very popular with students. Some of my librarian colleagues did express reservations about the potential impact on information literacy and were keen to see the advanced search functionality; it is still important to teach more sophisticated information retrieval skills even if students are likely just to head to the simple search box of Google (or Summon)!

One aspect I was particularly interested in was the apparent ease with which Summon can be configured to search an institutional repository – functionality that the University of Huddersfield, who are now running Summon, have already implemented to search their EPrints repository. Huddersfield’s @daveyp tweeted this example using the name of their repository manager @graham_stone: http://hud.summon.serialssolutions.com/search?s.q=graham+stone

Getting there, slowly but surely

The Repository is really starting to take shape; the search interface has now been installed on a development server (as discussed previously, we are using the IRISS SRU client) and is returning very satisfying results on my test content. Now we can start adding the extra functionality (browse, advanced search) – well Mike T can at any rate, and my more technically inclined colleagues – and then to customise the look and feel, though Mike has already added an enormous Leeds Met Rose!

Ongoing development of the interface will also feed into PERSoNA. In a meeting today with John and Mike, Wendy and I discussed one initial approach: embedding the search box and additional search functionality from the interface into a Google app (feeding into Leeds Met’s developing partnership with Google) or some kind of generic plug-in or widget. I’ll try to expand on this at some point on PERSoNA News and ask for some pertinent blog input from John and Mike.

And I’ve uploaded my first research paper! A colleague in the library has a paper published in Reference Services Review, which is published by Emerald and is RoMEO green: Do Academic Enquiry Services Scare Students? (This links to the Emerald full text, not the author’s version in The Repository.)

At the moment I am very much focussed on the Staff Development Festival in September and have also been uploading citation information for demonstration purposes – I hope to use the Festival to encourage folk to supply full text copies of their research papers which can then be uploaded in line with publishers’ copyright transfer agreements and we can finally start building that representative body of content. I’ve set up a basic taxonomy within intraLibrary based on Leeds Met faculties and intend to upload 5-10 citations per faculty which I’m linking through to publishers’ abstract pages where possible. This should give us the opportunity to review metadata and get a preliminary idea of the workflow as well as illustrating to people why they might want to release copies of their work from behind subscription barriers (look, there can be links to your work all over the web but you can’t get any further than the abstract without a subscription fee.) The final choice of taxonomy should also be informed by demonstrations to academic staff – we already know that the steering group does not want to base it on faculties as the major organisational structure.

Mike has said that he can do some very preliminary customisation of the search interface before the festival to illustrate how the external browse functionality might work – this will be based on the taxonomies as they currently appear within intraLibrary and, given the short amount of time, will be for demonstration purposes only and probably won’t return dynamic results but should give people the opportunity to visualise the interface and comment on its development.