Still baffled by Google…
April 3, 2012
Just reproducing an email to ukcorr-discuss here in case any technically minded folk not on the list might pass by these parts…
To revisit the whole Google Scholar / full-text indexing “thing”, I was just looking at results in GS for a particular academic who has raised a query about his full-text not being visible in Google Scholar. He has 6 full-text items in the repository, but a site: search of GS appears to return only 2.
Initially I thought it might be an artefact of when the full-text was added. The records were all added at the same time (24th May 2011), but at that point full-text was added for only one of the GS results (plus one that is not indexed at all; see below), whereas for all the others (including the other GS result) it was added in October 2011. Even so, that's still a good 6 months, which you would think would be long enough to be indexed. Wouldn't you?
Normal Google, by contrast, returns 4 full-text records.
The missing results are http://repository.leedsmet.ac.uk/main/view_record.php?identifier=4881&SearchGroup=Research (full-text added 24th May 2011) and http://repository.leedsmet.ac.uk/main/view_record.php?identifier=4893&SearchGroup=Research (full-text added 10th October 2011).
The only other difference I can spot is that several of those not indexed in GS don't have metadata embedded in the PDF (which is why normal Google has picked them up simply as “Leeds Metropolitan University Repository”, taken from the coversheet)…
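If missing PDF metadata really is the common factor, a quick way to test the hunch is to run every full-text file through a check for an embedded title. The sketch below is a crude, standard-library-only heuristic and an assumption of mine, not anything Google documents: it looks for an uncompressed `/Title (...)` entry in the PDF Info dictionary or a `dc:title` element in XMP metadata. PDFs that keep their metadata in compressed object streams, or that use hex-encoded titles, will be reported as having no title even if they do, so a proper PDF library would be more reliable.

```python
import re
import sys
from typing import Optional

# Heuristic (assumption): an uncompressed Info-dictionary title such as
# "/Title (My Paper)", allowing escaped parentheses inside the string.
INFO_TITLE = re.compile(rb"/Title\s*\((.*?)(?<!\\)\)", re.DOTALL)
# Heuristic (assumption): an XMP title such as
# <dc:title><rdf:Alt><rdf:li xml:lang="x-default">...</rdf:li></rdf:Alt></dc:title>
XMP_TITLE = re.compile(rb"<dc:title>.*?<rdf:li[^>]*>(.*?)</rdf:li>", re.DOTALL)


def embedded_title(pdf_bytes: bytes) -> Optional[str]:
    """Return the first non-empty embedded title found, or None.

    Only the first match of each pattern is checked; compressed or
    hex-encoded metadata is not decoded, so None means "none found here",
    not "definitely absent".
    """
    for pattern in (INFO_TITLE, XMP_TITLE):
        match = pattern.search(pdf_bytes)
        if match and match.group(1).strip():
            return match.group(1).decode("latin-1").strip()
    return None


if __name__ == "__main__":
    # Usage: python check_titles.py file1.pdf file2.pdf ...
    for path in sys.argv[1:]:
        with open(path, "rb") as handle:
            title = embedded_title(handle.read())
        print(f"{path}: {title if title else 'NO EMBEDDED TITLE'}")
```

Running it over the 6 full-text files and lining the output up against the GS results would show whether "no embedded title" and "not in GS" coincide, or whether that is just a coincidence.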
As a caveat, there is a technical peculiarity in our setup: we effectively have two servers, with our Open Search interface on an institutional server that queries intraLibrary via SRU, while the intraLibrary software itself is hosted for us in a server farm somewhere. This might explain some of the idiosyncratic behaviour…
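To illustrate why the two-server arrangement matters: the record pages that crawlers see are built by the institutional server from the results of an SRU request to intraLibrary, rather than being served by intraLibrary directly. The sketch below shows what such a request looks like using the standard SRU 1.2 `searchRetrieve` parameters and the SRU 1.2 response namespace. The base URL, the CQL query and the `dc` record schema are hypothetical placeholders; the real intraLibrary endpoint and the fields our interface actually queries are not given here.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Hypothetical endpoint: placeholder, not the real intraLibrary SRU address.
SRU_BASE = "https://intralibrary.example.org/sru"
# Namespace used by SRU 1.1/1.2 responses.
SRW_NS = "{http://www.loc.gov/zing/srw/}"


def sru_search_url(cql_query: str, start: int = 1, max_records: int = 10,
                   schema: str = "dc") -> str:
    """Build an SRU 1.2 searchRetrieve URL for a CQL query.

    The record schema name is an assumption; servers advertise the
    schemas they support in their explain response.
    """
    params = {
        "operation": "searchRetrieve",
        "version": "1.2",
        "query": cql_query,
        "startRecord": start,
        "maximumRecords": max_records,
        "recordSchema": schema,
    }
    return f"{SRU_BASE}?{urlencode(params)}"


def record_count(response_xml: str) -> int:
    """Read numberOfRecords from an SRU 1.2 searchRetrieveResponse."""
    root = ET.fromstring(response_xml)
    node = root.find(f"{SRW_NS}numberOfRecords")
    return int(node.text) if node is not None and node.text else 0


if __name__ == "__main__":
    # Hypothetical query: records whose creator matches a given surname.
    print(sru_search_url('dc.creator = "Smith"'))
```

The point of the sketch is that anything going wrong between the two servers (timeouts, truncated result sets, pages rendered without the full-text link) would show up only on the public-facing side, which is exactly what a crawler sees.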
Am I missing anything else?!