A better beta – Open Search Version 2.0
December 16, 2009
After mounting pressure from the University, we have now added live links to http://repository.leedsmet.ac.uk/main/index.php from Library Online and the research web-pages. It is important, however, that folk are aware that it is still very much a beta implementation. I have tried to summarise ongoing development requirements below (both for research material and Open Educational Resources) and I am very keen to receive user feedback to inform ongoing development; I also plan on doing some formal user evaluation in the New Year.
The two main issues we have faced throughout the development process are that intraLibrary is designed as a Learning Object repository and that it requires user authentication. This means we have needed to develop an appropriate metadata template for research material (mapping UK LOM onto Dublin Core) and develop an external, openly accessible search interface to query that metadata (via SRU) and display results appropriately. Though there has been consultation throughout development I am still conscious that, *potentially*, there are outstanding issues that need to be more widely considered, especially as we move to establish a workflow involving more staff to populate the repository; obviously any change to the metadata template will carry a greater overhead the more records there are.
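To make the SRU side of this a little more concrete, here is a minimal sketch of how an external interface might build a searchRetrieve request. The endpoint address, the SRU version and the `dc.title` index name are all assumptions for illustration; the real intraLibrary values are not given in this post.

```python
from urllib.parse import urlencode

# Hypothetical endpoint: the real intraLibrary SRU address is not stated here.
SRU_ENDPOINT = "http://repository.example.ac.uk/sru"


def build_sru_url(cql_query, start_record=1, maximum_records=10,
                  endpoint=SRU_ENDPOINT):
    """Return an SRU searchRetrieve URL for the given CQL query string."""
    params = {
        "operation": "searchRetrieve",
        "version": "1.1",  # assumed; the server may expect another version
        "query": cql_query,
        "startRecord": start_record,
        "maximumRecords": maximum_records,
    }
    # urlencode percent-encodes the CQL query so it is safe in a URL.
    return endpoint + "?" + urlencode(params)


if __name__ == "__main__":
    print(build_sru_url('dc.title = "open search"'))
```

The response would then be parsed and rendered by the Open Search interface; that part depends on the record schema the server returns and is left out of the sketch.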
For research material, the workflow is currently configured in two stages. The approach we are currently considering would require me or a designated administrator to complete the metadata in Table 1, and trained cataloguers to complete the cataloguing information in Table 2:
Table 1 (primary metadata entry)
| Metadata element | Comments |
| Faculty | 6 Leeds Met faculties (& Thesis/dissertation) |
| Title | Article title |
| LOM identifier | Unique identifier applied automatically by intraLibrary |
| Digital Object Identifier | (if available) Utilising additional (editable) instance of LOM identifier field |
| Bibliographic metadata | It may not be necessary to complete all of these fields, depending on the “Type of resource”: they will all be required for a peer reviewed journal article, but a book, for example, might only require “Source publication date”, while a book chapter might require “Source title”, “Source publication date”, “Start page” and “End page”. N.B. A related issue that needs to be considered at this stage is that, for records entered so far, I have been inputting the year only in the “Source publication date” field. However, we will need to differentiate records by census period; this could perhaps be achieved with a more specific date in the “Source publication date” field, or we could explore using an additional field to record this information. |
| ISSN/ISBN* | Description field 1/4 – can’t really be differentiated from other description fields for search by SRU |
| Abstract* | Description field 2/4 – can’t really be differentiated from other description fields for search by SRU |
| Published/Not published* | Description field 3/4 – can’t really be differentiated from other description fields for search by SRU |
| Refereed/not refereed* | Description field 4/4 – can’t really be differentiated from other description fields for search by SRU |
| Author | System bug meant this could not be independently queried by SRU until recently – issue should now be resolved but not yet integrated with interface |
| Publisher | System bug meant this could not be independently queried by SRU until recently – issue should now be resolved but not yet integrated with interface |
| Contribution date | Automatically completed |
| Technical format | Always PDF for research |
| Type of resource | Scholarly text/ Book/ Book chapter/ Edited book/ Book review/ Miscellaneous conference item/ Conference contribution/ Journal item/ Electronic journal/ Electronic newspaper article/ Report/ Confidential report/ Submitted journal article/ Thesis or dissertation/ Working or discussion paper/ Lecture transcript. N.B. It is relatively straightforward to add terms/modify existing terms though there may be work involved modifying records. For example, it has been suggested that the vocabulary term “Journal item” should be changed to “Journal article” and while this is relatively straightforward, there is no way of making a global change without reviewing each record individually – currently 218 items. |
| Statement of Copyright and Restriction | Where a record is citation only (i.e. no full text available) I am using this field to record information from SHERPA/RoMEO to indicate whether I should pursue the full text (for my reference only – i.e. not displayed in Open Search interface). When there is a full text available this field will generally record that it has been uploaded in line with the publishers’ terms and conditions and IS displayed in Open Search (e.g. http://repository.leedsmet.ac.uk/main/view_record.php?identifier=696&SearchGroup=research) |
*The only way I have been able to accommodate all of the information required for research (using EPrints software as a template) with the intraLibrary metadata schema (UK LOM) is by using multiple description fields as extensions – though these can be differentiated for display purposes, utilising a template to ensure consistent entry – e.g. http://repository.leedsmet.ac.uk/main/view_record.php?identifier=696&SearchGroup=research, they cannot easily be differentiated for search purposes.
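The point in Table 1 that the required bibliographic fields depend on the “Type of resource” could be checked at data entry with something like the sketch below. The full list of bibliographic fields is not reproduced in this post, so the sketch only uses the four field names mentioned in the text, and treating “Journal item” as the peer reviewed journal article case is an assumption.

```python
# Only the four field names mentioned in the text; the complete bibliographic
# field list is not shown in the post, so this set is an assumption.
NAMED_FIELDS = ["Source title", "Source publication date",
                "Start page", "End page"]

# Assumed mapping from "Type of resource" to the fields that must be filled.
REQUIRED_BY_TYPE = {
    "Journal item": NAMED_FIELDS,          # all bibliographic fields
    "Book": ["Source publication date"],
    "Book chapter": NAMED_FIELDS,
}


def missing_fields(resource_type, record):
    """Return the required fields that are absent or empty in the record."""
    required = REQUIRED_BY_TYPE.get(resource_type, [])
    return [name for name in required if not record.get(name)]


if __name__ == "__main__":
    chapter = {"Source title": "An edited book",
               "Source publication date": "2009"}
    print(missing_fields("Book chapter", chapter))
```

Types that are not listed in the mapping are treated as having no required bibliographic fields, which is a design choice of the sketch rather than an agreed rule.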
Table 2 (cataloguing information to be completed by trained cataloguers)
| Metadata element | Comments |
| Keyword | Controlled vocabulary utilising Library of Congress Authorities |
| Classification | Against (top two levels) of LCC |
How metadata is related to the functionality of the Open Search interface
The current functionality of http://repository.leedsmet.ac.uk/main/index.php is as follows:
| Function/field | Comments |
| Standard search | Comprises a simple search box that will search the entire metadata record – can manage simple Boolean operators (AND/OR) and perform a search to “Match exact text” (Boolean functionality needs reviewing) |
| Function/field | Comments |
| Advanced search (under development) | Comprises several metadata fields that can be cross referenced using simple Boolean logic (AND/OR – OR by default) |
| - Standard search | Should be self explanatory though early feedback from users suggests that it may not be! |
| - Title | Searches the “Title” metadata field only. |
| - ISBN/ISSN | In theory searches ISBN/ISSN – in practice searches all four description fields (see metadata Table 1 above) |
| - Author | In theory searches only the “Author” metadata (not yet implemented due to system bug – currently performing a standard search) |
| - Subject | Searches the “Subject” metadata field only. NB. Only one field – need to search multiple “Subject” fields. |
| - Type | Drop down list of vocab terms “Type of resource” (see metadata Table 1 above) |
| - Description/Abstract | In practice searches all four description fields (see metadata Table 1 above) |
| - DOI | Searches the “DOI” metadata field only |
| - Publisher | In theory searches only the “Publisher” metadata (not yet implemented due to system bug – currently performing a standard search) |
| - Format | Drop down list of all MIME types – in practice this will always be PDF for research – may be more appropriate for OER (see below) |
| Function | Comments |
| Browse | Currently two options to browse repository contents (research only). Currently no way of knowing how many resources are available at a given level – requires further technical development to display “number of resources” |
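As a rough illustration of how the advanced search form could cross reference fields with simple Boolean logic (OR by default), the sketch below turns form values into a single CQL query string. The form field names and the `dc.*` index names are hypothetical; the indexes actually exposed by intraLibrary over SRU are not listed in this post.

```python
# Hypothetical mapping from form field to CQL index name.
FORM_TO_INDEX = {
    "title": "dc.title",
    "author": "dc.creator",
    "publisher": "dc.publisher",
    "subject": "dc.subject",
}


def build_cql(form_values, operator="OR"):
    """Combine the non-empty form fields into one CQL query string."""
    if operator not in ("AND", "OR"):
        raise ValueError("operator must be 'AND' or 'OR'")
    clauses = []
    for field, value in form_values.items():
        value = value.strip()
        if not value:
            continue  # skip fields the user left blank
        index = FORM_TO_INDEX[field]  # raises KeyError for unknown fields
        # Escape backslashes and double quotes inside the quoted term.
        escaped = value.replace("\\", "\\\\").replace('"', '\\"')
        clauses.append(f'{index} = "{escaped}"')
    return f" {operator} ".join(clauses)


if __name__ == "__main__":
    form = {"title": "open search", "author": "", "publisher": "Example Press"}
    print(build_cql(form))
    print(build_cql(form, operator="AND"))
```

Because every clause names its own index, a query built this way would only search the fields the user actually filled in, rather than falling back to a standard search across the whole record.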
Open Educational Resources
I have so far been considering just research material but this, of course, is just one of the two main “types” of content that we need to manage with the Leeds Met repository.
Though infrastructure does not come within the remit of the UniCycle project, which is more focussed on the process around reuse of OER, an adequate search interface is something of a prerequisite. We have begun to integrate search functionality specific to OER into http://repository.leedsmet.ac.uk/main/index.php and are now able, for example, to differentiate between research and Open Educational Resources (using collection tokens). However, the metadata template for OER is quite different from that for research, so ideally we will need a separate search form to reflect this. Metadata for OER is currently based on the “old” Jorum template (i.e. not JorumOpen) and comprises the fields in Table 1a below:
Table 1a
| Metadata element | Comments |
| Title | OER title |
| Description | Single instance only |
| Keyword | Uncontrolled/author produced keywords in line with ukoer guidelines (must include ukoer as keyword) |
| Contributor (Author) | System bug meant this could not be independently queried by SRU until recently – issue should now be resolved but not yet integrated with interface. |
| Contribution date | Automatically completed |
| Technical format | MIME media type – Comprising 70 technical formats – almost certainly don’t need them all for OER – are currently all exposed as list on advanced search form |
| Type of resource* | Terms from LOMv1.0: Diagram/ Exam/ Exercise/ Experiment/ Figure/ Graph/ Index/ Narrative Text/ Problem Statement/ Questionnaire/ Self Assessment/ Simulation/ Slide/ Table; Terms we have added to the vocabulary: Podcast/ Not Applicable/ Presentation/ Photograph/ Quiz/ Spreadsheet/ Tutorial/ Video/ Lecture/ Game/ Animation/ Assessment/ Audio/ Case Study/ Database/ Workbook. N.B. As with research vocabulary, it is relatively straightforward to add new terms/modify existing terms though there may be work involved modifying records |
| Statement of Copyright and Restriction | As we are using Creative Commons licensing for OER we can probably dispense with this field – I will need to contact Intrallect to remove it from the template |
| Categorisation | OER are currently categorised against JACS (Joint Academic Coding System) |
*The advanced search form currently only lists the research vocabulary in “Type” – this is an example of why we may need a separate form for advanced search of OER that displays the relevant vocabulary.
Display of OER
We also need to think about how OER records are displayed – currently this is a bit rough and ready, adapted from research results. For example, full text/external resource may not be as relevant for OER and we may wish to display “Type of resource” here instead.
Browse for OER?
In order to facilitate browse by subject heading (JACS) in the Open Search interface it will be necessary to integrate this functionality into http://repository.leedsmet.ac.uk/main/index.php in a similar way as has been implemented for research (same caveat applies in that further technical development is required to show numbers of resources in the browse tree.)
I have a few thoughts/notes:
- Bibliographic ‘Source publication date’: Currently the interface attempts to pick out appropriate date information from this field when using this data (e.g. in the automatically generated references). A year, generally being a 4 digit number, is simple enough to pick out in most cases (probably all cases so far). Months/seasons can be picked out in text but not numerically, as they can clash with days. The more specific the data in this field is, the better it is for us really, as long as there is reasonable consistency in the way in which the data is entered.
- Publisher searching: This is actually working perfectly well as it was never switched to perform just a general search as was the case with the Author search.
- Technical format for research: Whether it will always be PDF for research I wouldn’t like to say 100%, but I suspect that the other formats that may occasionally be used would not need to be searched for explicitly (e.g. PostScript .ps format). They could be converted to PDF before upload in any case.
- Journal item/article terminology: Can this change be made in the back-end database on a one off basis by Intrallect?
- DOI search: Actually searches any LOM->identifier entries (this includes the unique identifier generated by intraLibrary for each record) and also searches the lom:technical->lom:location data (we tend to use this for the displayed ‘published/external URL’). Generally speaking this won’t actually cause a problem.
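For the ‘Source publication date’ note above, the kind of year and month picking described could look roughly like the following. This is only a sketch and not the interface’s actual code: the year range is an assumption, seasons are not handled, and numeric months are deliberately ignored because they can clash with days.

```python
import re

# Assumption: publication years fall between 1500 and 2099.
YEAR_RE = re.compile(r"\b(1[5-9]\d{2}|20\d{2})\b")
MONTHS = ["january", "february", "march", "april", "may", "june",
          "july", "august", "september", "october", "november", "december"]


def find_month(text):
    """Return 1-12 for the first month name or 3+ letter abbreviation."""
    for word in re.findall(r"[A-Za-z]+", text):
        key = word.lower()
        if len(key) < 3:
            continue
        for number, name in enumerate(MONTHS, start=1):
            if name.startswith(key):
                return number
    return None


def parse_source_date(text):
    """Pick (year, month) out of free text; either value may be None."""
    match = YEAR_RE.search(text)
    year = int(match.group(1)) if match else None
    return year, find_month(text)


if __name__ == "__main__":
    for sample in ["2009", "December 2009", "16 Dec 2009", "03/04/2009"]:
        print(sample, "->", parse_source_date(sample))
```

A purely numeric date such as 03/04/2009 yields a year but no month here, which is the ambiguity the note describes.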
I think that’s all of it.
MT
Thanks Mike
I stand corrected on Publisher searching – as this is another instance of the contributor field set I assumed the same issue would arise as searching for Author. I’m not absolutely clear why this isn’t the case – is it just because it’s a less sophisticated requirement (i.e. just one consistent string for Publisher whereas there may be multiple authors, often with common first names/surnames)?
The difference is how the fields in LOM get mapped onto fields in Dublin Core (which is what the searches are really querying). Publishers are considered separate from creators (authors) and are under different fields in DC – dc:publisher and dc:creator respectively.
I never altered the interface to use the general query instead of querying dc:publisher; it fixed itself when intraLibrary was updated.
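A small sketch of the mapping described here, assuming each LOM contribute entry can be reduced to a (role, name) pair. The fallback to dc:contributor for other roles is my assumption and not something confirmed for intraLibrary.

```python
# Author and publisher roles map to separate Dublin Core elements, as
# described above; the fallback element for other roles is an assumption.
ROLE_TO_DC = {"author": "dc:creator", "publisher": "dc:publisher"}


def lom_contributors_to_dc(contributions):
    """Group (role, name) pairs from LOM under Dublin Core elements."""
    dc = {}
    for role, name in contributions:
        element = ROLE_TO_DC.get(role.lower(), "dc:contributor")
        dc.setdefault(element, []).append(name)
    return dc


if __name__ == "__main__":
    record = [("author", "Smith, J."), ("author", "Jones, A."),
              ("publisher", "Example Press")]
    print(lom_contributors_to_dc(record))
```

Since the two roles end up under different elements, a query against dc:publisher never touches author names, which would be consistent with Publisher search behaving differently from Author search.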
Technical format for research: I do anticipate using PDF only and converting any other formats we might receive – fairly typical for OA archives of research I think though some also support Word. This shouldn’t be a problem while we are fully mediated though I guess there may be issues if we ever go to author self-archiving. Assuming research is PDF only then we clearly don’t need 70 MIME types supported in an advanced search for research – or even a fraction of them. Instead, could we perhaps just have a checkbox for “Full text”?
Just to add also that it may be worth looking at the advanced search functionality offered by JorumOpen to inform our own advanced search. JorumOpen will go live in January (19th I think) but you can see a video “Searching & Browsing JorumOpen” at http://community.jorum.ac.uk/course/view.php?id=40