SMETE Search API
1. Overview
1.1 Search Requests
2. Search Request Format
2.1 Search Request Parameters
3. Search Results Format
3.1 Search Response Parameters
3.2 Search Response Error Codes
4. Comments
4.1 Comments and Remarks
1. Overview
The SMETE Search API is a set of Java classes and WSDL (Web Services Description Language) files describing the SMETE federated search service. The API is a SOAP (Simple Object Access Protocol) interface to searching the learning resources cataloged at the SMETE digital library. Eventually, we will make available here:
- A complete API reference describing the semantics of method calls and fields
- Sample SOAP request and response messages
- SMETE API WSDL file
- A Java servlet library, example program, and Javadoc documentation
- A simple SOAP::Lite-based Perl script
This version of the SMETE Search API is modeled after the Google Web API. The objective of this specification is to enable client builders to quickly create clients that can search both large-scale repositories such as Google and targeted educational collections such as the SMETE Open Federation digital library with minimal modification to the client code.
The SmeteSearch.wsdl file is available for review.
1.1 Search Requests
Search requests submit a "query", which contains search parameters and an embedded IEEE/IMS Learning Object Metadata "record" that describes the actual query. Search results are derived by finding the learning resources in the SMETE digital library repository that most closely match this record. The search can be either "boolean", in which case the search engine performs a boolean "AND" of each element in the query metadata against the target metadata, or "fuzzy", in which case it returns the resources whose metadata is most similar to the query metadata.
The details of the interactions involved with search requests are covered in the Search Request Format and Search Results Format sections of this document.
2. Search Request Format
2.1 Search Request Parameters
The search request "doSmeteSearch" is contained within a SOAP envelope. The SOAP body then contains metadata about the search and metadata about the requested search results. Much of the syntax is borrowed from the Open Archives Protocol for Metadata Harvesting (OAI PMH) to simplify the creation of tools.
Before going into the details of the fields, here is a sample SOAP envelope.
<?xml version='1.0' encoding='UTF-8'?>
<SOAP-ENV:Envelope
xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/"
xmlns:xsi="http://www.w3.org/1999/XMLSchema-instance"
xmlns:xsd="http://www.w3.org/1999/XMLSchema">
<SOAP-ENV:Body>
<ns1:doSmeteSearch xmlns:ns1="urn:SMETESearch"
SOAP-ENV:encodingStyle="http://schemas.xmlsoap.org/soap/encoding/">
<key>xxxxxxxxxxxxxxxxxxxx</key>
<q>
<lom
xmlns:ims_lom="http://www.imsglobal.org/xsd/imsmd_v1p2"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.imsglobal.org/xsd/imsmd_v1p2
http://www.imsglobal.org/xsd/imsmd_v1p2p2.xsd">
</lom>
</q>
<start>1</start>
<maxResults>10</maxResults>
<language>en-us</language>
</ns1:doSmeteSearch>
</SOAP-ENV:Body>
</SOAP-ENV:Envelope>
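As an illustration only, the sample envelope can be assembled programmatically. The following is a minimal Python sketch using the standard library; the function name build_search_request and its defaults are assumptions made for this example and are not part of the SMETE API. The default start value simply mirrors the sample envelope, and the prefix chosen for the urn:SMETESearch namespace is left to the serializer, since the prefix itself carries no meaning.

```python
import xml.etree.ElementTree as ET

SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"
SOAP_ENC = "http://schemas.xmlsoap.org/soap/encoding/"
SMETE_NS = "urn:SMETESearch"


def build_search_request(query, key="", start=1, max_results=10, language="en-us"):
    """Return a doSmeteSearch SOAP envelope as a string (hypothetical helper).

    `query` is either a query string or an ElementTree <lom> element.
    """
    ET.register_namespace("SOAP-ENV", SOAP_ENV)
    envelope = ET.Element(f"{{{SOAP_ENV}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_ENV}}}Body")
    call = ET.SubElement(body, f"{{{SMETE_NS}}}doSmeteSearch")
    call.set(f"{{{SOAP_ENV}}}encodingStyle", SOAP_ENC)

    # Child elements are unqualified, in the same order as the sample envelope.
    ET.SubElement(call, "key").text = key
    q = ET.SubElement(call, "q")
    if isinstance(query, ET.Element):
        q.append(query)   # embedded LOM record
    else:
        q.text = query    # plain query string
    ET.SubElement(call, "start").text = str(start)
    ET.SubElement(call, "maxResults").text = str(max_results)
    ET.SubElement(call, "language").text = language
    return ET.tostring(envelope, encoding="unicode")


if __name__ == "__main__":
    print(build_search_request('general.title:"disk drive"'))
```

The resulting string would then be sent as the body of an HTTP POST to the service endpoint, which this document does not specify.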
Inside the query tag <q>, the client may send one of two kinds of query.
First, the client can send a simple query string. We support the Lucene query syntax, in which the field name, based on the IEEE Learning Object Metadata, should be specified as category.dataelementleaf. For example, the query string general.title:"disk drive" AND educational.learningresourcetype:simulation searches for learning objects with the title words "disk drive" (case insensitive) and of learning resource type "simulation". In the sample envelope above, the LOM metadata has been removed for brevity; in its place would be a properly formed LOM metadata record.
Second, the client can send a properly formatted IEEE Learning Object Metadata record encoded in XML. In this case, the search client can request the type of match for a field by adding the attribute boolean="xxx" to any leaf data element (except <langstring>), where xxx is one of the supported boolean operators AND, OR, and NOT. If no boolean attribute is specified, a boolean "OR" is assumed when more than one field is specified.
*Currently LOM query and Lucene query are not supported.
The equivalent "LOM" search query to the prior example would be (non-pertinent fields not displayed):
<lom
xmlns:ims_lom="http://www.imsglobal.org/xsd/imsmd_v1p2"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.imsglobal.org/xsd/imsmd_v1p2
http://www.imsglobal.org/xsd/imsmd_v1p2p2.xsd">
<general>
<title boolean="AND">
<langstring>disk drive</langstring>
</title>
</general>
<educational>
<learningresourcetype boolean="AND">
<langstring>simulation</langstring>
</learningresourcetype>
</educational>
</lom>
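To make the correspondence between the two query forms concrete, here is a small Python sketch that derives both from the same set of category.dataelementleaf fields. The helper names lucene_query and lom_query are hypothetical, the namespace and schemaLocation attributes of <lom> are omitted for brevity, and, given the note above that these query forms are not currently supported, the sketch illustrates only the intended encoding.

```python
import xml.etree.ElementTree as ET


def lucene_query(fields, operator="AND"):
    """Join {"category.leaf": value} pairs into a Lucene-style query string."""
    terms = []
    for name, value in fields.items():
        # Quote multi-word values so they are treated as a phrase.
        text = f'"{value}"' if " " in value else value
        terms.append(f"{name}:{text}")
    return f" {operator} ".join(terms)


def lom_query(fields, operator="AND"):
    """Build the equivalent <lom> element with a boolean attribute per leaf."""
    lom = ET.Element("lom")
    for name, value in fields.items():
        category, leaf = name.split(".", 1)
        parent = lom.find(category)
        if parent is None:
            parent = ET.SubElement(lom, category)
        leaf_el = ET.SubElement(parent, leaf, {"boolean": operator})
        ET.SubElement(leaf_el, "langstring").text = value
    return lom


if __name__ == "__main__":
    fields = {
        "general.title": "disk drive",
        "educational.learningresourcetype": "simulation",
    }
    print(lucene_query(fields))
    print(ET.tostring(lom_query(fields), encoding="unicode"))
```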
| Element | Definition |
| key | This is a key that may be used in the future for digital rights management. The key is a string and is currently un-enforced. |
| q | This is the query itself. The query is either a string conforming to the Lucene query syntax or an IEEE LOM record in XML format. |
| start | The "start" field is the zero-based index of the first desired result. |
| maxResults | The "maxResults" field indicates the number of results to return per response. If the "nill" attribute of the tag is set to true, e.g., <messageSize nill="true"></messageSize>, the service will return all records in a single response. Currently the largest maxResults value allowed is 20; any larger value is adjusted down to 20. |
| language | The "language" field is the language of the records in the response. At this time, only English is supported as only English-language resources are currently available. In the future, it is intended to support responding with the metadata in other languages, such as Chinese (zh-pr). The language identification should follow the XML Language Identification specification. |
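Because maxResults is capped at 20, a client that wants more results has to issue several requests with increasing start values. The Python sketch below shows one way to compute them; the function names are hypothetical, and the index of the first result is a parameter because the table describes start as zero-based while the sample envelope uses 1.

```python
MAX_RESULTS_LIMIT = 20  # largest maxResults value accepted, per the table above


def clamp_max_results(requested):
    """Mirror the documented adjustment: values above 20 become 20."""
    if requested < 1:
        raise ValueError("maxResults must be a positive integer")
    return min(requested, MAX_RESULTS_LIMIT)


def page_starts(total, page_size, first_index=0):
    """Return the 'start' value of each request needed to cover `total` results."""
    size = clamp_max_results(page_size)
    return list(range(first_index, first_index + total, size))


if __name__ == "__main__":
    # 45 results, requested 50 per page: the page size is reduced to 20.
    print(page_starts(45, 50))
```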
3. Search Results Format
3.1 Search Response Parameters
The search response from "doSmeteSearch" is nearly syntactically identical to the OAI PMH "ListRecords" response type. The main difference is that there are no "responsedate" and "request" fields. Instead, the preamble contains the fields "results" and "errorCode" and the per-record <record> contains a measure of match to the query, from 1 to 100, as an XML attribute "score". An example of a search response is shown below, again with the actual LOM record removed for brevity.
<?xml version='1.0' encoding='UTF-8'?>
<ListRecords>
<error />
<startIndex>1</startIndex>
<endIndex>2</endIndex>
<totalResultsCount>2</totalResultsCount>
<record>
<score>75</score>
<metadata>
<lom
xmlns:ims_lom="http://www.imsglobal.org/xsd/imsmd_v1p2"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.imsglobal.org/xsd/imsmd_v1p2
http://www.imsglobal.org/xsd/imsmd_v1p2p2.xsd">
</lom>
</metadata>
</record>
<record>
<score>75</score>
<metadata>
<lom
xmlns:ims_lom="http://www.imsglobal.org/xsd/imsmd_v1p2"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.imsglobal.org/xsd/imsmd_v1p2
http://www.imsglobal.org/xsd/imsmd_v1p2p2.xsd">
</lom>
</metadata>
</record>
</ListRecords>
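A client might read such a response as follows. This is a minimal Python sketch under two assumptions: the <ListRecords> payload has already been extracted from the enclosing SOAP envelope, and the element names are those of the example above (score as a child element of <record>). The function name parse_search_response is hypothetical.

```python
import xml.etree.ElementTree as ET


def parse_search_response(xml_text):
    """Extract paging information and per-record scores from <ListRecords>."""
    root = ET.fromstring(xml_text)
    records = []
    for record in root.findall("record"):
        records.append({
            "score": int(record.findtext("score")),
            "lom": record.find("metadata/lom"),  # LOM record as an Element
        })
    return {
        "start_index": int(root.findtext("startIndex")),
        "end_index": int(root.findtext("endIndex")),
        "total": int(root.findtext("totalResultsCount")),
        "records": records,
    }


if __name__ == "__main__":
    sample = (
        "<ListRecords><error /><startIndex>1</startIndex><endIndex>1</endIndex>"
        "<totalResultsCount>1</totalResultsCount>"
        "<record><score>75</score><metadata><lom /></metadata></record>"
        "</ListRecords>"
    )
    result = parse_search_response(sample)
    print(result["total"], [r["score"] for r in result["records"]])
```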
3.2 Search Response Error Codes
If the search request produced an error, the "doSmeteSearch" service will respond with an error code in the header. These error codes are identical to the OAI PMH Version 2 error codes with the addition of the "notAllowed" error code. At this moment, this error code will not be produced; the intent is to provide an eventual means to implement a digital rights system which will limit searches.
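For reference, the set of error codes can be written down and checked as in the Python sketch below. The OAI PMH Version 2 codes are listed as defined by that protocol, with "notAllowed" added. How the code is carried in the response is an assumption here: the sketch supposes it appears as the text of the <error> element shown in the sample response, which this document does not state explicitly. The function names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Error codes defined by OAI PMH Version 2.
OAI_PMH_ERROR_CODES = {
    "badArgument",
    "badResumptionToken",
    "badVerb",
    "cannotDisseminateFormat",
    "idDoesNotExist",
    "noRecordsMatch",
    "noMetadataFormats",
    "noSetHierarchy",
}
# SMETE addition, reserved for a future digital rights system.
SMETE_ERROR_CODES = OAI_PMH_ERROR_CODES | {"notAllowed"}


def error_code(xml_text):
    """Return the error code in a <ListRecords> payload, or None if there is none.

    Assumption: the code is the text content of the <error> element.
    """
    root = ET.fromstring(xml_text)
    code = (root.findtext("error") or "").strip()
    return code or None


def is_known_error(code):
    """True if `code` is one of the documented error codes."""
    return code in SMETE_ERROR_CODES


if __name__ == "__main__":
    code = error_code("<ListRecords><error>noRecordsMatch</error></ListRecords>")
    print(code, is_known_error(code))
```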
4. Comments
4.1 Comments and Remarks
We developed this SOAP-based SMETE Search API after a trial period of a federated search using simple HTTP GET/POST name-value pairs. What we learned through this process was the following:
- Most digital libraries have already expended considerable resources to develop parsers for writing and reading certain metadata element sets, including Dublin Core and Learning Object Metadata. They would like to leverage that investment.
- There is still no "standard" for a query language that has been widely adopted by digital libraries. Essentially, only the query and query-type standards for Z39.50 have been widely adopted by libraries, but primarily only for bibliographic retrieval. The Simple Digital Library Interoperability Protocol (SDLIP) recommended the use of the IETF DAV Searching and Locating protocol for encoding search requests. However, the IETF closed the working group due to lack of progress.
- There is still no widely-adopted standard for federated search. Most are ad-hoc. There has been some research in specification methods for describing the federated search syntax of heterogeneous federated search services, e.g. Global Federated Searcher. Nonetheless, each search service may specify its own proprietary method.
- We investigated various XML-based query languages, including most notably XQuery which is based on XPath. However, we questioned whether digital libraries would devote the resources required to implement XQuery (including the associated human resources to "get up to speed" on its use), especially for a query language that is not yet adopted as a standard and is still a moving target.
For these reasons, we opted to adopt SOAP as the communication protocol, WSDL as the mechanism to describe the search service, and existing metadata element sets and syntax for encoding the search query and the search results. Together, these form the SMETE Search API. There are already numerous open source software tools for implementing SOAP and WSDL, and both appear to be on track to become official W3C standards. The popular search engine Google has proposed its own Google API for searching the Google repository using SOAP and WSDL. Thus, if digital libraries implement a SOAP client to search the SMETE collection, they will also be able to use the same code base to search the Google repository for other Web-based learning resources. Microsoft's .NET initiative also supports SOAP. We believe that this is a good strategy for implementing federated search across heterogeneous collections.