<chapter id="tutorial">
- <!-- $Id: tutorial.xml,v 1.5 2008-02-07 12:38:39 marc Exp $ -->
<title>Tutorial</title>
</para>
<para>
Additional OAI test records can be downloaded by running a shell
- script (you may want to abort the script when you have waitet
- longer than your coffe brews ..).
+ script (you may want to abort the script if it runs longer than
+ your coffee takes to brew).
<screen>
cd data
./fetch_OAI_data.sh
<para>
Searching and retrieving &acro.xml; records is easy. For example,
- you can point your browser to one of the following url's to
+ you can search for the term <literal>the</literal> by pointing your
browser at this link:
<ulink
<warning>
<para>
- These URL's woun't work unless you have indexed the example data
+ These URLs won't work unless you have indexed the example data
and started an &zebra; server as outlined in the previous section.
</para>
</warning>
<para>
In case we actually want to retrieve one record, we need to alter
- our URl to the following
+ our URL to the following
<ulink url="http://localhost:9999/?version=1.1&operation=searchRetrieve&x-pquery=the&startRecord=1&maximumRecords=1&recordSchema=dc">
http://localhost:9999/?version=1.1&operation=searchRetrieve&x-pquery=the&startRecord=1&maximumRecords=1&recordSchema=dc
</ulink>
<literal>conf/oai2dc.xsl</literal>, and
the <literal>zebra</literal> schema implemented in
<literal>conf/oai2zebra.xsl</literal>.
- The URL's for acessing both are the same, except for the different
+ The URLs for accessing both are the same, except for the different
value of the <literal>recordSchema</literal> parameter:
<ulink url="http://localhost:9999/?version=1.1&operation=searchRetrieve&x-pquery=the&startRecord=1&maximumRecords=1&recordSchema=dc">
http://localhost:9999/?version=1.1&operation=searchRetrieve&x-pquery=the&startRecord=1&maximumRecords=1&recordSchema=dc
The &acro.oai; indexing example defines many different index
names, a study of the <literal>conf/oai2index.xsl</literal>
stylesheet reveals the following word type indexes (i.e. those
- swith suffix <literal>:w</literal>):
+ with suffix <literal>:w</literal>):
<screen>
any:w
- dc_title:w
- dc_creator:w
- dc_subject:w
- dc_description:w
- dc_contributor:w
- dc_publisher:w
- dc_language:w
- dc_rights:w
+ title:w
+ author:w
+ subject:w
+ description:w
+ contributor:w
+ publisher:w
+ language:w
+ rights:w
</screen>
- By default, searches do access the <literal>anr:w</literal> index,
+ By default, searches access the <literal>any:w</literal> index,
but we can direct searches to any access point by constructing the
correct &acro.pqf; query. For example, to search in titles only,
we use
<ulink
url="http://localhost:9999/?version=1.1&operation=searchRetrieve&x-pquery=@attr
- 1=dc_title the&startRecord=1&maximumRecords=1&recordSchema=dc">
+ 1=title the&startRecord=1&maximumRecords=1&recordSchema=dc">
http://localhost:9999/?version=1.1&operation=searchRetrieve&x-pquery=@attr
- 1=dc_title the&startRecord=1&maximumRecords=1&recordSchema=dc
+ 1=title the&startRecord=1&maximumRecords=1&recordSchema=dc
</ulink>
</para>
Similarly, we can direct searches to the other defined indexes, or we
can create boolean combinations of searches on different
indexes. In this case we search for <literal>the</literal> in
- <literal>dc_title</literal> and for <literal>fish</literal> in
- <literal>dc_description</literal> using the query
- <literal>@and @attr 1=dc_title the @attr 1=dc_description fish</literal>.
+ <literal>title</literal> and for <literal>fish</literal> in
+ <literal>description</literal> using the query
+ <literal>@and @attr 1=title the @attr 1=description fish</literal>.
<ulink
url="http://localhost:9999/?version=1.1&operation=searchRetrieve&x-pquery=@and
- @attr 1=dc_title the
- @attr 1=dc_description
+ @attr 1=title the
+ @attr 1=description
fish&startRecord=1&maximumRecords=1&recordSchema=dc">
http://localhost:9999/?version=1.1&operation=searchRetrieve&x-pquery=@and
- @attr 1=dc_title the
- @attr 1=dc_description fish&startRecord=1&maximumRecords=1&recordSchema=dc
+ @attr 1=title the
+ @attr 1=description fish&startRecord=1&maximumRecords=1&recordSchema=dc
</ulink>
</para>
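<para>
The query URLs above can also be assembled programmatically, which
takes care of percent-encoding the spaces and <literal>@</literal>
characters of a &acro.pqf; query. The following is a minimal Python
sketch and not part of the &zebra; distribution: the helper name
<literal>sru_search_url</literal> is hypothetical, and the parameter
values are simply those used in the example links.
</para>

```python
from urllib.parse import urlencode, quote

# Address of the example server used in the links above.
BASE_URL = "http://localhost:9999/"


def sru_search_url(pquery, schema="dc", start=1, maximum=1):
    """Build a searchRetrieve URL (hypothetical helper, for illustration)."""
    params = {
        "version": "1.1",
        "operation": "searchRetrieve",
        "x-pquery": pquery,
        "startRecord": start,
        "maximumRecords": maximum,
        "recordSchema": schema,
    }
    # quote_via=quote encodes spaces as %20 rather than "+".
    return BASE_URL + "?" + urlencode(params, quote_via=quote)


if __name__ == "__main__":
    # Boolean combination of a title search and a description search.
    print(sru_search_url("@and @attr 1=title the @attr 1=description fish"))
```

<para>
Passing a different <literal>schema</literal> argument, for example
<literal>zebra</literal>, changes only the
<literal>recordSchema</literal> parameter of the resulting URL.
</para>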
</sect1>
- <sect1 id="tutorial-oai-sru-zebra-indexess">
+ <sect1 id="tutorial-oai-sru-zebra-indexes">
<title>Investigating the content of the indexes</title>
<para>
- How doess the magic work? What is inside the indexes? Why is a certain
- record foound by a search, and another not?. The answer is in the
- inverterd indexes. You can easily investigate them using the
+ How does the magic work? What is inside the indexes? Why is a certain
+ record found by a search, and another not? The answer is in the
+ inverted indexes. You can easily investigate them using the
special &zebra; schema
<literal>zebra::index::fieldname</literal>. In this example you
- can see that the <literal>dc_title</literal> index has both word
+ can see that the <literal>title</literal> index has both word
(type <literal>:w</literal>) and phrase (type
<literal>:p</literal>)
indexed fields,
- <ulink url="http://localhost:9999/?version=1.1&operation=searchRetrieve&x-pquery=the&startRecord=1&maximumRecords=1&recordSchema=zebra::index::dc_title">
- http://localhost:9999/?version=1.1&operation=searchRetrieve&x-pquery=the&startRecord=1&maximumRecords=1&recordSchema=zebra::index::dc_title
+ <ulink url="http://localhost:9999/?version=1.1&operation=searchRetrieve&x-pquery=the&startRecord=1&maximumRecords=1&recordSchema=zebra::index::title">
+ http://localhost:9999/?version=1.1&operation=searchRetrieve&x-pquery=the&startRecord=1&maximumRecords=1&recordSchema=zebra::index::title
</ulink>
</para>
<para>
But where in the indexes did the term match for the query occur?
Easily answered with the special &zebra; schema
- <literal>zebra::snippet</literal>. The matching terma are
+ <literal>zebra::snippet</literal>. The matching terms are
encapsulated by <literal>&lt;s&gt;</literal> tags.
<ulink url="http://localhost:9999/?version=1.1&operation=searchRetrieve&x-pquery=the&startRecord=1&maximumRecords=1&recordSchema=zebra::snippet">
http://localhost:9999/?version=1.1&operation=searchRetrieve&x-pquery=the&startRecord=1&maximumRecords=1&recordSchema=zebra::snippet
found inside my hit set? Try the special &zebra; schema
<literal>zebra::facet::fieldname:type</literal>. In this case, we
investigate additional search terms for the
- <literal>dc_title:w</literal> index.
- <ulink url="http://localhost:9999/?version=1.1&operation=searchRetrieve&x-pquery=the&startRecord=1&maximumRecords=1&recordSchema=zebra::facet::dc_title:w">
- http://localhost:9999/?version=1.1&operation=searchRetrieve&x-pquery=the&startRecord=1&maximumRecords=1&recordSchema=zebra::facet::dc_title:w
+ <literal>title:w</literal> index.
+ <ulink url="http://localhost:9999/?version=1.1&operation=searchRetrieve&x-pquery=the&startRecord=1&maximumRecords=1&recordSchema=zebra::facet::title:w">
+ http://localhost:9999/?version=1.1&operation=searchRetrieve&x-pquery=the&startRecord=1&maximumRecords=1&recordSchema=zebra::facet::title:w
</ulink>
</para>
One can ask for multiple facets. Here, we want them from phrase
indexes of type
<literal>:p</literal>.
- <ulink url="http://localhost:9999/?version=1.1&operation=searchRetrieve&x-pquery=the&startRecord=1&maximumRecords=1&recordSchema=zebra::facet::dc_publisher:p,dc_title:p">
- http://localhost:9999/?version=1.1&operation=searchRetrieve&x-pquery=the&startRecord=1&maximumRecords=1&recordSchema=zebra::facet::dc_publisher:p,dc_title:p
+ <ulink url="http://localhost:9999/?version=1.1&operation=searchRetrieve&x-pquery=the&startRecord=1&maximumRecords=1&recordSchema=zebra::facet::publisher:p,title:p">
+ http://localhost:9999/?version=1.1&operation=searchRetrieve&x-pquery=the&startRecord=1&maximumRecords=1&recordSchema=zebra::facet::publisher:p,title:p
</ulink>
</para>
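<para>
The facet schema names used above follow a regular pattern: the
prefix <literal>zebra::facet::</literal> followed by a comma-separated
list of <literal>fieldname:type</literal> pairs. As a small Python
sketch (the function name <literal>facet_schema</literal> is
hypothetical and only illustrates the naming pattern), such a name
can be composed like this:
</para>

```python
def facet_schema(*fields):
    """Return a zebra::facet:: schema name for (index, type) pairs."""
    return "zebra::facet::" + ",".join(
        "{}:{}".format(name, index_type) for name, index_type in fields
    )


if __name__ == "__main__":
    # Phrase-type facets on the publisher and title indexes.
    print(facet_schema(("publisher", "p"), ("title", "p")))
```

<para>
The resulting string is what goes into the
<literal>recordSchema</literal> parameter of the request.
</para>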
<para>
The &acro.sru; specification mandates that the &acro.cql; query
language is supported and properly configured. Also, the server
- needs to be able to emmit a proper &acro.explain; &acro.xml;
+ needs to be able to emit a proper &acro.explain; &acro.xml;
record, which is used to determine the capabilities of the
specific server instance.
</para>
<para>
- In this example configuration we expoit the similarities between
+ In this example configuration we exploit the similarities between
the &acro.explain; record and the &acro.cql; query language
configuration: we generate the latter from the former using an
&acro.xslt; transformation.
</para>
<para>
- The we are all set to start the &acro.sru;/acro.z3950; server including
+ We are all set to start the &acro.sru;/&acro.z3950; server including
&acro.pqf; and &acro.cql; query configuration. It uses the &yaz; frontend
server configuration - just type
<screen>
url="http://localhost:9999/?version=1.1&operation=scan&scanClause=dc.identifier=fish">
http://localhost:9999/?version=1.1&operation=scan&scanClause=dc.identifier=fish
</ulink>
- accesses the indexed indentifiers.
+ accesses the indexed identifiers.
</para>
<para>
- In addition, all &zebra; internal special elemen sets or record
+ In addition, all &zebra; internal special element sets or record
schemas of the form
<literal>zebra::</literal> just work right out of the box
<ulink
Z> elements zebra::facet::any:w
Z> show 1+1
- Z> elements zebra::facet::dc_publisher:p,dc_title:p
+ Z> elements zebra::facet::publisher:p,title:p
Z> show 1+1
</screen>
</para>
Z> find @attr 1=oai_setspec @attr 4=3 7374617475733D756E707562
Z> show 1+1
- Z> find @attr 1=dc_title communication
+ Z> find @attr 1=title communication
Z> show 1+1
- Z> find @attr 1=dc_identifier @attr 4=3
+ Z> find @attr 1=identifier @attr 4=3
http://resolver.caltech.edu/CaltechCSTR:1986.5228-tr-86
Z> show 1+1
</screen>
Z> scan @attr 1=oai_datestamp @attr 4=3 1
Z> scan @attr 1=oai_setspec @attr 4=3 2000
Z>
- Z> scan @attr 1=dc_title communication
- Z> scan @attr 1=dc_identifier @attr 4=3 a
+ Z> scan @attr 1=title communication
+ Z> scan @attr 1=identifier @attr 4=3 a
</screen>
</para>
<para>
Notice that searching and scanning on the indexes
- <literal>dc_contributor</literal>, <literal>dc_language</literal>,
- <literal>dc_rights</literal>, and <literal>dc_source</literal>
+ <literal>contributor</literal>, <literal>language</literal>,
+ <literal>rights</literal>, and <literal>source</literal>
might fail, simply because none of the records in the small example set
have these fields set, and consequently these indexes might not
have been created.