<chapter id="server">
- <!-- $Id: server.xml,v 1.14 2006-02-16 13:27:18 mike Exp $ -->
+ <!-- $Id: server.xml,v 1.25 2006-06-30 14:36:12 marc Exp $ -->
<title>The Z39.50 Server</title>
<sect1 id="zebrasrv">
zebrasrv manpage -->
- <sect2><title>DESCRIPTION</title>
+ <sect2><title>Description</title>
<para>Zebra is a high-performance, general-purpose structured text indexing
and retrieval engine. It reads structured records in a variety of input
formats (eg. email, XML, MARC) and allows access to them through exact
</sect2>
<sect2>
- <title>SYNOPSIS</title>
+ <title>Synopsis</title>
&zebrasrv-synopsis;
</sect2>
<sect2>
- <title>OPTIONS</title>
+ <title>Options</title>
<para>
The options for <command>zebrasrv</command> are the same
&zebrasrv-options;
</sect2>
- <sect2 id="gfs-config"><title>VIRTUAL HOSTS</title>
- <para>
- <command>zebrasrv</command> uses the YAZ server frontend and does
- support multiple virtual servers behind multiple listening sockets.
- </para>
- &zebrasrv-virtual;
- </sect2>
- <sect2><title>FILES</title>
+
+ <sect2><title>Files</title>
<para>
<filename>zebra.cfg</filename>
</para>
</sect2>
- <sect2><title>SEE ALSO</title>
+ <sect2><title>See Also</title>
<para>
<citerefentry>
<refentrytitle>zebraidx</refentrytitle>
</citerefentry>
</para>
<para>
- Section "The Z39.50 Server" in the Zebra manual.
- <filename>http://www.indexdata.dk/zebra/doc/server.tkl</filename>
- </para>
- <para>
- Section "Virtual Hosts" in the YAZ manual.
- <filename>http://www.indexdata.dk/yaz/doc/server.vhosts.tkl</filename>
- </para>
- <para>
- Section "Specification of <ulink url="http://www.loc.gov/standards/sru/cql/">CQL</ulink> to RPN mappings" in the YAZ manual.
- <filename>http://www.indexdata.dk/yaz/doc/tools.tkl#tools.cql.map</filename>
- </para>
- <para>
The Zebra software is Copyright <command>Index Data</command>
<filename>http://www.indexdata.dk</filename>
and distributed under the
also the following section).
</para>
- <para>
- <emphasis>Use</emphasis> attributes are interpreted according to the
- attribute sets which have been loaded in the
- <literal>zebra.cfg</literal> file, and are matched against specific
- fields as specified in the <literal>.abs</literal> file which
- describes the profile of the records which have been loaded.
- If no Use attribute is provided, a default of Bib-1 Any is assumed.
- </para>
-
- <para>
- If a <emphasis>Structure</emphasis> attribute of
- <emphasis>Phrase</emphasis> is used in conjunction with a
- <emphasis>Completeness</emphasis> attribute of
- <emphasis>Complete (Sub)field</emphasis>, the term is matched
- against the contents of the phrase (long word) register, if one
- exists for the given <emphasis>Use</emphasis> attribute.
- A phrase register is created for those fields in the
- <literal>.abs</literal> file that contains a
- <literal>p</literal>-specifier.
- <!-- ### whatever the hell _that_ is -->
- </para>
-
- <para>
- If <emphasis>Structure</emphasis>=<emphasis>Phrase</emphasis> is
- used in conjunction with <emphasis>Incomplete Field</emphasis> - the
- default value for <emphasis>Completeness</emphasis>, the
- search is directed against the normal word registers, but if the term
- contains multiple words, the term will only match if all of the words
- are found immediately adjacent, and in the given order.
- The word search is performed on those fields that are indexed as
- type <literal>w</literal> in the <literal>.abs</literal> file.
- </para>
-
- <para>
- If the <emphasis>Structure</emphasis> attribute is
- <emphasis>Word List</emphasis>,
- <emphasis>Free-form Text</emphasis>, or
- <emphasis>Document Text</emphasis>, the term is treated as a
- natural-language, relevance-ranked query.
- This search type uses the word register, i.e. those fields
- that are indexed as type <literal>w</literal> in the
- <literal>.abs</literal> file.
- </para>
-
- <para>
- If the <emphasis>Structure</emphasis> attribute is
- <emphasis>Numeric String</emphasis> the term is treated as an integer.
- The search is performed on those fields that are indexed
- as type <literal>n</literal> in the <literal>.abs</literal> file.
- </para>
-
- <para>
- If the <emphasis>Structure</emphasis> attribute is
- <emphasis>URx</emphasis> the term is treated as a URX (URL) entity.
- The search is performed on those fields that are indexed as type
- <literal>u</literal> in the <literal>.abs</literal> file.
- </para>
-
- <para>
- If the <emphasis>Structure</emphasis> attribute is
- <emphasis>Local Number</emphasis> the term is treated as
- native Zebra Record Identifier.
- </para>
-
- <para>
- If the <emphasis>Relation</emphasis> attribute is
- <emphasis>Equals</emphasis> (default), the term is matched
- in a normal fashion (modulo truncation and processing of
- individual words, if required).
- If <emphasis>Relation</emphasis> is <emphasis>Less Than</emphasis>,
- <emphasis>Less Than or Equal</emphasis>,
- <emphasis>Greater than</emphasis>, or <emphasis>Greater than or
- Equal</emphasis>, the term is assumed to be numerical, and a
- standard regular expression is constructed to match the given
- expression.
- If <emphasis>Relation</emphasis> is <emphasis>Relevance</emphasis>,
- the standard natural-language query processor is invoked.
- </para>
-
- <para>
- For the <emphasis>Truncation</emphasis> attribute,
- <emphasis>No Truncation</emphasis> is the default.
- <emphasis>Left Truncation</emphasis> is not supported.
- <emphasis>Process # in search term</emphasis> is supported, as is
- <emphasis>Regxp-1</emphasis>.
- <emphasis>Regxp-2</emphasis> enables the fault-tolerant (fuzzy)
- search. As a default, a single error (deletion, insertion,
- replacement) is accepted when terms are matched against the register
- contents.
- </para>
-
- <sect3>
- <title>Regular expressions</title>
-
- <para>
- Each term in a query is interpreted as a regular expression if
- the truncation value is either <emphasis>Regxp-1</emphasis> (102)
- or <emphasis>Regxp-2</emphasis> (103).
- Both query types follow the same syntax with the operands:
- <variablelist>
-
- <varlistentry>
- <term>x</term>
- <listitem>
- <para>
- Matches the character <emphasis>x</emphasis>.
- </para>
- </listitem>
- </varlistentry>
- <varlistentry>
- <term>.</term>
- <listitem>
- <para>
- Matches any character.
- </para>
- </listitem>
- </varlistentry>
- <varlistentry>
- <term><literal>[</literal>..<literal>]</literal></term>
- <listitem>
- <para>
- Matches the set of characters specified;
- such as <literal>[abc]</literal> or <literal>[a-c]</literal>.
- </para>
- </listitem>
- </varlistentry>
- </variablelist>
- and the operators:
- <variablelist>
-
- <varlistentry>
- <term>x*</term>
- <listitem>
- <para>
- Matches <emphasis>x</emphasis> zero or more times. Priority: high.
- </para>
- </listitem>
- </varlistentry>
- <varlistentry>
- <term>x+</term>
- <listitem>
- <para>
- Matches <emphasis>x</emphasis> one or more times. Priority: high.
- </para>
- </listitem>
- </varlistentry>
- <varlistentry>
- <term>x?</term>
- <listitem>
- <para>
- Matches <emphasis>x</emphasis> zero or once. Priority: high.
- </para>
- </listitem>
- </varlistentry>
- <varlistentry>
- <term>xy</term>
- <listitem>
- <para>
- Matches <emphasis>x</emphasis>, then <emphasis>y</emphasis>.
- Priority: medium.
- </para>
- </listitem>
- </varlistentry>
- <varlistentry>
- <term>x|y</term>
- <listitem>
- <para>
- Matches either <emphasis>x</emphasis> or <emphasis>y</emphasis>.
- Priority: low.
- </para>
- </listitem>
- </varlistentry>
- </variablelist>
- The order of evaluation may be changed by using parentheses.
- </para>
-
- <para>
- If the first character of the <emphasis>Regxp-2</emphasis> query
- is a plus character (<literal>+</literal>) it marks the
- beginning of a section with non-standard specifiers.
- The next plus character marks the end of the section.
- Currently Zebra only supports one specifier, the error tolerance,
- which consists one digit.
- </para>
-
- <para>
- Since the plus operator is normally a suffix operator the addition to
- the query syntax doesn't violate the syntax for standard regular
- expressions.
- </para>
-
- </sect3>
-
- <sect3>
- <title>Query examples</title>
-
- <para>
- Phrase search for <emphasis>information retrieval</emphasis> in
- the title-register:
- <screen>
- @attr 1=4 "information retrieval"
- </screen>
- </para>
-
- <para>
- Ranked search for the same thing:
- <screen>
- @attr 1=4 @attr 2=102 "Information retrieval"
- </screen>
- </para>
-
- <para>
- Phrase search with a regular expression:
- <screen>
- @attr 1=4 @attr 5=102 "informat.* retrieval"
- </screen>
- </para>
-
- <para>
- Ranked search with a regular expression:
- <screen>
- @attr 1=4 @attr 5=102 @attr 2=102 "informat.* retrieval"
- </screen>
- </para>
-
- <para>
- In the GILS schema (<literal>gils.abs</literal>), the
- west-bounding-coordinate is indexed as type <literal>n</literal>,
- and is therefore searched by specifying
- <emphasis>structure</emphasis>=<emphasis>Numeric String</emphasis>.
- To match all those records with west-bounding-coordinate greater
- than -114 we use the following query:
- <screen>
- @attr 4=109 @attr 2=5 @attr gils 1=2038 -114
- </screen>
- </para>
- </sect3>
- </sect2>
-
+ </sect2>
+
<sect2>
<title>Present</title>
<para>
timeout.
</para>
</sect2>
+
+ <sect2>
+ <title>Explain</title>
+ <para>
+ Zebra maintains a "classic"
+ <ulink url="&url.z39.50.explain;">Explain</ulink> database
+ on the side.
+ This database is called <literal>IR-Explain-1</literal> and can be
+ searched using the attribute set <literal>exp-1</literal>.
+ </para>
+ <para>
+ The records in the explain database are of type
+ <literal>grs.sgml</literal>.
+ The root element for the Explain grs.sgml records is
+ <literal>explain</literal>, thus
+ <filename>explain.abs</filename> is used for indexing.
+ </para>
+ <note>
+ <para>
+ Zebra <emphasis>must</emphasis> be able to locate
+ <filename>explain.abs</filename> in order to index the Explain
+ records properly. Zebra will work without it but the information
+ will not be searchable.
+ </para>
+ </note>
+ </sect2>
</sect1>
</chapter>
<sect2>
<title>Scan</title>
<para>
- Zebra does <emphasis>not</emphasis> support SRU's
+ Zebra supports SRU's
<literal>scan</literal>
operation, as described at
- <ulink url="http://www.loc.gov/standards/sru/scan/"/>
+ <ulink url="http://www.loc.gov/standards/sru/scan/"/>.
+ Scanning using CQL syntax is the default, where the
+ standard <literal>scanClause</literal> parameter is used.
</para>
<para>
- This is a rather embarrassing surprise as the pieces are all
- there: Z39.50 scan is supported, and SRU scan requests are
- recognised and diagnosed. To add further to the embarrassment, a
- mutant form of SRU scan <emphasis>is</emphasis> supported, using
+ In addition, a
+ mutant form of SRU scan is supported, using
the non-standard <literal>x-pScanClause</literal> parameter in
place of the standard <literal>scanClause</literal> to scan on a
PQF query clause.
<title>Explain</title>
<para>
Zebra fully supports SRU's core
- <literal>searchRetrieve</literal>
+ <literal>explain</literal>
operation, as described at
<ulink url="http://www.loc.gov/standards/sru/explain/index.html"/>
</para>
<literal>operation</literal>=<literal>explain</literal>
and version-number specified)
or with a simple HTTP GET at the server's basename.
+ The ZeeRex record returned in response is the one embedded
+ in the YAZ Frontend Server configuration file that is described in the
+ <link linkend="gfs-config">Virtual Hosts</link> documentation.
</para>
+ <para>
+ Unfortunately, the data found in the
+ CQL-to-PQF text file must be copied by hand into the explain
+ section of the YAZ Frontend Server configuration file in order
+ to provide a suitable explain record.
+ This is all very new, alpha-quality functionality,
+ and a lot of work has yet to be done.
+ </para>
+ <para>
+ There is no linkage whatsoever between the Z39.50 explain model
+ and the SRU/SRW explain response (at least, none is implemented
+ in Zebra). Zebra does not provide a means of obtaining the
+ ZeeRex record using Z39.50.
+ </para>
+ </sect2>
+
+ <sect2>
+ <title>Some SRU Examples</title>
+ <para>
+ Point a browser at <literal>http://localhost:9999</literal>
+ to get an explain response, or use
+ <screen><![CDATA[
+ http://localhost:9999/?version=1.1&operation=explain
+ ]]></screen>
+ </para>
+ <para>
+ See the number of hits for a query:
+ <screen><![CDATA[
+ http://localhost:9999/?version=1.1&operation=searchRetrieve
+ &query=text=(plant%20and%20soil)
+ ]]></screen>
+ </para>
+ <para>
+ Fetch two records, starting at record 5, in Dublin Core format:
+ <screen><![CDATA[
+ http://localhost:9999/?version=1.1&operation=searchRetrieve
+ &query=text=(plant%20and%20soil)
+ &startRecord=5&maximumRecords=2&recordSchema=dc
+ ]]></screen>
+ </para>
+ <para>
+ Even search using PQF queries, via the <emphasis>extended naughty
+ parameter</emphasis> <literal>x-pquery</literal>:
+ <screen><![CDATA[
+ http://localhost:9999/?version=1.1&operation=searchRetrieve
+ &x-pquery=@attr%201=text%20@and%20plant%20soil
+ ]]></screen>
+ </para>
+ <para>
+ Or scan indexes using the <emphasis>extended extremely naughty
+ parameter</emphasis> <literal>x-pScanClause</literal>:
+ <screen><![CDATA[
+ http://localhost:9999/?version=1.1&operation=scan
+ &x-pScanClause=@attr%201=text%20something
+ ]]></screen>
+ <emphasis>Don't do this in production code!</emphasis>
+ It is, however, a handy and fast debugging aid.
+ </para>
</sect2>
<sect2>