--- /dev/null
+<chapter id="querymodel">
+ <!-- $Id: querymodel.xml,v 1.1 2006-06-13 09:27:01 marc Exp $ -->
+ <title>Query Model</title>
+
+ <sect1 id="querymodel-overview">
+ <title>Query Model Overview</title>
+
+ <para>
+ Zebra was designed as a networked Information Retrieval engine adhering
+ to the international standards
+ <ulink url="http://www.loc.gov/z3950/agency/">Z39.50</ulink> and
+ <ulink url="http://www.loc.gov/standards/sru/">SRU</ulink>,
+ and it implements the query model defined there.
+ Unfortunately, the Z39.50 query model defines only a binary
+ encoded representation, which is used as transport packaging in
+ the Z39.50 protocol layer. This representation is not human
+ readable, nor does it define any convenient way to specify queries.
+ </para>
+ <para>
+ Therefore, Index Data has defined a textual representation, the
+ <literal>Prefix Query Format</literal>, or
+ <literal>PQF</literal> for short, which has since been adopted by other
+ parties developing Z39.50 software. It is also often referred to as
+ <literal>Prefix Query Notation</literal>, or
+ <literal>PQN</literal> for short, and is thoroughly explained in
+ <xref linkend="querymodel-pqf"/>.
+ </para>
+
+ <para>
+ In addition, Zebra can be configured to understand and map the
+ <literal>Common Query Language</literal>
+ (<ulink url="http://www.loc.gov/standards/sru/cql/">CQL</ulink>)
+ to PQF. For an introduction to the mapping to the internal query
+ representation, see
+ <xref linkend="querymodel-cql-to-pqf"/>.
+ </para>
+ </sect1>
+
+ <sect1 id="querymodel-pqf">
+ <title>Prefix Query Format structure and syntax</title>
+ <para>
+ The
+ <ulink url="http://indexdata.dk/yaz/doc/tools.tkl#PQF">PQF
+ grammar</ulink> is documented in the YAZ manual.
+ During search, this textual PQF representation
+ is always mapped to the equivalent Zebra internal
+ query parse tree.
+ </para>
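+
+ <para>
+ As a minimal illustration of the prefix notation (a sketch only,
+ assuming the <literal>bib-1</literal> Use attributes 4 for Title and
+ 1003 for Author are mapped in the server configuration; the search
+ terms are arbitrary), an operator precedes its two operands, and
+ each <literal>@attr</literal> specification applies to the term
+ that follows it:
+ <screen>
+ @and @attr 1=4 "information retrieval" @attr 1=1003 salton
+ </screen>
+ This would be parsed into a tree with <literal>@and</literal> at the
+ root and the two attribute-qualified terms as its leaves.
+ </para>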
+
+ <para>
+ </para>
+
+ <sect2 id="querymodel-exp1">
+ <title>Explain Attribute Set</title>
+ <para>
+ The attribute-set <literal>exp-1</literal> is defined for
+ searching an Explain <literal>IR-Explain-1</literal> database.
+ It consists of a single <literal>Use (type 1)</literal> attribute.
+ </para>
+ <para>
+ In addition, the non-Use
+ <literal>bib-1</literal> attributes, that is, the types
+ <literal>Relation</literal>, <literal>Position</literal>,
+ <literal>Structure</literal>, <literal>Truncation</literal>,
+ and <literal>Completeness</literal> are imported from
+ the <literal>bib-1</literal> attribute set, and may be used
+ within any explain query.
+ </para>
+
+ <sect3 id="querymodel-exp1-use">
+ <title>Use Attributes (type = 1)</title>
+ <para>
+ The following Explain search attributes are supported:
+ <literal>ExplainCategory</literal> (@attr 1=1),
+ <literal>DatabaseName</literal> (@attr 1=3),
+ <literal>DateAdded</literal> (@attr 1=9),
+ <literal>DateChanged</literal> (@attr 1=10).
+ </para>
+ <para>
+ A search in the use attribute <literal>ExplainCategory</literal>
+ supports only these predefined values:
+ <literal>CategoryList</literal>, <literal>TargetInfo</literal>,
+ <literal>DatabaseInfo</literal>, <literal>AttributeDetails</literal>.
+ </para>
+ <para>
+ See <filename>tab/explain.att</filename>
+ for more information.
+ </para>
+ </sect3>
+
+ <sect3>
+ <title>Explain searches with yaz-client</title>
+ <para>
+ List supported categories to find out which explain commands are
+ supported:
+ <screen>
+ Z> base IR-Explain-1
+ Z> f @attr exp1 1=1 categorylist
+ Z> form sutrs
+ Z> show 1+2
+ </screen>
+ </para>
+
+ <para>
+ Get target info, that is, investigate which databases exist at
+ this server endpoint:
+ <screen>
+ Z> base IR-Explain-1
+ Z> f @attr exp1 1=1 targetinfo
+ Z> form xml
+ Z> show 1+1
+ Z> form grs-1
+ Z> show 1+1
+ Z> form sutrs
+ Z> show 1+1
+ </screen>
+ </para>
+
+ <para>
+ List all supported databases. The number of hits
+ is the number of databases found, which most commonly are the
+ following two:
+ the <literal>Default</literal> and the
+ <literal>IR-Explain-1</literal> databases.
+ <screen>
+ Z> base IR-Explain-1
+ Z> f @attr exp1 1=1 databaseinfo
+ Z> form sutrs
+ Z> show 1+2
+ </screen>
+ </para>
+
+ <para>
+ Get database info record for database <literal>Default</literal>.
+ <screen>
+ Z> base IR-Explain-1
+ Z> f @and @attr exp1 1=1 databaseinfo @attr exp1 1=3 Default
+ </screen>
+ Identical query with explicitly specified attribute set:
+ <screen>
+ Z> base IR-Explain-1
+ Z> f @attrset exp1 @and @attr 1=1 databaseinfo @attr 1=3 Default
+ </screen>
+ </para>
+
+ <para>
+ Get the attribute details record for database
+ <literal>Default</literal>.
+ This query is very useful for studying the internal Zebra indexes.
+ If records have been indexed using the <literal>alvis</literal>
+ XSLT filter, the string names of the known indexes can also be
+ found here.
+ <screen>
+ Z> base IR-Explain-1
+ Z> f @and @attr exp1 1=1 attributedetails @attr exp1 1=3 Default
+ </screen>
+ Identical query with explicitly specified attribute set:
+ <screen>
+ Z> base IR-Explain-1
+ Z> f @attrset exp1 @and @attr 1=1 attributedetails @attr 1=3 Default
+ </screen>
+ </para>
+ </sect3>
+
+ </sect2>
+
+ <sect2 id="querymodel-bib1">
+ <title>Bib1 Attribute Set</title>
+ <para>
+ Something about querying to be written ..
+ </para>
+ <para>
+ Most of the information contained in this section is an excerpt of
+ the <literal>ATTRIBUTE SET BIB-1 (Z39.50-1995)
+ SEMANTICS</literal>, found at <ulink
+ url="http://www.loc.gov/z3950/agency/bib1.html">The BIB-1
+ Attribute Set Semantics</ulink> from 1995, also in an updated
+ <ulink
+ url="http://www.loc.gov/z3950/agency/defns/bib1.html">Bib-1
+ Attribute Set</ulink>
+ version from 2003. Index Data is not the copyright holder of this
+ information.
+ </para>
+
+
+ <sect3 id="querymodel-bib1-use">
+ <title>Use Attributes (type = 1)</title>
+ </sect3>
+
+ <sect3 id="querymodel-bib1-relation">
+ <title>Relation Attributes (type = 2)</title>
+ </sect3>
+
+ <sect3 id="querymodel-bib1-position">
+ <title>Position Attributes (type = 3)</title>
+ </sect3>
+
+ <sect3 id="querymodel-bib1-structure">
+ <title>Structure Attributes (type = 4)</title>
+ </sect3>
+
+ <sect3 id="querymodel-bib1-truncation">
+ <title>Truncation Attributes (type = 5)</title>
+ </sect3>
+
+ <sect3 id="querymodel-bib1-completeness">
+ <title>Completeness Attributes (type = 6)</title>
+ </sect3>
+
+ <sect3 id="querymodel-bib1-sorting">
+ <title>Zebra Extension Sorting Attributes (type = 7)</title>
+ </sect3>
+
+ <sect3 id="querymodel-bib1-estimation">
+ <title>Zebra Extension Search Estimation Attributes (type = 8)</title>
+ </sect3>
+
+ <sect3 id="querymodel-bib1-weight">
+ <title>Zebra Extension Weight Attributes (type = 9)</title>
+ </sect3>
+
+ </sect2>
+
+ <sect2 id="querymodel-bib1-mapping">
+ <title>Mapping from Bib1 Attributes to Zebra internal
+ register indexes</title>
+ <para>
+ </para>
+
+ <para>
+ <emphasis>Use</emphasis> attributes are interpreted according to the
+ attribute sets which have been loaded in the
+ <literal>zebra.cfg</literal> file, and are matched against specific
+ fields as specified in the <literal>.abs</literal> file which
+ describes the profile of the records which have been loaded.
+ If no Use attribute is provided, a default of Bib-1 Any is assumed.
+ </para>
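+
+ <para>
+ As a small sketch of this default (assuming the
+ <literal>bib-1</literal> Use value 1016, Any, is available in the
+ loaded attribute sets), the following two queries would be expected
+ to behave identically:
+ <screen>
+ water
+ @attr 1=1016 water
+ </screen>
+ </para>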
+
+ <para>
+ If a <emphasis>Structure</emphasis> attribute of
+ <emphasis>Phrase</emphasis> is used in conjunction with a
+ <emphasis>Completeness</emphasis> attribute of
+ <emphasis>Complete (Sub)field</emphasis>, the term is matched
+ against the contents of the phrase (long word) register, if one
+ exists for the given <emphasis>Use</emphasis> attribute.
+ A phrase register is created for those fields in the
+ <literal>.abs</literal> file that contain a
+ <literal>p</literal>-specifier.
+ <!-- ### whatever the hell _that_ is -->
+ </para>
+
+ <para>
+ If <emphasis>Structure</emphasis>=<emphasis>Phrase</emphasis> is
+ used in conjunction with <emphasis>Incomplete Field</emphasis> (the
+ default value for <emphasis>Completeness</emphasis>), the
+ search is directed against the normal word registers, but if the term
+ contains multiple words, the term will only match if all of the words
+ are found immediately adjacent, and in the given order.
+ The word search is performed on those fields that are indexed as
+ type <literal>w</literal> in the <literal>.abs</literal> file.
+ </para>
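+
+ <para>
+ The two phrase cases above can be sketched as follows. This is an
+ illustration only, assuming that the title field (Use attribute 4)
+ has both a phrase and a word register, and using the
+ <literal>bib-1</literal> values Structure 1 (Phrase),
+ Completeness 3 (Complete field) and Completeness 1
+ (Incomplete subfield):
+ <screen>
+ @attr 1=4 @attr 4=1 @attr 6=3 "information retrieval"
+ @attr 1=4 @attr 4=1 @attr 6=1 "information retrieval"
+ </screen>
+ The first query would be matched against the phrase register as a
+ whole field value, while the second would be directed against the
+ word register and require both words to be adjacent and in order.
+ </para>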
+
+ <para>
+ If the <emphasis>Structure</emphasis> attribute is
+ <emphasis>Word List</emphasis>,
+ <emphasis>Free-form Text</emphasis>, or
+ <emphasis>Document Text</emphasis>, the term is treated as a
+ natural-language, relevance-ranked query.
+ This search type uses the word register, i.e. those fields
+ that are indexed as type <literal>w</literal> in the
+ <literal>.abs</literal> file.
+ </para>
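+
+ <para>
+ For illustration, and assuming the title field (Use attribute 4) is
+ indexed as type <literal>w</literal>, these Structure values can be
+ selected with the <literal>bib-1</literal> attribute values 6
+ (Word List) and 105 (Free-form Text):
+ <screen>
+ @attr 1=4 @attr 4=6 "information retrieval"
+ @attr 1=4 @attr 4=105 "information retrieval"
+ </screen>
+ </para>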
+
+ <para>
+ If the <emphasis>Structure</emphasis> attribute is
+ <emphasis>Numeric String</emphasis>, the term is treated as an integer.
+ The search is performed on those fields that are indexed
+ as type <literal>n</literal> in the <literal>.abs</literal> file.
+ </para>
+
+ <para>
+ If the <emphasis>Structure</emphasis> attribute is
+ <emphasis>URx</emphasis>, the term is treated as a URx (URL) entity.
+ The search is performed on those fields that are indexed as type
+ <literal>u</literal> in the <literal>.abs</literal> file.
+ </para>
+
+ <para>
+ If the <emphasis>Structure</emphasis> attribute is
+ <emphasis>Local Number</emphasis>, the term is treated as a
+ native Zebra Record Identifier.
+ </para>
+
+ <para>
+ If the <emphasis>Relation</emphasis> attribute is
+ <emphasis>Equals</emphasis> (default), the term is matched
+ in a normal fashion (modulo truncation and processing of
+ individual words, if required).
+ If <emphasis>Relation</emphasis> is <emphasis>Less Than</emphasis>,
+ <emphasis>Less Than or Equal</emphasis>,
+ <emphasis>Greater Than</emphasis>, or <emphasis>Greater Than or
+ Equal</emphasis>, the term is assumed to be numerical, and a
+ standard regular expression is constructed to match the given
+ expression.
+ If <emphasis>Relation</emphasis> is <emphasis>Relevance</emphasis>,
+ the standard natural-language query processor is invoked.
+ </para>
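+
+ <para>
+ A sketch of the ordering relations, assuming a hypothetical setup
+ in which the <literal>bib-1</literal> Use attribute 31
+ (Date of publication) is indexed as type <literal>n</literal>.
+ The Relation values used here are 2 (Less Than or Equal) and
+ 4 (Greater Than or Equal):
+ <screen>
+ @attr 1=31 @attr 4=109 @attr 2=2 1995
+ @attr 1=31 @attr 4=109 @attr 2=4 1995
+ </screen>
+ Whether such a numeric index exists depends entirely on the
+ <literal>.abs</literal> file in use.
+ </para>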
+
+ <para>
+ For the <emphasis>Truncation</emphasis> attribute,
+ <emphasis>No Truncation</emphasis> is the default.
+ <emphasis>Left Truncation</emphasis> is not supported.
+ <emphasis>Process # in search term</emphasis> is supported, as is
+ <emphasis>Regxp-1</emphasis>.
+ <emphasis>Regxp-2</emphasis> enables fault-tolerant (fuzzy)
+ search. By default, a single error (deletion, insertion, or
+ replacement) is accepted when terms are matched against the register
+ contents.
+ </para>
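+
+ <para>
+ As an illustration (assuming a title index on Use attribute 4), the
+ first query below uses the <literal>bib-1</literal> Truncation value
+ 101, so that the <literal>#</literal> character in the term is
+ processed as a mask rather than taken literally. The second uses
+ value 103 (Regxp-2); with the default tolerance of one error, the
+ misspelled term would be expected to match
+ <emphasis>retrieval</emphasis> as well:
+ <screen>
+ @attr 1=4 @attr 5=101 "inform#"
+ @attr 1=4 @attr 5=103 "retrievel"
+ </screen>
+ </para>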
+ </sect2>
+
+ <sect2 id="querymodel-regular">
+ <title>Regular expressions</title>
+
+ <para>
+ Each term in a query is interpreted as a regular expression if
+ the truncation value is either <emphasis>Regxp-1</emphasis> (102)
+ or <emphasis>Regxp-2</emphasis> (103).
+ Both query types follow the same syntax with the operands:
+ <variablelist>
+
+ <varlistentry>
+ <term>x</term>
+ <listitem>
+ <para>
+ Matches the character <emphasis>x</emphasis>.
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term>.</term>
+ <listitem>
+ <para>
+ Matches any character.
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term><literal>[</literal>..<literal>]</literal></term>
+ <listitem>
+ <para>
+ Matches the set of characters specified;
+ such as <literal>[abc]</literal> or <literal>[a-c]</literal>.
+ </para>
+ </listitem>
+ </varlistentry>
+ </variablelist>
+ and the operators:
+ <variablelist>
+
+ <varlistentry>
+ <term>x*</term>
+ <listitem>
+ <para>
+ Matches <emphasis>x</emphasis> zero or more times. Priority: high.
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term>x+</term>
+ <listitem>
+ <para>
+ Matches <emphasis>x</emphasis> one or more times. Priority: high.
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term>x?</term>
+ <listitem>
+ <para>
+ Matches <emphasis>x</emphasis> zero or one time. Priority: high.
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term>xy</term>
+ <listitem>
+ <para>
+ Matches <emphasis>x</emphasis>, then <emphasis>y</emphasis>.
+ Priority: medium.
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term>x|y</term>
+ <listitem>
+ <para>
+ Matches either <emphasis>x</emphasis> or <emphasis>y</emphasis>.
+ Priority: low.
+ </para>
+ </listitem>
+ </varlistentry>
+ </variablelist>
+ The order of evaluation may be changed by using parentheses.
+ </para>
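+
+ <para>
+ The operands and operators above can be combined as in the following
+ sketches, which assume a title index on Use attribute 4 and
+ matching of the expression against whole terms in the register:
+ <screen>
+ @attr 1=4 @attr 5=102 "colou?r"
+ @attr 1=4 @attr 5=102 "(plant|soil)s?"
+ @attr 1=4 @attr 5=102 "[a-c].*"
+ </screen>
+ Under these assumptions, the first expression matches
+ <emphasis>color</emphasis> and <emphasis>colour</emphasis>, the
+ second matches <emphasis>plant</emphasis>,
+ <emphasis>plants</emphasis>, <emphasis>soil</emphasis> and
+ <emphasis>soils</emphasis>, and the third matches any term that
+ starts with a, b or c.
+ </para>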
+
+ <para>
+ If the first character of the <emphasis>Regxp-2</emphasis> query
+ is a plus character (<literal>+</literal>), it marks the
+ beginning of a section with non-standard specifiers.
+ The next plus character marks the end of the section.
+ Currently Zebra only supports one specifier, the error tolerance,
+ which consists of one digit.
+ </para>
+
+ <para>
+ Since the plus operator is normally a suffix operator, this addition to
+ the query syntax does not violate the syntax for standard regular
+ expressions.
+ </para>
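+
+ <para>
+ Read literally, the specifier section described above would be
+ written as in the following sketch, which assumes a title index on
+ Use attribute 4 and raises the error tolerance from the default of
+ one to two:
+ <screen>
+ @attr 1=4 @attr 5=103 "+2+retreival"
+ </screen>
+ The leading <literal>+2+</literal> is the specifier section, and the
+ rest of the term is the regular expression itself.
+ </para>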
+
+ </sect2>
+
+ <sect2 id="querymodel-examples">
+ <title>Query examples</title>
+
+ <para>
+ Phrase search for <emphasis>information retrieval</emphasis> in
+ the title-register:
+ <screen>
+ @attr 1=4 "information retrieval"
+ </screen>
+ </para>
+
+ <para>
+ Ranked search for the same thing:
+ <screen>
+ @attr 1=4 @attr 2=102 "Information retrieval"
+ </screen>
+ </para>
+
+ <para>
+ Phrase search with a regular expression:
+ <screen>
+ @attr 1=4 @attr 5=102 "informat.* retrieval"
+ </screen>
+ </para>
+
+ <para>
+ Ranked search with a regular expression:
+ <screen>
+ @attr 1=4 @attr 5=102 @attr 2=102 "informat.* retrieval"
+ </screen>
+ </para>
+
+ <para>
+ In the GILS schema (<literal>gils.abs</literal>), the
+ west-bounding-coordinate is indexed as type <literal>n</literal>,
+ and is therefore searched by specifying
+ <emphasis>structure</emphasis>=<emphasis>Numeric String</emphasis>.
+ To match all records with a west-bounding-coordinate greater
+ than -114, we use the following query:
+ <screen>
+ @attr 4=109 @attr 2=5 @attr gils 1=2038 -114
+ </screen>
+ </para>
+ </sect2>
+
+
+ <!-- see in util/zebramap.c
+ int zebra_maps_attr
+
+ if (completeness_value == 2 || completeness_value == 3)
+ *complete_flag = 1;
+ else
+ *complete_flag = 0;
+ *reg_id = 0;
+
+ *sort_flag =(sort_relation_value > 0) ? 1 : 0;
+ *search_type = "phrase";
+ strcpy(rank_type, "void");
+ if (relation_value == 102)
+ {
+ if (weight_value == -1)
+ weight_value = 34;
+ sprintf(rank_type, "rank,w=%d,u=%d", weight_value, use_value);
+ }
+ if (relation_value == 103)
+ {
+ *search_type = "always";
+ *reg_id = 'w';
+ return 0;
+ }
+ if (*complete_flag)
+ *reg_id = 'p';
+ else
+ *reg_id = 'w';
+ switch (structure_value)
+ {
+ case 6: /* word list */
+ *search_type = "and-list";
+ break;
+ case 105: /* free-form-text */
+ *search_type = "or-list";
+ break;
+ case 106: /* document-text */
+ *search_type = "or-list";
+ break;
+ case -1:
+ case 1: /* phrase */
+ case 2: /* word */
+ case 108: /* string */
+ *search_type = "phrase";
+ break;
+ case 107: /* local-number */
+ *search_type = "local";
+ *reg_id = 0;
+ break;
+ case 109: /* numeric string */
+ *reg_id = 'n';
+ *search_type = "numeric";
+ break;
+ case 104: /* urx */
+ *reg_id = 'u';
+ *search_type = "phrase";
+ break;
+ case 3: /* key */
+ *reg_id = '0';
+ *search_type = "phrase";
+ break;
+ case 4: /* year */
+ *reg_id = 'y';
+ *search_type = "phrase";
+ break;
+ case 5: /* date */
+ *reg_id = 'd';
+ *search_type = "phrase";
+ break;
+ default:
+ return -1;
+ }
+ return 0;
+
+ -->
+
+ <!--
+ <para>
+ The RecordType parameter in the <literal>zebra.cfg</literal> file, or
+ the <literal>-t</literal> option to the indexer tells Zebra how to
+ process input records.
+ Two basic types of processing are available - raw text and structured
+ data. Raw text is just that, and it is selected by providing the
+ argument <emphasis>text</emphasis> to Zebra. Structured records are
+ all handled internally using the basic mechanisms described in the
+ subsequent sections.
+ Zebra can read structured records in many different formats.
+ </para>
+ -->
+ </sect1>
+
+
+ <sect1 id="querymodel-cql-to-pqf">
+ <title>Server Side CQL to PQF Query Translation</title>
+ <para>
+ Using the
+ <literal>&lt;cql2rpn&gt;l2rpn.txt&lt;/cql2rpn&gt;</literal>
+ YAZ Frontend Virtual
+ Hosts option, one can configure
+ the YAZ Frontend CQL-to-PQF
+ converter, specifying the interpretation of various
+ <ulink url="http://www.loc.gov/standards/sru/cql/">CQL</ulink>
+ indexes, relations, etc. in terms of Type-1 query attributes.
+ <!-- The yaz-client config file -->
+ </para>
+ <para>
+ For example, using server-side CQL-to-PQF conversion, one might
+ query a Zebra server like this:
+ <screen>
+ <![CDATA[
+ yaz-client localhost:9999
+ Z> querytype cql
+ Z> find text=(plant and soil)
+ ]]>
+ </screen>
+ and, if properly configured, even static relevance ranking can
+ be performed using CQL query syntax:
+ <screen>
+ <![CDATA[
+ Z> find text = /relevant (plant and soil)
+ ]]>
+ </screen>
+ </para>
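+
+ <para>
+ To give a rough idea of the translation, assume a hypothetical
+ mapping file that maps the CQL index <literal>text</literal> to the
+ <literal>bib-1</literal> Use attribute 1010 (Body of text). The
+ first CQL query above might then be converted into a PQF query
+ along these lines:
+ <screen>
+ @and @attr 1=1010 plant @attr 1=1010 soil
+ </screen>
+ The actual result depends on the mapping file in use and may carry
+ additional relation, structure or position attributes.
+ </para>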
+
+ <para>
+ The same configuration can also be used to
+ search using client-side CQL-to-PQF conversion.
+ The only differences are <literal>querytype cql2rpn</literal>
+ instead of
+ <literal>querytype cql</literal>, and the yaz-client call specifying
+ a local conversion file:
+ <screen>
+ <![CDATA[
+ yaz-client -q local/cql2pqf.txt localhost:9999
+ Z> querytype cql2rpn
+ Z> find text=(plant and soil)
+ ]]>
+ </screen>
+ </para>
+
+ <para>
+ Exhaustive information can be found in the
+ section "Specification of CQL to RPN mappings" in the YAZ manual,
+ <ulink url="http://www.indexdata.dk/yaz/doc/tools.tkl#tools.cql.map">
+ http://www.indexdata.dk/yaz/doc/tools.tkl#tools.cql.map</ulink>,
+ and is therefore not repeated here.
+ </para>
+ <!--
+ <para>
+ See
+ <ulink url="http://www.loc.gov/z3950/agency/zing/cql/dc-indexes.html">
+ http://www.loc.gov/z3950/agency/zing/cql/dc-indexes.html</ulink>
+ for the Maintenance Agency's work-in-progress mapping of Dublin Core
+ indexes to Attribute Architecture (util, XD and BIB-2)
+ attributes.
+ </para>
+ -->
+ </sect1>
+
+
+
+<!--
+ <sect1 id="architecture-querylanguage">
+ <title>Query Languages</title>
+
+ <para>
+
+http://www.loc.gov/z3950/agency/document.html
+
+ PQF and BIB-1 stuff to be explained
+ <ulink url="http://www.loc.gov/z3950/agency/defns/bib1.html">
+ http://www.loc.gov/z3950/agency/defns/bib1.html</ulink>
+
+ <ulink url="http://www.loc.gov/z3950/agency/bib1.html">
+ http://www.loc.gov/z3950/agency/bib1.html</ulink>
+
+ http://www.loc.gov/z3950/agency/markup/13.html
+
+ </para>
+ </sect1>
+
+
+These attribute types are recognized regardless of attribute set. Some are recognized for search, others for scan.
+
+Search
+
+Type Name Version
+7 Embedded Sort 1.1
+8 Term Set 1.1
+9 Rank weight 1.1
+9 Approx Limit 1.4
+10 Term Ref 1.4
+
+Embedded Sort
+
+The embedded sort is a way to specify sort within a query - thus removing the need to send a Sort Request separately. It is both faster and does not require clients that deal with the Sort Facility.
+
+The value after attribute type 7 is 1=ascending, 2=descending.. The attributes+term (APT) node is separate from the rest and must be @or'ed. The term associated with APT is the level .. 0=primary sort, 1=secondary sort etc.. Example:
+
+Search for water, sort by title (ascending):
+
+ @or @attr 1=1016 water @attr 7=1 @attr 1=4 0
+
+Search for water, sort by title ascending, then date descending:
+
+ @or @or @attr 1=1016 water @attr 7=1 @attr 1=4 0 @attr 7=2 @attr 1=30 1
+
+Term Set
+
+The Term Set feature is a facility that allows a search to store hitting terms in a "pseudo" resultset; thus a search (as usual) + a scan-like facility. Requires a client that can do named result sets since the search generates two result sets. The value for attribute 8 is the name of a result set (string). The terms in term set are returned as SUTRS records.
+
+Search for u in title, right truncated. Store the result in a result set named uset.
+
+ @attr 5=1 @attr 1=4 @attr 8=uset u
+
+The model has one serious flaw: we don't know the size of the term set.
+
+Rank weight
+
+Rank weight is a way to pass a value to a ranking algorithm, so that one APT has one value while another has a different one.
+
+Search for utah in title with weight 30 as well as any with weight 20.
+
+ @attr 2=102 @or @attr 9=30 @attr 1=4 utah @attr 9=20 utah
+
+Approx Limit
+
+Newer Zebra versions normally estimate the hit count for every APT (leaf) in the query tree. These hit counts are returned as part of the searchResult-1 facility.
+
+By setting a limit for the APT we can make Zebra turn into approximate hit count when a certain hit count limit is reached. A value of zero means exact hit count.
+
+We are interested in an exact hit count for a, but for b we allow estimates for 1000 and higher.
+
+ @and a @attr 9=1000 b
+
+This facility clashes with rank weight! Fortunately this is a Zebra 1.4 thing so we can change this without upsetting anybody!
+
+Term Ref
+
+Zebra supports the searchResult-1 facility.
+
+If attribute 10 is given, that specifies a subqueryId value returned as part of the search result. It is a way for a client to name an APT part of a query.
+
+Scan
+
+Type Name Version
+8 Result set narrow 1.3
+9 Approx Limit 1.4
+
+Result set narrow
+
+If attribute 8 is given for scan, the value is the name of a result set. Each hit count in scan is @and'ed with the result set given.
+
+Approx limit
+
+The approx (as for search) is a way to enable approx hit counts for scan hit counts. However, it does NOT appear to work at the moment.
+
+
+ AdamDickmeiss - 19 Dec 2005
+
+
+-->
+
+</chapter>
+
+ <!-- Keep this comment at the end of the file
+ Local variables:
+ mode: sgml
+ sgml-omittag:t
+ sgml-shorttag:t
+ sgml-minimize-attributes:nil
+ sgml-always-quote-attributes:t
+ sgml-indent-step:1
+ sgml-indent-data:t
+ sgml-parent-document: "zebra.xml"
+ sgml-local-catalogs: nil
+ sgml-namecase-general:t
+ End:
+ -->
<chapter id="server">
- <!-- $Id: server.xml,v 1.22 2006-06-07 13:17:48 marc Exp $ -->
+ <!-- $Id: server.xml,v 1.23 2006-06-13 09:27:01 marc Exp $ -->
<title>The Z39.50 Server</title>
<sect1 id="zebrasrv">
also the following section).
</para>
- <para>
- <emphasis>Use</emphasis> attributes are interpreted according to the
- attribute sets which have been loaded in the
- <literal>zebra.cfg</literal> file, and are matched against specific
- fields as specified in the <literal>.abs</literal> file which
- describes the profile of the records which have been loaded.
- If no Use attribute is provided, a default of Bib-1 Any is assumed.
- </para>
-
- <para>
- If a <emphasis>Structure</emphasis> attribute of
- <emphasis>Phrase</emphasis> is used in conjunction with a
- <emphasis>Completeness</emphasis> attribute of
- <emphasis>Complete (Sub)field</emphasis>, the term is matched
- against the contents of the phrase (long word) register, if one
- exists for the given <emphasis>Use</emphasis> attribute.
- A phrase register is created for those fields in the
- <literal>.abs</literal> file that contains a
- <literal>p</literal>-specifier.
- <!-- ### whatever the hell _that_ is -->
- </para>
-
- <para>
- If <emphasis>Structure</emphasis>=<emphasis>Phrase</emphasis> is
- used in conjunction with <emphasis>Incomplete Field</emphasis> - the
- default value for <emphasis>Completeness</emphasis>, the
- search is directed against the normal word registers, but if the term
- contains multiple words, the term will only match if all of the words
- are found immediately adjacent, and in the given order.
- The word search is performed on those fields that are indexed as
- type <literal>w</literal> in the <literal>.abs</literal> file.
- </para>
-
- <para>
- If the <emphasis>Structure</emphasis> attribute is
- <emphasis>Word List</emphasis>,
- <emphasis>Free-form Text</emphasis>, or
- <emphasis>Document Text</emphasis>, the term is treated as a
- natural-language, relevance-ranked query.
- This search type uses the word register, i.e. those fields
- that are indexed as type <literal>w</literal> in the
- <literal>.abs</literal> file.
- </para>
-
- <para>
- If the <emphasis>Structure</emphasis> attribute is
- <emphasis>Numeric String</emphasis> the term is treated as an integer.
- The search is performed on those fields that are indexed
- as type <literal>n</literal> in the <literal>.abs</literal> file.
- </para>
-
- <para>
- If the <emphasis>Structure</emphasis> attribute is
- <emphasis>URx</emphasis> the term is treated as a URX (URL) entity.
- The search is performed on those fields that are indexed as type
- <literal>u</literal> in the <literal>.abs</literal> file.
- </para>
-
- <para>
- If the <emphasis>Structure</emphasis> attribute is
- <emphasis>Local Number</emphasis> the term is treated as
- native Zebra Record Identifier.
- </para>
-
- <para>
- If the <emphasis>Relation</emphasis> attribute is
- <emphasis>Equals</emphasis> (default), the term is matched
- in a normal fashion (modulo truncation and processing of
- individual words, if required).
- If <emphasis>Relation</emphasis> is <emphasis>Less Than</emphasis>,
- <emphasis>Less Than or Equal</emphasis>,
- <emphasis>Greater than</emphasis>, or <emphasis>Greater than or
- Equal</emphasis>, the term is assumed to be numerical, and a
- standard regular expression is constructed to match the given
- expression.
- If <emphasis>Relation</emphasis> is <emphasis>Relevance</emphasis>,
- the standard natural-language query processor is invoked.
- </para>
-
- <para>
- For the <emphasis>Truncation</emphasis> attribute,
- <emphasis>No Truncation</emphasis> is the default.
- <emphasis>Left Truncation</emphasis> is not supported.
- <emphasis>Process # in search term</emphasis> is supported, as is
- <emphasis>Regxp-1</emphasis>.
- <emphasis>Regxp-2</emphasis> enables the fault-tolerant (fuzzy)
- search. As a default, a single error (deletion, insertion,
- replacement) is accepted when terms are matched against the register
- contents.
- </para>
-
- <sect3>
- <title>Regular expressions</title>
-
- <para>
- Each term in a query is interpreted as a regular expression if
- the truncation value is either <emphasis>Regxp-1</emphasis> (102)
- or <emphasis>Regxp-2</emphasis> (103).
- Both query types follow the same syntax with the operands:
- <variablelist>
-
- <varlistentry>
- <term>x</term>
- <listitem>
- <para>
- Matches the character <emphasis>x</emphasis>.
- </para>
- </listitem>
- </varlistentry>
- <varlistentry>
- <term>.</term>
- <listitem>
- <para>
- Matches any character.
- </para>
- </listitem>
- </varlistentry>
- <varlistentry>
- <term><literal>[</literal>..<literal>]</literal></term>
- <listitem>
- <para>
- Matches the set of characters specified;
- such as <literal>[abc]</literal> or <literal>[a-c]</literal>.
- </para>
- </listitem>
- </varlistentry>
- </variablelist>
- and the operators:
- <variablelist>
-
- <varlistentry>
- <term>x*</term>
- <listitem>
- <para>
- Matches <emphasis>x</emphasis> zero or more times. Priority: high.
- </para>
- </listitem>
- </varlistentry>
- <varlistentry>
- <term>x+</term>
- <listitem>
- <para>
- Matches <emphasis>x</emphasis> one or more times. Priority: high.
- </para>
- </listitem>
- </varlistentry>
- <varlistentry>
- <term>x?</term>
- <listitem>
- <para>
- Matches <emphasis>x</emphasis> zero or once. Priority: high.
- </para>
- </listitem>
- </varlistentry>
- <varlistentry>
- <term>xy</term>
- <listitem>
- <para>
- Matches <emphasis>x</emphasis>, then <emphasis>y</emphasis>.
- Priority: medium.
- </para>
- </listitem>
- </varlistentry>
- <varlistentry>
- <term>x|y</term>
- <listitem>
- <para>
- Matches either <emphasis>x</emphasis> or <emphasis>y</emphasis>.
- Priority: low.
- </para>
- </listitem>
- </varlistentry>
- </variablelist>
- The order of evaluation may be changed by using parentheses.
- </para>
-
- <para>
- If the first character of the <emphasis>Regxp-2</emphasis> query
- is a plus character (<literal>+</literal>) it marks the
- beginning of a section with non-standard specifiers.
- The next plus character marks the end of the section.
- Currently Zebra only supports one specifier, the error tolerance,
- which consists one digit.
- </para>
-
- <para>
- Since the plus operator is normally a suffix operator the addition to
- the query syntax doesn't violate the syntax for standard regular
- expressions.
- </para>
-
- </sect3>
-
- <sect3>
- <title>Query examples</title>
-
- <para>
- Phrase search for <emphasis>information retrieval</emphasis> in
- the title-register:
- <screen>
- @attr 1=4 "information retrieval"
- </screen>
- </para>
-
- <para>
- Ranked search for the same thing:
- <screen>
- @attr 1=4 @attr 2=102 "Information retrieval"
- </screen>
- </para>
-
- <para>
- Phrase search with a regular expression:
- <screen>
- @attr 1=4 @attr 5=102 "informat.* retrieval"
- </screen>
- </para>
-
- <para>
- Ranked search with a regular expression:
- <screen>
- @attr 1=4 @attr 5=102 @attr 2=102 "informat.* retrieval"
- </screen>
- </para>
-
- <para>
- In the GILS schema (<literal>gils.abs</literal>), the
- west-bounding-coordinate is indexed as type <literal>n</literal>,
- and is therefore searched by specifying
- <emphasis>structure</emphasis>=<emphasis>Numeric String</emphasis>.
- To match all those records with west-bounding-coordinate greater
- than -114 we use the following query:
- <screen>
- @attr 4=109 @attr 2=5 @attr gils 1=2038 -114
- </screen>
- </para>
- </sect3>
</sect2>
<sect2>
will not be searchable.
</para>
</note>
- <para>
- The following Explain categories are supported:
- <literal>CategoryList</literal>, <literal>TargetInfo</literal>,
- <literal>DatabaseInfo</literal>, <literal>AttributeDetails</literal>.
- </para>
- <para>
- The following Explain search atributes are supported:
- <literal>ExplainCategory</literal> (@attr 1=1),
- <literal>DatabaseName</literal> (@attr 1=3),
- <literal>DateAdded</literal> (@attr 1=9),
- <literal>DateChanged</literal>(@ayyt 1=10).
- See <filename>tab/explain.att</filename> for more information.
- </para>
-
- <sect3>
- <title>Example searches with yaz-client</title>
-
-
- <para>
- List supported categories to find out which explain commands are
- supported:
- <screen>
- Z> base IR-Explain-1
- Z> @attr exp1 1=1 categorylist
- Z> form sutrs
- Z> show 1+2
- </screen>
- </para>
-
- <para>
- Get target info, that is, investigate which databases exist at
- this server endpoint:
- <screen>
- Z> base IR-Explain-1
- Z> @attr exp1 1=1 targetinfo
- Z> form xml
- Z> show 1+1
- Z> form grs-1
- Z> show 1+1
- Z> form sutrs
- Z> show 1+1
- </screen>
- </para>
-
- <para>
- List all supported databases, the number of hits
- is the number of databases found, which most commonly are the
- following two:
- the <literal>Default</literal> and the
- <literal>IR-Explain-1</literal> databases.
- <screen>
- Z> base IR-Explain-1
- Z> f @attr exp1 1=1 databaseinfo
- Z> form sutrs
- Z> show 1+2
- </screen>
- </para>
-
- <para>
- Get database info record for database <literal>Default</literal>.
- <screen>
- Z> base IR-Explain-1
- Z> @and @attr exp1 1=1 databaseinfo @attr exp1 1=3 Default
- </screen>
- Identical query with explicitly specified attribute set:
- <screen>
- Z> base IR-Explain-1
- Z> @attrset exp1 @and @attr 1=1 databaseinfo @attr 1=3 Default
- </screen>
- </para>
-
- <para>
- Get attribute details record for database
- <literal>Default</literal>.
- This query is very useful to study the internal Zebra indexes.
- If records have been indexed using the <literal>alvis</literal>
- XSLT filter, the string representation names of the known indexes can be
- found.
- <screen>
- Z> base IR-Explain-1
- Z> @and @attr exp1 1=1 attributedetails @attr exp1 1=3 Default
- </screen>
- Identical query with explicitly specified attribute set:
- <screen>
- Z> base IR-Explain-1
- Z> @attrset exp1 @and @attr 1=1 attributedetails @attr 1=3 Default
- </screen>
- </para>
-
- </sect3>
</sect2>
</sect1>
</chapter>