doc/querymodel.xml

   1  <chapter id="querymodel">
   2   <!-- $Id: querymodel.xml,v 1.16 2006-06-25 21:54:03 marc Exp $ -->
   3   <title>Query Model</title>
   4
   5   <sect1 id="querymodel-overview">
   6    <title>Query Model Overview</title>
   7
   8    <sect2 id="querymodel-query-languages">
   9     <title>Query Languages</title>
  10
  11     <para>
  12      Zebra was born as a networked Information Retrieval engine adhering
  13      to the international standards
  14      <ulink url="&url.z39.50;">Z39.50</ulink> and
  15      <ulink url="&url.sru;">SRU</ulink>,
  16      and implements the
  17      <literal>type-1 Reverse Polish Notation (RPN)</literal> query
  18      model defined there.
  19      Unfortunately, this model defines only a binary
  20      encoded representation, which is used as transport packaging in
  21      the Z39.50 protocol layer. This representation is not human
  22      readable, nor does it define any convenient way to specify queries.
  23     </para>
  24     <para>
  25      Since the <literal>type-1 (RPN)</literal>
  26      query structure has no direct, useful string
  27      representation, every origin application needs to provide some
  28      form of mapping from a local query notation or representation to it.
  29      </para>
  30
  31
  32    <sect3 id="querymodel-query-languages-pqf">
  33     <title>Prefix Query Format (PQF)</title>
  34
  35    <para>
  36      Index Data has defined a textual representation in the
  37      <literal>Prefix Query Format</literal>, or
  38      <literal>PQF</literal> for short, which maps
  39       <literal>one-to-one</literal> to binary encoded
  40       <literal>type-1 RPN</literal> query packages.
  41       It has been adopted by other
  42       parties developing Z39.50 software, and is often referred to as
  43      <literal>Prefix Query Notation</literal>, or
  44      <literal>PQN</literal> for short. See
  45      <xref linkend="querymodel-pqf"/> for further explanations and
  46      descriptions of Zebra's capabilities.
  47     </para>
  48    </sect3>
  49
  50    <sect3 id="querymodel-query-languages-cql">
  51     <title>Common Query Language (CQL)</title>
  52      <para>
  53       The <literal>type-1 RPN</literal> query model,
  54       expressed in <literal>PQF/PQN</literal>, is natively supported.
  55       On the other hand, the <literal>Common Query Language</literal>
  56       (<ulink url="&url.cql;">CQL</ulink>), the default query language
  57       of <literal>SRU</literal> web services, is not natively supported.
  58      </para>
  59      <para>
  60      Zebra can be configured to understand and map CQL to PQF. See
  61      <xref linkend="querymodel-cql-to-pqf"/>.
  62     </para>
  63    </sect3>
  64
  65    </sect2>
  66
  67    <sect2 id="querymodel-operation-types">
  68     <title>Operation types</title>
  69     <para>
  70      Zebra supports all three
  71      <literal>Z39.50/SRU</literal> operations defined in the
  72      standards: <literal>explain</literal>, <literal>search</literal>,
  73      and <literal>scan</literal>. A short description of the
  74      functionality and purpose of each is in order here.
  75     </para>
  76
  77     <sect3 id="querymodel-operation-type-explain">
  78      <title>Explain Operation</title>
  79      <para>
  80       The <emphasis>syntax</emphasis> of Z39.50/SRU queries is
  81       well known to any client, but the specific
  82       <emphasis>semantics</emphasis> - taking into account a
  83       particular server's functionality and abilities - must be
  84       discovered from case to case. Enter the
  85       <literal>explain</literal> operation, which provides the means
  86       for learning which
  87       <emphasis>fields</emphasis> (also called
  88       <emphasis>indexes</emphasis> or <emphasis>access points</emphasis>)
  89       are provided, which default parameters the server uses, which
  90       retrieval document formats are defined, and which specific parts
  91       of the general query model are supported.
  92      </para>
  93      <para>
  94       Z39.50 embeds the <literal>explain</literal> operation
  95       by performing a
  96       <literal>search</literal> in the magic
  97       <literal>IR-Explain-1</literal> database;
  98       see <xref linkend="querymodel-exp1"/>.
  99      </para>
 100      <para>
 101       In SRU, <literal>explain</literal> is an entirely separate
 102       operation, which returns a <literal>ZeeRex
 103       XML</literal> record according to the
 104       structure defined by the protocol.
 105      </para>
 106      <para>
 107       In both cases, the information gathered through
 108       <literal>explain</literal> operations can be used to
 109       auto-configure a client user interface to the server's
 110       capabilities.
 111      </para>
 112     </sect3>
 113
 114     <sect3 id="querymodel-operation-type-search">
 115      <title>Search Operation</title>
 116      <para>
 117       Search and retrieve interactions are the raison d'être.
 118       They are used to query the remote database and
 119       return search result documents.  Search queries span from
 120       simple free text searches to nested complex boolean queries,
 121       targeting specific indexes, and possibly enhanced with many
 122       query semantic specifications. Search interactions are the heart
 123       and soul of Z39.50/SRU servers.
 124      </para>
 125     </sect3>
 126
 127     <sect3 id="querymodel-operation-type-scan">
 128      <title>Scan Operation</title>
 129      <para>
 130       The <literal>scan</literal> operation is a helper functionality,
 131        which operates on one index or access point at a time.
 132      </para>
 133      <para>
 134       It provides
 135       the means to investigate the content of specific indexes.
 136       Scanning an index returns a handful of terms actually found in
 137       the indexes, and in addition the <literal>scan</literal>
 138       operation returns the number of documents indexed by each term.
 139       A search client can use this information to propose proper
 140       spelling of search terms, to auto-fill search boxes, or to
 141       display  controlled vocabularies.
 142      </para>
 143     </sect3>
 144
 145    </sect2>
 146
 147  </sect1>
 148
 149
 150   <sect1 id="querymodel-pqf">
 151    <title>Prefix Query Format syntax and semantics</title>
 152    <para>
 153     The <ulink url="&url.yaz.pqf;">PQF grammar</ulink>
 154     is documented in the YAZ manual, and shall not be
 155     repeated here. During search, this textual PQF representation
 156     is always mapped to the equivalent Zebra internal
 157     query parse tree.
 158    </para>
 159
 160    <sect2 id="querymodel-pqf-tree">
 161     <title>PQF tree structure</title>
 162     <para>
 163      The PQF parse tree - or the equivalent textual representation -
 164      may start with one specification of the
 165      <emphasis>attribute set</emphasis> used. It is followed by a query
 166      tree, which
 167      consists of <emphasis>atomic query parts (APT)</emphasis> or
 168      <emphasis>named result sets</emphasis>, possibly
 169      paired by <emphasis>boolean binary operators</emphasis>, and
 170      finally <emphasis>recursively combined</emphasis> into
 171      complex query trees.
 172     </para>
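    <para>
     As a minimal sketch of such a recursively combined tree (the
     search terms and the choice of use attributes are illustrative
     assumptions only), the following query states the attribute set
     once, pairs two atomic title queries with
     <literal>@or</literal>, and then combines that subtree with a
     third atomic query using <literal>@and</literal>:
     <screen>
      Z> find @attrset bib-1 @and @or @attr 1=4 information @attr 1=4 data @attr 1=1016 retrieval
     </screen>
    </para>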
 173
 174     <sect3 id="querymodel-attribute-sets">
 175      <title>Attribute sets</title>
 176      <para>
 177       Attribute sets define the exact meaning and semantics of queries
 178       issued. Zebra comes with some predefined attribute set
 179       definitions; others can easily be defined and added to the
 180       configuration.
 181      </para>
 182
 183
 184      <table id="querymodel-attribute-sets-table"
 185       frame="all" rowsep="1" colsep="1" align="center">
 186
 187       <caption>Attribute sets predefined in Zebra</caption>
 188
 189        <thead>
 190        <tr>
 191          <td>Attribute set</td>
 192          <td>Short hand</td>
 193          <td>Notes</td>
 194          <td>Status</td>
 195         </tr>
 196       </thead>
 197
 198        <tbody>
 199         <tr>
 200          <td><literal>Explain</literal></td>
 201          <td><literal>exp-1</literal></td>
 202          <td>Special attribute set used on the special automagic
 203           <literal>IR-Explain-1</literal> database to gain information on
 204           server capabilities, database names, and database
 205           semantics.</td>
 206          <td>predefined</td>
 207         </tr>
 208         <tr>
 209          <td><literal>Bib1</literal></td>
 210          <td><literal>bib-1</literal></td>
 211          <td>Standard PQF query language attribute set which defines the
 212           semantics of Z39.50 searching. In addition, all of the
 213           non-use attributes (type 2-9) define the hard-wired
 214           Zebra internal query
 215           processing.</td>
 216          <td>default</td>
 217         </tr>
 218         <tr>
 219          <td><literal>GILS</literal></td>
 220          <td><literal>gils</literal></td>
 221          <td>Extension to the <literal>Bib1</literal> attribute set.</td>
 222          <td>predefined</td>
 223         </tr>
 224         <!--
 225         <tr>
 226          <td><literal>IDXPATH</literal></td>
 227          <td><literal>idxpath</literal></td>
 228          <td>Hardwired XPATH like attribute set, only available for
 229              indexing with the GRS record model</td>
 230          <td>depreciated</td>
 231         </tr>
 232         -->
 233        </tbody>
 234      </table>
 235     </sect3>
 236
 237     <para>
 238      The <literal>use attribute (type 1)</literal> mappings for the
 239      predefined attribute sets are found in the
 240      attribute set configuration files <filename>tab/*.att</filename>.
 241     </para>
 242
 243     <note>
 244      The Zebra internal query processing is modeled after
 245      the <literal>Bib1</literal> attribute set, and the non-use
 246      attributes type 2-6 are hard-wired in. It is therefore essential
 247      to be familiar with <xref linkend="querymodel-bib1-nonuse"/>.
 248     </note>
 249
 250
 251     <sect3 id="querymodel-boolean-operators">
 252      <title>Boolean operators</title>
 253      <para>
 254       A pair of subquery trees, or of atomic queries, is combined
 255       using the standard boolean operators into new query trees.
 256       Thus, boolean operators are always internal nodes in the query tree.
 257      </para>
 258
 259      <table id="querymodel-boolean-operators-table"
 260       frame="all" rowsep="1" colsep="1" align="center">
 261
 262       <caption>Boolean operators</caption>
 263        <thead>
 264         <tr>
 265          <td>Keyword</td>
 266          <td>Operator</td>
 267          <td>Description</td>
 268         </tr>
 269       </thead>
 270        <tbody>
 271         <tr><td><literal>@and</literal></td>
 272          <td>binary <literal>AND</literal> operator</td>
 273          <td>Set intersection of two atomic queries' hit sets</td>
 274         </tr>
 275         <tr><td><literal>@or</literal></td>
 276          <td>binary <literal>OR</literal> operator</td>
 277          <td>Set union of two atomic queries' hit sets</td>
 278         </tr>
 279         <tr><td><literal>@not</literal></td>
 280          <td>binary <literal>AND NOT</literal> operator</td>
 281          <td>Set difference of two atomic queries' hit sets</td>
 282         </tr>
 283         <tr><td><literal>@prox</literal></td>
 284          <td>binary <literal>PROXIMITY</literal> operator</td>
 285          <td>Set intersection of two atomic queries' hit sets. In
 286           addition, the intersection set is purged of all
 287           documents which do not satisfy the requested query
 288           term proximity. Usually a proper subset of the AND
 289           operation.</td>
 290         </tr>
 291        </tbody>
 292      </table>
 293
 294      <para>
 295       For example, we can combine the terms
 296       <emphasis>information</emphasis> and <emphasis>retrieval</emphasis>
 297       into different searches in the default index of the default
 298       attribute set as follows.
 299       Querying for the union of all documents containing the
 300       terms <emphasis>information</emphasis> OR
 301       <emphasis>retrieval</emphasis>:
 302       <screen>
 303        Z> find @or information retrieval
 304       </screen>
 305      </para>
 306      <para>
 307       Querying for the intersection of all documents containing the
 308       terms <emphasis>information</emphasis> AND
 309       <emphasis>retrieval</emphasis>:
 310       The hit set is a subset of the corresponding
 311       OR query.
 312       <screen>
 313        Z> find @and information retrieval
 314       </screen>
 315      </para>
 316      <para>
 317       Querying for the intersection of all documents containing the
 318       terms <emphasis>information</emphasis> AND
 319       <emphasis>retrieval</emphasis>, taking proximity into account:
 320       The hit set is a subset of the corresponding
 321       AND query
 322       (see the <ulink url="&url.yaz.pqf;">PQF grammar</ulink> for
 323       details on the proximity operator):
 324       <screen>
 325        Z> find @prox 0 3 0 2 k 2 information retrieval
 326       </screen>
 327      </para>
 328      <para>
 329       Querying for the intersection of all documents containing the
 330       terms <emphasis>information</emphasis> AND
 331       <emphasis>retrieval</emphasis>, in the same order and near each
 332       other as described in the term list.
 333       The hit set is a subset of the corresponding
 334       PROXIMITY query.
 335       <screen>
 336        Z> find "information retrieval"
 337       </screen>
 338      </para>
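     <para>
      The <literal>@not</literal> operator is not illustrated above.
      A sketch in the same style: querying for all documents containing
      the term <emphasis>information</emphasis> but NOT the term
      <emphasis>retrieval</emphasis>.
      The hit set is a subset of the hit set of a search for
      <emphasis>information</emphasis> alone.
      <screen>
       Z> find @not information retrieval
      </screen>
     </para>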
 339     </sect3>
 340
 341
 342     <sect3 id="querymodel-atomic-queries">
 343      <title>Atomic queries (APT)</title>
 344      <para>
 345       Atomic queries are the query parts which work on one access point
 346       only. These consist of <literal>an attribute list</literal>
 347       followed by a <literal>single term</literal> or a
 348       <literal>quoted term list</literal>, and are often called
 349       <emphasis>Attributes-Plus-Terms (APT)</emphasis> queries.
 350      </para>
 351      <para>
 352       Atomic (APT) queries are always leaf nodes in the PQF query tree.
 353       Unsupplied non-use attributes type 2-9 are either inherited from
 354       higher nodes in the query tree, or are set to Zebra's default values.
 355       See <xref linkend="querymodel-bib1"/> for details.
 356      </para>
 357
 358      <table id="querymodel-atomic-queries-table"
 359       frame="all" rowsep="1" colsep="1" align="center">
 360
 361       <caption>Atomic queries (APT)</caption>
 362        <thead>
 363         <tr>
 364          <td>Name</td>
 365          <td>Type</td>
 366          <td>Notes</td>
 367         </tr>
 368       </thead>
 369        <tbody>
 370         <tr>
 371          <td><emphasis>attribute list</emphasis></td>
 372          <td>List of <literal>orthogonal</literal> attributes</td>
 373          <td>Any of the orthogonal attribute types may be omitted,
 374           these are inherited from higher query tree nodes, or if not
 375           inherited, are set to the default Zebra configuration values.
 376          </td>
 377         </tr>
 378         <tr>
 379          <td><emphasis>term</emphasis></td>
 380          <td>single <literal>term</literal>
 381           or <literal>quoted term list</literal>   </td>
 382          <td>Here the search terms or list of search terms is added
 383           to the query</td>
 384         </tr>
 385        </tbody>
 386      </table>
 387      <para>
 388       Querying for the term <emphasis>information</emphasis> in the
 389       default index using the default attribute set, the server choice
 390       of access point/index, and the default non-use attributes.
 391       <screen>
 392        Z> find information
 393       </screen>
 394      </para>
 395      <para>
 396       Equivalent query fully specified including all default values:
 397       <screen>
 398        Z> find @attrset bib-1 @attr 1=1017 @attr 2=3 @attr 3=3 @attr 4=1 @attr 5=100 @attr 6=1 information
 399       </screen>
 400      </para>
 401
 402      <para>
 403       Finding all documents which have the term
 404       <emphasis>debussy</emphasis> in the title field.
 405       <screen>
 406        Z> find @attr 1=4 debussy
 407       </screen>
 408      </para>
 409
 410      <para>
 411       The <literal>scan</literal> operation is only supported with
 412       atomic APT queries, as it is bound to one access point at a
 413       time. Boolean query trees are not allowed during
 414       <literal>scan</literal>.
 415       </para>
 416
 417      <para>
 418       For example, we might want to scan the title index, starting with
 419       the term
 420       <emphasis>debussy</emphasis>, and displaying this and the
 421       following terms in lexicographic order:
 422       <screen>
 423        Z> scan @attr 1=4 debussy
 424       </screen>
 425      </para>
 426     </sect3>
 427
 428
 429     <sect3 id="querymodel-resultset">
 430      <title>Named Result Sets</title>
 431      <para>
 432       Named result sets are supported in Zebra, and result sets can be
 433       used as operands without limitations. It follows that named
 434       result sets are leaf nodes in the PQF query tree, exactly as
 435       atomic APT queries are.
 436      </para>
 437      <para>
 438       After the execution of a search, the result set is available at
 439       the server, such that the client can use it for subsequent
 440       searches or retrieval requests. The Z39.50 standard actually
 441       stresses the fact that result sets are volatile. A result set may cease
 442       to exist at any time after the search, in which case the server will
 443       send a diagnostic to the effect that the requested
 444       result set does not exist any more.
 445      </para>
 446
 447      <para>
 448       Defining a named result set and re-using it in the next query,
 449       using <literal>yaz-client</literal>.
 450       <screen>
 451        Z> f @attr 1=4 mozart
 452        ...
 453        Number of hits: 43, setno 1
 454        ...
 455        Z> f @and @set 1 @attr 1=4 amadeus
 456        ...
 457        Number of hits: 14, setno 2
 458        ...
 459        Z> f @attr 1=1016 beethoven
 460        ...
 461        Number of hits: 26, setno 3
 462        ...
 463       </screen>
 464      </para>
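     <para>
      Since result sets may be used as operands on either side of a
      boolean operator, two earlier result sets can also be combined
      directly. A sketch continuing the session above (hit counts are
      omitted, as they depend on the database content):
      <screen>
       Z> f @or @set 1 @set 3
      </screen>
     </para>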
 465
 466      <note>
 467       Named result sets are only supported by the Z39.50 protocol.
 468       The SRU web service is stateless, and therefore the notion of
 469       named result sets does not exist when accessing a Zebra server by
 470       the SRU protocol.
 471      </note>
 472     </sect3>
 473
 474
 475     <sect3 id="querymodel-use-string">
 476      <title>Zebra's special access point of type 'string'</title>
 477      <para>
 478       The numeric <literal>use (type 1)</literal> attribute is usually
 479       referred to from a given
 480       attribute set. In addition, Zebra lets you use
 481       <emphasis>any internal index
 482        name defined in your configuration</emphasis>
 483       as use attribute value. This is a great feature for
 484       debugging, and when you do
 485       not need the complexity of defined use attribute values. It is
 486       the preferred way of accessing Zebra indexes directly.
 487      </para>
 488      <para>
 489       Finding all documents which have the term list "information
 490       retrieval" in a Zebra index, using its internal full string
 491       name. Scanning the same index.
 492       <screen>
 493        Z> find @attr 1=sometext "information retrieval"
 494        Z> scan @attr 1=sometext aterm
 495       </screen>
 496      </para>
 497      <para>
 498       Searching or scanning
 499       the bib-1 use attribute 54 using its string name:
 500       <screen>
 501        Z> find @attr 1=Code-language eng
 502        Z> scan @attr 1=Code-language ""
 503       </screen>
 504      </para>
 505      <para>
 506       It is possible to search
 507       in any silly string index - if it's defined in your
 508       indexation rules and can be parsed by the PQF parser.
 509       This is definitely not the recommended use of
 510       this facility, as it might confuse your users with some very
 511       unexpected results.
 512       <screen>
 513        Z> find @attr 1=silly/xpath/alike[@index]/name "information retrieval"
 514       </screen>
 515      </para>
 516      <para>
 517       See also <xref linkend="querymodel-pqf-apt-mapping"/> for details, and
 518       <xref linkend="server-sru"/>
 519       for the SRU PQF query extension using string names as a fast
 520       debugging facility.
 521      </para>
 522     </sect3>
 523
 524     <sect3 id="querymodel-use-xpath">
 525      <title>Zebra's special access point of type 'XPath'
 526       for GRS filters</title>
 527      <para>
 528       As we have seen above, it is possible (albeit seldom a great
 529       idea) to emulate
 530       <ulink url="http://www.w3.org/TR/xpath">XPath 1.0</ulink> based
 531       search by defining <literal>use (type 1)</literal>
 532       <emphasis>string</emphasis> attributes which in appearance
 533       <emphasis>resemble XPath queries</emphasis>. There are two
 534       problems with this approach: first, the XPath-look-alike has to
 535       be defined at indexing time, so no new undefined
 536       XPath queries can be entered at search time, and second, it might
 537       confuse users very much that an XPath-alike index name in fact
 538       gets populated from a possibly entirely different XML element
 539       than it pretends to access.
 540      </para>
 541      <para>
 542       When using the <literal>GRS Record Model</literal>
 543       (see  <xref linkend="record-model-grs"/>), we have the
 544       possibility to embed <emphasis>live</emphasis>
 545       XPath expressions
 546       in the PQF queries, which are here called
 547       <literal>use (type 1)</literal> <emphasis>xpath</emphasis>
 548       attributes. You must enable the
 549       <literal>xpath enable</literal> directive in your
 550       <literal>.abs</literal> configuration files.
 551      </para>
 552      <note>
 553       Only a <emphasis>very</emphasis> restricted subset of the
 554       <ulink url="http://www.w3.org/TR/xpath">XPath 1.0</ulink>
 555       standard is supported as the GRS record model is simpler than
 556       a full XML DOM structure. See the following examples for
 557       possibilities.
 558      </note>
 559      <para>
 560       Finding all documents which have the term "content"
 561       inside a text node found in a specific XML DOM
 562       <emphasis>subtree</emphasis>, whose starting element is
 563       addressed by XPath.
 564       <screen>
 565        Z> find @attr 1=/root content
 566        Z> find @attr 1=/root/first content
 567       </screen>
 568       <emphasis>Notice that the
 569        XPath must be absolute, i.e., must start with '/', and that the
 570        XPath <literal>descendant-or-self</literal> axis followed by a
 571        text node selection <literal>text()</literal> is implicitly
 572        appended to the stated XPath.
 573       </emphasis>
 574       It follows that the above searches are interpreted as:
 575       <screen>
 576        Z> find @attr 1=/root//text() content
 577        Z> find @attr 1=/root/first//text() content
 578       </screen>
 579      </para>
 580
 581      <para>
 582       Searching inside attribute strings is possible:
 583       <screen>
 584        Z> find @attr 1=/link/@creator morten
 585       </screen>
 586       </para>
 587
 588      <para>
 589       Filtering the addressing XPath by a predicate working on exact
 590       string values in
 591       attributes (in the XML sense) is possible. The first query below
 592       returns all documents which have the term "english" contained in
 593       one of the text subnodes of the subtree defined by the XPath
 594       <literal>/record/title[@lang='en']</literal>; the other queries
 595       use similar predicate filtering.
 596       <screen>
 597        Z> find @attr 1=/record/title[@lang='en'] english
 598        Z> find @attr 1=/link[@creator='sisse'] sibelius
 599        Z> find @attr 1=/link[@creator='sisse']/description[@xml:lang='da'] sibelius
 600       </screen>
 601      </para>
 602
 603      <para>
 604       Combining numeric indexes, boolean expressions,
 605       and xpath based searches is possible:
 606       <screen>
 607        Z> find @attr 1=/record/title @and foo bar
 608        Z> find @and @attr 1=/record/title foo @attr 1=4 bar
 609       </screen>
 610      </para>
 611      <para>
 612       Escaping PQF keywords and other non-parseable XPath constructs
 613       with <literal>'{ }'</literal> to prevent syntax errors:
 614       <screen>
 615        Z> find @attr {1=/root/first[@attr='danish']} content
 616        Z> find @attr {1=/record/@set} oai
 617       </screen>
 618      </para>
 619      <warning>
 620       It is worth mentioning that these dynamically performed XPath
 621       queries are a performance bottleneck, as no optimized
 622       specialized indexes can be used. Therefore, avoid the use of
 623       this facility when speed is essential, and the database content
 624       size is medium to large.
 625      </warning>
 626
 627     </sect3>
 628
 629    </sect2>
 630
 631    <sect2 id="querymodel-exp1">
 632     <title>Explain Attribute Set</title>
 633     <para>
 634      The Z39.50 standard defines the
 635      <ulink url="&url.z39.50.explain;">Explain</ulink> attribute set
 636      <literal>Exp-1</literal>, which is used to discover information
 637      about a server's search semantics and functional capabilities.
 638      Zebra exposes a "classic"
 639      Explain database under the base name <literal>IR-Explain-1</literal>, which
 640      is populated with system internal information.
 641     </para>
 642    <para>
 643      The attribute-set <literal>exp-1</literal> consists of a single
 644      <literal>use attribute (type 1)</literal>.
 645     </para>
 646     <para>
 647      In addition, the non-Use
 648      <literal>bib-1</literal> attributes, that is, the types
 649      <literal>Relation</literal>, <literal>Position</literal>,
 650      <literal>Structure</literal>, <literal>Truncation</literal>,
 651      and <literal>Completeness</literal> are imported from
 652      the <literal>bib-1</literal> attribute set, and may be used
 653      within any explain query.
 654     </para>
 655
 656     <sect3 id="querymodel-exp1-use">
 657     <title>Use Attributes (type = 1)</title>
 658      <para>
 659       The following Explain search attributes are supported:
 660       <literal>ExplainCategory</literal> (@attr 1=1),
 661       <literal>DatabaseName</literal> (@attr 1=3),
 662       <literal>DateAdded</literal> (@attr 1=9),
 663       <literal>DateChanged</literal> (@attr 1=10).
 664      </para>
 665      <para>
 666       A search in the use attribute  <literal>ExplainCategory</literal>
 667       supports only these predefined values:
 668       <literal>CategoryList</literal>, <literal>TargetInfo</literal>,
 669       <literal>DatabaseInfo</literal>, <literal>AttributeDetails</literal>.
 670      </para>
 671      <para>
 672       See <filename>tab/explain.att</filename> and the
 673       <ulink url="&url.z39.50;">Z39.50</ulink> standard
 674       for more information.
 675      </para>
 676     </sect3>
 677
 678     <sect3>
 679      <title>Explain searches with yaz-client</title>
 680      <para>
 681       Classic Explain only defines retrieval of Explain information
 682       via ASN.1. Practically no Z39.50 clients support this. Fortunately
 683       they don't have to - Zebra allows retrieval of this information
 684       in other formats:
 685       <literal>SUTRS</literal>, <literal>XML</literal>,
 686       <literal>GRS-1</literal> and  <literal>ASN.1</literal> Explain.
 687      </para>
 688
 689      <para>
 690       List supported categories to find out which explain commands are
 691       supported:
 692       <screen>
 693        Z> base IR-Explain-1
 694        Z> find @attr exp1 1=1 categorylist
 695        Z> form sutrs
 696        Z> show 1+2
 697       </screen>
 698      </para>
 699
 700      <para>
 701       Get target info, that is, investigate which databases exist at
 702       this server endpoint:
 703       <screen>
 704        Z> base IR-Explain-1
 705        Z> find @attr exp1 1=1 targetinfo
 706        Z> form xml
 707        Z> show 1+1
 708        Z> form grs-1
 709        Z> show 1+1
 710        Z> form sutrs
 711        Z> show 1+1
 712       </screen>
 713      </para>
 714
 715      <para>
 716       List all supported databases. The number of hits
 717       is the number of databases found, which most commonly are the
 718       following two:
 719       the <literal>Default</literal> and the
 720       <literal>IR-Explain-1</literal> databases.
 721       <screen>
 722        Z> base IR-Explain-1
 723        Z> find @attr exp1 1=1 databaseinfo
 724        Z> form sutrs
 725        Z> show 1+2
 726       </screen>
 727      </para>
 728
 729      <para>
 730       Get database info record for database <literal>Default</literal>.
 731       <screen>
 732        Z> base IR-Explain-1
 733        Z> find @and @attr exp1 1=1 databaseinfo @attr exp1 1=3 Default
 734       </screen>
 735       Identical query with explicitly specified attribute set:
 736       <screen>
 737        Z> base IR-Explain-1
 738        Z> find @attrset exp1 @and @attr 1=1 databaseinfo @attr 1=3 Default
 739       </screen>
 740      </para>
 741
 742      <para>
 743       Get attribute details record for database
 744       <literal>Default</literal>.
 745       This query is very useful to study the internal Zebra indexes.
 746       If records have been indexed using the <literal>alvis</literal>
 747       XSLT filter, the string representation names of the known indexes can be
 748       found.
 749       <screen>
 750        Z> base IR-Explain-1
 751        Z> find @and @attr exp1 1=1 attributedetails @attr exp1 1=3 Default
 752       </screen>
 753       Identical query with explicitly specified attribute set:
 754       <screen>
 755        Z> base IR-Explain-1
 756        Z> find @attrset exp1 @and @attr 1=1 attributedetails @attr 1=3 Default
 757       </screen>
 758      </para>
 759     </sect3>
 760
 761    </sect2>
 762
 763    <sect2 id="querymodel-bib1">
 764     <title>Bib1 Attribute Set</title>
 765     <para>
 766      Most of the information contained in this section is an excerpt of
 767      the <literal>ATTRIBUTE SET BIB-1 (Z39.50-1995)
 768       SEMANTICS</literal>,
 769      found at <ulink url="&url.z39.50.attset.bib1.1995;">The BIB-1
 770       Attribute Set Semantics</ulink> from 1995, also in an updated
 771      <ulink url="&url.z39.50.attset.bib1;">Bib-1
 772       Attribute Set</ulink>
 773      version from 2003. Index Data is not the copyright holder of this
 774      information, except for the configuration details, the listing of
 775      Zebra's capabilities, and the example queries.
 776     </para>
 777
 778
 779    <sect3 id="querymodel-bib1-use">
 780      <title>Use Attributes (type 1)</title>
 781
 782     <para>
 783      A use attribute specifies an access point for any atomic query.
 784      These access points are highly dependent on the attribute set used
 785      in the query, and are user configurable using the following
 786      default configuration files:
 787      <filename>tab/bib1.att</filename>,
 788      <filename>tab/dan1.att</filename>,
 789      <filename>tab/explain.att</filename>, and
 790      <filename>tab/gils.att</filename>.
 791      New attribute sets can be added by adding new
 792      <filename>tab/*.att</filename> configuration files, which need to
 793      be sourced in the main configuration <filename>zebra.cfg</filename>.
 794      </para>
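         <para>
          As a minimal sketch, and assuming the stock attribute set files
          from the <filename>tab</filename> directory are in use, the
          corresponding <literal>attset</literal> directives in
          <filename>zebra.cfg</filename> might look like this (which
          files are actually needed depends on the local installation):
          <screen>
           attset: bib1.att
           attset: gils.att
           attset: explain.att
          </screen>
         </para>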
 795
 796     <para>
 797      In addition, Zebra allows access to
 798      <emphasis>internal index names</emphasis> and <emphasis>dynamic
 799      XPath</emphasis> as use attributes; see
 800       <xref linkend="querymodel-use-string"/> and
 801      <xref linkend="querymodel-use-xpath"/>.
 802     </para>
 803
 804     <para>
 805      Phrase search for <emphasis>information retrieval</emphasis> in
 806      the title-register, scanning the same register afterwards:
 807      <screen>
 808       Z> find @attr 1=4 "information retrieval"
 809       Z> scan @attr 1=4 information
 810      </screen>
 811     </para>
 812     </sect3>
 813
 814    </sect2>
 815
 816
 817    <sect2 id="querymodel-bib1-nonuse">
 818      <title>Zebra general Bib1 Non-Use Attributes (type 2-6)</title>
 819
 820     <sect3 id="querymodel-bib1-relation">
 821      <title>Relation Attributes (type 2)</title>
 822
 823      <para>
 824       Relation attributes describe the relationship of the access
 825       point (left side
 826       of the relation) to the search term as qualified by the attributes (right
 827       side of the relation), e.g., Date-publication &lt;= 1975.
 828       </para>
 829
 830      <table id="querymodel-bib1-relation-table"
 831       frame="all" rowsep="1" colsep="1" align="center">
 832
 833       <caption>Relation Attributes (type 2)</caption>
 834       <thead>
 835         <tr>
 836          <td>Relation</td>
 837          <td>Value</td>
 838          <td>Notes</td>
 839         </tr>
 840        </thead>
 841        <tbody>
 842         <tr>
 843          <td>Less than</td>
 844          <td>1</td>
 845          <td>supported</td>
 846         </tr>
 847         <tr>
 848          <td>Less than or equal</td>
 849          <td>2</td>
 850          <td>supported</td>
 851         </tr>
 852         <tr>
 853          <td>Equal</td>
 854          <td>3</td>
 855          <td>default</td>
 856         </tr>
 857         <tr>
 858          <td>Greater or equal</td>
 859          <td>4</td>
 860          <td>supported</td>
 861         </tr>
 862         <tr>
 863          <td>Greater than</td>
 864          <td>5</td>
 865          <td>supported</td>
 866         </tr>
 867         <tr>
 868          <td>Not equal</td>
 869          <td>6</td>
 870          <td>unsupported</td>
 871         </tr>
 872         <tr>
 873          <td>Phonetic</td>
 874          <td>100</td>
 875          <td>unsupported</td>
 876         </tr>
 877         <tr>
 878          <td>Stem</td>
 879          <td>101</td>
 880          <td>unsupported</td>
 881         </tr>
 882         <tr>
 883          <td>Relevance</td>
 884          <td>102</td>
 885          <td>supported</td>
 886         </tr>
 887         <tr>
 888          <td>AlwaysMatches</td>
 889          <td>103</td>
 890          <td>supported</td>
 891         </tr>
 892        </tbody>
 893      </table>
 894
 895      <para>
 896       The relation attributes
 897       <literal>1-5</literal> are supported and work exactly as
 898       expected.
 899       All ordering operations are based on a lexicographical ordering,
 900       <emphasis>except</emphasis> when the
 901       <literal>structure attribute numeric (109)</literal> is used. In
 902       this case, ordering is numerical. See
 903       <xref linkend="querymodel-bib1-structure"/>.
 904       <screen>
 905        Z>  find @attr 1=Title @attr 2=1 music
 906        ...
 907        Number of hits: 11745, setno 1
 908        ...
 909        Z>  find @attr 1=Title @attr 2=2 music
 910        ...
 911        Number of hits: 11771, setno 2
 912        ...
 913        Z>  find @attr 1=Title @attr 2=3 music
 914        ...
 915        Number of hits: 532, setno 3
 916        ...
 917        Z>  find @attr 1=Title @attr 2=4 music
 918        ...
 919        Number of hits: 11463, setno 4
 920        ...
 921        Z>  find @attr 1=Title @attr 2=5 music
 922        ...
 923        Number of hits: 11419, setno 5
 924       </screen>
 925      </para>
 926
 927      <para>
 928       The relation attribute
 929       <literal>Relevance (102)</literal> is supported, see
 930       <xref linkend="administration-ranking"/> for full information.
 931      </para>
 932
 933      <para>
 934       Ranked search for <emphasis>information retrieval</emphasis> in
 935       the title-register:
 936       <screen>
 937        Z> find @attr 1=4 @attr 2=102 "information retrieval"
 938       </screen>
 939      </para>
 940
 941      <para>
 942       The relation attribute
 943       <literal>AlwaysMatches (103)</literal> is, in the default
 944       configuration,
 945       supported in conjunction with the structure attribute
 946       <literal>Phrase (1)</literal> (which may be omitted, since it
 947       is the default).
 948       It can be configured to work with other structure attributes,
 949       see the configuration file
 950       <filename>tab/default.idx</filename> and
 951        <xref linkend="querymodel-pqf-apt-mapping"/>.
 952      </para>
 953      <para>
 954       <literal>AlwaysMatches (103)</literal> is a
 955       great way to discover how many documents have been indexed in a
 956       given field. The search term is ignored, but needed for correct
 957       PQF syntax. An empty search term may be supplied.
 958       <screen>
 959        Z> find @attr 1=Title  @attr 2=103  ""
 960        Z> find @attr 1=Title  @attr 2=103  @attr 4=1 ""
 961       </screen>
 962      </para>
 963
 964
 965     </sect3>
 966
 967     <sect3 id="querymodel-bib1-position">
 968      <title>Position Attributes (type 3)</title>
 969
 970      <para>
 971       The position attribute specifies the location of the search term
 972       within the field or subfield in which it appears.
 973      </para>
 974
 975      <table id="querymodel-bib1-position-table"
 976       frame="all" rowsep="1" colsep="1" align="center">
 977
 978       <caption>Position Attributes (type 3)</caption>
 979       <thead>
 980         <tr>
 981          <td>Position</td>
 982          <td>Value</td>
 983          <td>Notes</td>
 984         </tr>
 985        </thead>
 986        <tbody>
 987         <tr>
 988          <td>First in field </td>
 989          <td>1</td>
 990          <td>unsupported</td>
 991         </tr>
 992         <tr>
 993          <td>First in subfield</td>
 994          <td>2</td>
 995          <td>unsupported</td>
 996         </tr>
 997         <tr>
 998          <td>Any position in field</td>
 999          <td>3</td>
1000          <td>default</td>
1001         </tr>
1002        </tbody>
1003      </table>
1004
1005     <para>
1006       The position attribute values <literal>first in field (1)</literal>
1007       and <literal>first in subfield (2)</literal> are unsupported.
1008       Using them does not trigger an error, but silently defaults to
1009       <literal>any position in field (3)</literal>.
1010       <!-- It should -->
1011       </para>
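           <para>
            Assuming the default configuration described above, the
            following two queries should therefore yield the same hit
            set, since the unsupported position value silently falls
            back to the default:
            <screen>
             Z> find @attr 1=Title @attr 3=1 music
             Z> find @attr 1=Title @attr 3=3 music
            </screen>
           </para>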
1012     </sect3>
1013
1014     <sect3 id="querymodel-bib1-structure">
1015      <title>Structure Attributes (type 4)</title>
1016
1017      <para>
1018       The structure attribute specifies the type of search
1019       term. This causes the search to be mapped to
1020       different Zebra internal indexes, which must have been defined
1021       at index time.
1022      </para>
1023
1024      <para>
1025       The possible values of the
1026       <literal>structure attribute (type 4)</literal> can be defined
1027       using the configuration file
1028       <filename>tab/default.idx</filename>.
1029       The default configuration is summarized in this table.
1030      </para>
1031
1032      <table id="querymodel-bib1-structure-table"
1033       frame="all" rowsep="1" colsep="1" align="center">
1034
1035       <caption>Structure Attributes (type 4)</caption>
1036       <thead>
1037         <tr>
1038          <td>Structure</td>
1039          <td>Value</td>
1040          <td>Notes</td>
1041         </tr>
1042        </thead>
1043        <tbody>
1044         <tr>
1045          <td>Phrase </td>
1046          <td>1</td>
1047          <td>default</td>
1048         </tr>
1049         <tr>
1050          <td>Word</td>
1051          <td>2</td>
1052          <td>supported</td>
1053         </tr>
1054         <tr>
1055          <td>Key</td>
1056          <td>3</td>
1057          <td>supported</td>
1058         </tr>
1059         <tr>
1060          <td>Year</td>
1061          <td>4</td>
1062          <td>supported</td>
1063         </tr>
1064         <tr>
1065          <td>Date (normalized)</td>
1066          <td>5</td>
1067          <td>supported</td>
1068         </tr>
1069         <tr>
1070          <td>Word list</td>
1071          <td>6</td>
1072          <td>supported</td>
1073         </tr>
1074         <tr>
1075          <td>Date (un-normalized)</td>
1076          <td>100</td>
1077          <td>unsupported</td>
1078         </tr>
1079         <tr>
1080          <td>Name (normalized) </td>
1081          <td>101</td>
1082          <td>unsupported</td>
1083         </tr>
1084         <tr>
1085          <td>Name (un-normalized) </td>
1086          <td>102</td>
1087          <td>unsupported</td>
1088         </tr>
1089         <tr>
1090          <td>Structure</td>
1091          <td>103</td>
1092          <td>unsupported</td>
1093         </tr>
1094         <tr>
1095          <td>Urx</td>
1096          <td>104</td>
1097          <td>supported</td>
1098         </tr>
1099         <tr>
1100          <td>Free-form-text</td>
1101          <td>105</td>
1102          <td>supported</td>
1103         </tr>
1104         <tr>
1105          <td>Document-text</td>
1106          <td>106</td>
1107          <td>supported</td>
1108         </tr>
1109         <tr>
1110          <td>Local-number</td>
1111          <td>107</td>
1112          <td>supported</td>
1113         </tr>
1114         <tr>
1115          <td>String</td>
1116          <td>108</td>
1117          <td>unsupported</td>
1118         </tr>
1119         <tr>
1120          <td>Numeric string</td>
1121          <td>109</td>
1122          <td>supported</td>
1123         </tr>
1124        </tbody>
1125      </table>
1126
1127
1128     <para>
1129      The structure attribute value
1130      <literal>Word list (6)</literal>
1131      is supported, and maps to the boolean <literal>AND</literal>
1132      combination of words supplied. The word list is useful when
1133      Google-like bag-of-words queries need to be translated from a GUI
1134      query language to PQF.  For example, the following queries
1135      are equivalent:
1136      <screen>
1137       Z> find @attr 1=Title @attr 4=6 "mozart amadeus"
1138       Z> find @attr 1=Title  @and mozart amadeus
1139      </screen>
1140     </para>
1141
1142     <para>
1143      The structure attribute values
1144      <literal>Free-form-text (105)</literal> and
1145      <literal>Document-text (106)</literal>
1146      are supported, and both map to the boolean <literal>OR</literal>
1147      combination of words supplied. The following queries
1148      are equivalent:
1149      <screen>
1150       Z> find @attr 1=Body-of-text @attr 4=105 "bach salieri teleman"
1151       Z> find @attr 1=Body-of-text @attr 4=106 "bach salieri teleman"
1152       Z> find @attr 1=Body-of-text @or bach @or salieri teleman
1153      </screen>
1154      This <literal>OR</literal> list of terms is very useful in
1155      combination with relevance ranking:
1156      <screen>
1157       Z> find @attr 1=Body-of-text @attr 2=102 @attr 4=105 "bach salieri teleman"
1158      </screen>
1159     </para>
1160
1161     <para>
1162      The structure attribute value
1163      <literal>Local-number (107)</literal>
1164      is supported, and always maps to the Zebra internal document ID,
1165      irrespective of which use attribute is specified. The following queries
1166      have exactly the same unique record in the hit set:
1167      <screen>
1168       Z> find @attr 4=107 10
1169       Z> find @attr 1=4 @attr 4=107 10
1170       Z> find @attr 1=1010 @attr 4=107 10
1171      </screen>
1172     </para>
1173
1174     <para>
1175      In
1176      the GILS schema (<literal>gils.abs</literal>), the
1177      west-bounding-coordinate is indexed as type <literal>n</literal>,
1178      and is therefore searched by specifying
1179      <emphasis>structure</emphasis>=<emphasis>Numeric String</emphasis>.
1180      To match all those records with west-bounding-coordinate greater
1181      than -114 we use the following query:
1182      <screen>
1183       Z> find @attr 4=109 @attr 2=5 @attr gils 1=2038 -114
1184      </screen>
1185     </para>
1186      <note>
1187       The exact mapping between PQF queries and Zebra internal indexes
1188       and index types is explained in
1189        <xref linkend="querymodel-pqf-apt-mapping"/>.
1190       </note>
1191
1192    </sect3>
1193
1194     <sect3 id="querymodel-bib1-truncation">
1195      <title>Truncation Attributes (type 5)</title>
1196
1197      <para>
1198       The truncation attribute specifies whether variations of one or
1199       more characters are allowed between search term and hit terms, or
1200       not. Using non-default truncation attributes will broaden the
1201       document hit set of a search query.
1202      </para>
1203
1204      <table id="querymodel-bib1-truncation-table"
1205       frame="all" rowsep="1" colsep="1" align="center">
1206
1207       <caption>Truncation Attributes (type 5)</caption>
1208       <thead>
1209         <tr>
1210          <td>Truncation</td>
1211          <td>Value</td>
1212          <td>Notes</td>
1213         </tr>
1214        </thead>
1215        <tbody>
1216         <tr>
1217          <td>Right truncation </td>
1218          <td>1</td>
1219          <td>supported</td>
1220         </tr>
1221         <tr>
1222          <td>Left truncation</td>
1223          <td>2</td>
1224          <td>supported</td>
1225         </tr>
1226         <tr>
1227          <td>Left and right truncation</td>
1228          <td>3</td>
1229          <td>supported</td>
1230         </tr>
1231         <tr>
1232          <td>Do not truncate</td>
1233          <td>100</td>
1234          <td>default</td>
1235         </tr>
1236         <tr>
1237          <td>Process # in search term</td>
1238          <td>101</td>
1239          <td>supported</td>
1240         </tr>
1241         <tr>
1242          <td>RegExpr-1 </td>
1243          <td>102</td>
1244          <td>supported</td>
1245         </tr>
1246         <tr>
1247          <td>RegExpr-2</td>
1248          <td>103</td>
1249          <td>supported</td>
1250         </tr>
1251        </tbody>
1252      </table>
1253
1254      <para>
1255       The truncation attribute values 1-3 behave as expected:
1256       <screen>
1257        Z> scan @attr 1=Body-of-text  schnittke
1258        ...
1259        * schnittke (81)
1260        schnittkes (31)
1261        schnittstelle (1)
1262        ...
1263        Z> find @attr 1=Body-of-text  @attr 5=1 schnittke
1264        ...
1265        Number of hits: 95, setno 7
1266        ...
1267        Z> find @attr 1=Body-of-text  @attr 5=2 schnittke
1268        ...
1269        Number of hits: 81, setno 6
1270        ...
1271        Z> find @attr 1=Body-of-text  @attr 5=3 schnittke
1272        ...
1273        Number of hits: 95, setno 8
1274       </screen>
1275       </para>
1276
1277      <para>
1278       The truncation attribute value
1279       <literal>Process # in search term (101)</literal> is a
1280       poor man's regular expression search. It maps
1281       each <literal>#</literal> to <literal>.*</literal>, and
1282       then performs a <literal>Regexp-1 (102)</literal> regular
1283       expression search. The following two queries are equivalent:
1284       <screen>
1285        Z> find @attr 1=Body-of-text  @attr 5=101 schnit#ke
1286        Z> find @attr 1=Body-of-text  @attr 5=102 schnit.*ke
1287        ...
1288        Number of hits: 89, setno 10
1289       </screen>
1290      </para>
1291
1292      <para>
1293       The truncation attribute value
1294        <literal>Regexp-1 (102)</literal> is a normal regular expression search,
1295       see <xref linkend="querymodel-regular"/> for details.
1296       <screen>
1297        Z> find @attr 1=Body-of-text  @attr 5=102 schnit+ke
1298        Z> find @attr 1=Body-of-text  @attr 5=102 schni[a-t]+ke
1299       </screen>
1300      </para>
1301
1302      <para>
1303        The truncation attribute value
1304       <literal>Regexp-2 (103)</literal> is a Zebra-specific extension
1305       which allows <emphasis>fuzzy</emphasis> matches. One single
1306       error in spelling of search terms is allowed, i.e., a document
1307       is hit if it includes a term which can be mapped to the used
1308       search term by one character substitution, addition, deletion or
1309       change of position.
1310       <screen>
1311        Z> find @attr 1=Body-of-text  @attr 5=100 schnittke
1312        ...
1313        Number of hits: 81, setno 14
1314        ...
1315        Z> find @attr 1=Body-of-text  @attr 5=103 schnittke
1316        ...
1317        Number of hits: 103, setno 15
1318        ...
1319       </screen>
1320       </para>
1321     </sect3>
1322
1323     <sect3 id="querymodel-bib1-completeness">
1324     <title>Completeness Attributes (type = 6)</title>
1325
1326
1327      <para>
1328       The <literal>Completeness Attribute (type = 6)</literal>
1329       is used to specify that a given search term or term list is either
1330       part of the terms of a given index/field
1331       (<literal>Incomplete subfield (1)</literal>), or is
1332       literally what is found in the entire field's index
1333       (<literal>Complete field (3)</literal>).
1334       </para>
1335
1336      <table id="querymodel-bib1-completeness-table"
1337       frame="all" rowsep="1" colsep="1" align="center">
1338       <caption>Completeness Attributes (type = 6)</caption>
1339       <thead>
1340         <tr>
1341          <td>Completeness</td>
1342          <td>Value</td>
1343          <td>Notes</td>
1344         </tr>
1345        </thead>
1346        <tbody>
1347         <tr>
1348          <td>Incomplete subfield</td>
1349          <td>1</td>
1350          <td>default</td>
1351         </tr>
1352         <tr>
1353          <td>Complete subfield</td>
1354          <td>2</td>
1355          <td>deprecated</td>
1356         </tr>
1357         <tr>
1358          <td>Complete field</td>
1359          <td>3</td>
1360          <td>supported</td>
1361         </tr>
1362        </tbody>
1363      </table>
1364
1365      <para>
1366       The <literal>Completeness Attribute (type = 6)</literal>
1367       is only partially and conditionally
1368       supported in the sense that it is ignored if the hit index is
1369       not of structure <literal>type="w"</literal> or
1370       <literal>type="p"</literal>.
1371       </para>
1372      <para>
1373       <literal>Incomplete subfield (1)</literal> is the default, and
1374       makes Zebra use
1375       register <literal>type="w"</literal>, whereas
1376       <literal>Complete field (3)</literal> triggers
1377       search and scan in index <literal>type="p"</literal>.
1378      </para>
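            <para>
             As an illustrative sketch, assuming that the
             <literal>Title</literal> field has been indexed both as
             <literal>type="w"</literal> and as
             <literal>type="p"</literal>, the first query below would
             search the word register and the second the phrase
             register:
             <screen>
              Z> find @attr 1=Title @attr 6=1 "information retrieval"
              Z> find @attr 1=Title @attr 6=3 "information retrieval"
             </screen>
            </para>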
1379      <para>
1380       The <literal>Complete subfield (2)</literal> is a remnant
1381       of the <literal>MARC</literal>
1382       binary format days. Zebra does not support it, but silently maps it
1383       to <literal>Complete field (3)</literal>.
1384      </para>
1385
1386      <note>
1387       The exact mapping between PQF queries and Zebra internal indexes
1388       and index types is explained in
1389        <xref linkend="querymodel-pqf-apt-mapping"/>.
1390       </note>
1391     </sect3>
1392    </sect2>
1393
1394    </sect1>
1395
1396
1397   <sect1 id="querymodel-zebra">
1398    <title>Advanced Zebra PQF Features</title>
1399    <para>
1400     The Zebra internal query engine has been extended to meet specific needs
1401     not covered by the <literal>bib-1</literal> attribute set query
1402     model. These extensions are <emphasis>non-standard</emphasis>
1403     and <emphasis>non-portable</emphasis>: most functional extensions
1404     are modeled over the <literal>bib-1</literal> attribute set,
1405     defining type 7-10 attributes.
1406     There are also the special
1407     <literal>string</literal> type index names for the
1408     <literal>idxpath</literal> attribute set.
1409    </para>
1410
1411    <sect2 id="querymodel-zebra-attr-allrecords">
1412     <title>Zebra specific retrieval of all records</title>
1413     <para>
1414      Zebra defines a hardwired <literal>string</literal> index name
1415      called <literal>_ALLRECORDS</literal>. It matches any record
1416      contained in the database, if used in conjunction with
1417      the relation attribute
1418      <literal>AlwaysMatches (103)</literal>.
1419      </para>
1420     <para>
1421      The <literal>_ALLRECORDS</literal> index name is used for total database
1422      export. The search term is ignored and may be empty.
1423      <screen>
1424       Z> find @attr 1=_ALLRECORDS @attr 2=103 ""
1425      </screen>
1426     </para>
1427     <para>
1428      Combination with other index types can be made. For example, to
1429      find all records which are <emphasis>not</emphasis> indexed in
1430      the <literal>Title</literal> register, issue one of the two
1431      equivalent queries:
1432      <screen>
1433       Z> find @not @attr 1=_ALLRECORDS @attr 2=103 "" @attr 1=Title @attr 2=103 ""
1434       Z> find @not @attr 1=_ALLRECORDS @attr 2=103 "" @attr 1=4 @attr 2=103 ""
1435      </screen>
1436     </para>
1437     <warning>
1438      The special string index <literal>_ALLRECORDS</literal> is
1439      experimental, and the provided functionality and syntax may very
1440      well change in future releases of Zebra.
1441     </warning>
1442
1443    </sect2>
1444
1445    <sect2 id="querymodel-zebra-attr-search">
1446     <title>Zebra specific Search Extensions to all Attribute Sets</title>
1447     <para>
1448      Zebra extends the Bib1 attribute types, and these extensions are
1449      recognized regardless of attribute
1450      set used in a <literal>search</literal> operation query.
1451     </para>
1452
1453      <table id="querymodel-zebra-attr-search-table"
1454       frame="all" rowsep="1" colsep="1" align="center">
1455
1456       <caption>Zebra Search Attribute Extensions</caption>
1457        <thead>
1458         <tr>
1459          <td>Name</td>
1460          <td>Type</td>
1461          <td>Operation</td>
1462          <td>Zebra version</td>
1463         </tr>
1464       </thead>
1465        <tbody>
1466         <tr>
1467          <td>Embedded Sort</td>
1468          <td>7</td>
1469          <td>search</td>
1470          <td>1.1</td>
1471         </tr>
1472         <tr>
1473          <td>Term Set</td>
1474          <td>8</td>
1475          <td>search</td>
1476          <td>1.1</td>
1477         </tr>
1478         <tr>
1479          <td>Rank Weight</td>
1480          <td>9</td>
1481          <td>search</td>
1482          <td>1.1</td>
1483         </tr>
1484         <tr>
1485          <td>Approx Limit</td>
1486          <td>9</td>
1487          <td>search</td>
1488          <td>1.4</td>
1489         </tr>
1490         <tr>
1491          <td>Term Reference</td>
1492          <td>10</td>
1493          <td>search</td>
1494          <td>1.4</td>
1495         </tr>
1496        </tbody>
1497       </table>
1498
1499     <sect3 id="querymodel-zebra-attr-sorting">
1500      <title>Zebra Extension Embedded Sort Attribute (type 7)</title>
1501     </sect3>
1502     <para>
1503      The embedded sort is a way to specify sorting within a query, thus
1504      removing the need to send a Sort Request separately. It is
1505      faster, and clients do not need to deal with the Sort
1506      Facility.
1507     </para>
1508
1509     <para>
1510      All ordering operations are based on a lexicographical ordering,
1511      <emphasis>except</emphasis> when the
1512      <literal>structure attribute numeric (109)</literal> is used. In
1513      this case, ordering is numerical. See
1514       <xref linkend="querymodel-bib1-structure"/>.
1515     </para>
1516
1517     <para>
1518      The possible values after attribute <literal>type 7</literal> are
1519      <literal>1</literal> ascending and
1520      <literal>2</literal> descending.
1521      The attributes+term (APT) node is separate from the
1522      rest and must be <literal>@or</literal>'ed.
1523      The term associated with APT is the sorting level in integers,
1524      where <literal>0</literal> means primary sort,
1525      <literal>1</literal> means secondary sort, and so forth.
1526      See also <xref linkend="administration-ranking"/>.
1527     </para>
1528     <para>
1529      For example, to search for water and sort by title (ascending):
1530      <screen>
1531       Z> find @or @attr 1=1016 water @attr 7=1 @attr 1=4 0
1532      </screen>
1533     </para>
1534     <para>
1535      Or, to search for water and sort by title ascending, then date descending:
1536      <screen>
1537       Z> find @or @or @attr 1=1016 water @attr 7=1 @attr 1=4 0 @attr 7=2 @attr 1=30 1
1538      </screen>
1539     </para>
1540
1541     <sect3 id="querymodel-zebra-attr-estimation">
1542      <title>Zebra Extension Term Set Attribute (type 8)</title>
1543     </sect3>
1544     <para>
1545      The Term Set feature allows a search to store
1546      hitting terms in a "pseudo" result set; it thus combines a usual search with
1547      a scan-like facility. It requires a client that can handle named result
1548      sets, since the search generates two result sets. The value of
1549      attribute 8 is the name of a result set (string). The terms in
1550      the named term set are returned as SUTRS records.
1551     </para>
1552     <para>
1553      For example, searching for u in title, right truncated, and
1554      storing the result in a term set named 'aset':
1555      <screen>
1556       Z> find @attr 5=1 @attr 1=4 @attr 8=aset u
1557      </screen>
1558     </para>
1559     <warning>
1560      The model has one serious flaw: the size of the term set is
1561      not known. Experimental. Do not use in production code.
1562     </warning>
1563
1564     <sect3 id="querymodel-zebra-attr-weight">
1565      <title>Zebra Extension Rank Weight Attribute (type 9)</title>
1566     </sect3>
1567     <para>
1568      Rank weight is a way to pass a value to a ranking algorithm, so
1569      that one APT has one value while another has a different one.
1570      See also <xref linkend="administration-ranking"/>.
1571     </para>
1572     <para>
1573      For example, searching for utah in title with weight 30 as well
1574      as in any index with weight 20:
1575      <screen>
1576       Z> find @attr 2=102 @or @attr 9=30 @attr 1=4 utah @attr 9=20 utah
1577      </screen>
1578     </para>
1579
1580     <sect3 id="querymodel-zebra-attr-limit">
1581      <title>Zebra Extension Approximative Limit Attribute (type 9)</title>
1582     </sect3>
1583     <para>
1584      Newer Zebra versions normally estimate hit count for every APT
1585      (leaf) in the query tree. These hit counts are returned as part of
1586      the searchResult-1 facility in the binary encoded Z39.50 search
1587      response packages.
1588     </para>
1589     <para>
1590      By setting a limit for the APT we can make Zebra switch to
1591      approximate hit counts once a certain hit count limit is
1592      reached. A value of zero means exact hit count.
1593     </para>
1594     <para>
1595      For example, we might be interested in exact hit count for a, but
1596      for b we allow hit count estimates for 1000 and higher.
1597      <screen>
1598       Z> find @and a @attr 9=1000 b
1599      </screen>
1600     </para>
1601     <note>
1602      The estimated hit count facility makes searches faster, as one
1603      only needs to process large hit lists partially.
1604     </note>
1605     <warning>
1606      This facility clashes with rank weight, because ranking requires that all
1607      documents in the hit lists be examined for scoring and
1608      re-sorting.
1609      It is an experimental
1610      extension. Do not use in production code.
1611     </warning>
1612
1613     <sect3 id="querymodel-zebra-attr-termref">
1614      <title>Zebra Extension Term Reference Attribute (type 10)</title>
1615     </sect3>
1616     <para>
1617      Zebra supports the <literal>searchResult-1</literal> facility.
1618      If the <literal>Term Reference Attribute (type 10)</literal> is
1619      given, it specifies a subqueryId value returned as part of the
1620      search result. It is a way for a client to name an APT part of a
1621      query.
1622     </para>
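           <para>
            A hypothetical sketch of its use, assuming that the
            attribute value is passed through unchanged as the
            subqueryId; the names <literal>t1</literal> and
            <literal>t2</literal> are made up for illustration:
            <screen>
             Z> find @and @attr 10=t1 a @attr 10=t2 b
            </screen>
           </para>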
1623     <!--
1624     <para>
1625      <screen>
1626      </screen>
1627     </para>
1628     -->
1629     <warning>
1630      Experimental. Do not use in production code.
1631     </warning>
1632
1633
1634    </sect2>
1635
1636
1637    <sect2 id="querymodel-zebra-attr-scan">
1638     <title>Zebra specific Scan Extensions to all Attribute Sets</title>
1639     <para>
1640      Zebra extends the Bib1 attribute types, and these extensions are
1641      recognized regardless of attribute
1642      set used in a <literal>scan</literal> operation query.
1643     </para>
1644      <table id="querymodel-zebra-attr-scan-table"
1645       frame="all" rowsep="1" colsep="1" align="center">
1646
1647       <caption>Zebra Scan Attribute Extensions</caption>
1648        <thead>
1649         <tr>
1650          <td>Name</td>
1651          <td>Type</td>
1652          <td>Operation</td>
1653          <td>Zebra version</td>
1654         </tr>
1655       </thead>
1656        <tbody>
1657         <tr>
1658          <td>Result Set Narrow</td>
1659          <td>8</td>
1660          <td>scan</td>
1661          <td>1.3</td>
1662         </tr>
1663         <tr>
1664          <td>Approximative Limit</td>
1665          <td>9</td>
1666          <td>scan</td>
1667          <td>1.4</td>
1668         </tr>
1669        </tbody>
1670       </table>
1671
1672     <sect3 id="querymodel-zebra-attr-narrow">
1673      <title>Zebra Extension Result Set Narrow (type 8)</title>
1674     <para>
1675      If attribute <literal>Result Set Narrow (type 8)</literal>
1676      is given for <literal>scan</literal>, the value is the name of a
1677      result set. Each hit count in <literal>scan</literal> is
1678      <literal>@and</literal>'ed with the result set given.
1679     </para>
1680     <para>
1681      Consider for example
1682      the case of scanning all title fields around the
1683      scanterm <emphasis>mozart</emphasis>, then refining the scan by
1684      issuing a filtering query for <emphasis>amadeus</emphasis> to
1685      restrict the scan to the result set of the query:
1686      <screen>
1687       Z> scan @attr 1=4 mozart
1688       ...
1689       * mozart (43)
1690         mozartforskningen (1)
1691         mozartiana (1)
1692         mozarts (16)
1693       ...
1694       Z> f @attr 1=4 amadeus
1695       ...
1696       Number of hits: 15, setno 2
1697       ...
1698       Z> scan @attr 1=4 @attr 8=2 mozart
1699       ...
1700       * mozart (14)
1701         mozartforskningen (0)
1702         mozartiana (0)
1703         mozarts (1)
1704       ...
1705      </screen>
1706     </para>
1707
1708     <warning>
1709      Experimental. Do not use in production code.
1710     </warning>
1711     </sect3>
1712
1713     <sect3 id="querymodel-zebra-attr-approx">
1714      <title>Zebra Extension Approximative Limit (type 9)</title>
1715     <para>
1716      The <literal>Zebra Extension Approximative Limit (type
1717       9)</literal> enables approximate hit counts
1718      for <literal>scan</literal>, in the same
1719      way as for <literal>search</literal>.
1720     </para>
1721     <!--
1722     <para>
1723      <screen>
1724      </screen>
1725     </para>
1726     -->
1727     <warning>
1728      Experimental and buggy. Definitely not to be used in production code.
1729     </warning>
1730     </sect3>
1731
1732
1733    </sect2>
1734
1735
1736    <sect2 id="querymodel-idxpath">
1737     <title>Zebra special IDXPATH Attribute Set for GRS indexing</title>
1738     <para>
1739      The attribute-set <literal>idxpath</literal> consists of a single
1740      <literal>Use (type 1)</literal> attribute. All non-use attributes
1741      behave as normal.
1742     </para>
1743     <para>
1744      This feature is enabled when defining the
1745      <literal>xpath enable</literal> option in the GRS filter
1746      <filename>*.abs</filename> configuration files. If one wants to use
1747      the special <literal>idxpath</literal> numeric attribute set, the
1748      main Zebra configuration file <filename>zebra.cfg</filename>
1749      directive <literal>attset: idxpath.att</literal> must be enabled.
1750     </para>
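         <para>
          A minimal configuration sketch based on the two directives
          named above. The <literal>bib1.att</literal> line and the
          comment layout are assumptions for illustration only; adapt
          them to your own setup:
          <screen>
           # in zebra.cfg
           attset: bib1.att
           attset: idxpath.att

           # in the GRS *.abs file of the record profile
           xpath enable
          </screen>
         </para>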
1751     <warning>The <literal>idxpath</literal> is deprecated, may not be
1752      supported in future Zebra versions, and should definitely
1753      not be used in production code.
1754     </warning>
1755
1756     <sect3 id="querymodel-idxpath-use">
1757     <title>IDXPATH Use Attributes (type = 1)</title>
1758      <para>
1759       This attribute set allows one to search GRS filter indexed
1760       records by XPATH-like structured index names.
1761      </para>
1762
1763      <warning>The <literal>idxpath</literal> option defines hard-coded
1764       index names, which might clash with your own index names.
1765      </warning>
1766
1767      <table id="querymodel-idxpath-use-table"
1768       frame="all" rowsep="1" colsep="1" align="center">
1769
1770       <caption>Zebra specific IDXPATH Use Attributes (type 1)</caption>
1771       <thead>
1772         <tr>
1773          <td>IDXPATH</td>
1774          <td>Value</td>
1775          <td>String Index</td>
1776          <td>Notes</td>
1777         </tr>
1778        </thead>
1779        <tbody>
1780         <tr>
1781          <td>XPATH Begin</td>
1782          <td>1</td>
1783          <td>_XPATH_BEGIN</td>
1784          <td>deprecated</td>
1785         </tr>
1786         <tr>
1787          <td>XPATH End</td>
1788          <td>2</td>
1789          <td>_XPATH_END</td>
1790          <td>deprecated</td>
1791         </tr>
1792         <tr>
1793          <td>XPATH CData</td>
1794          <td>1016</td>
1795          <td>_XPATH_CDATA</td>
1796          <td>deprecated</td>
1797         </tr>
1798         <tr>
1799          <td>XPATH Attribute Name</td>
1800          <td>3</td>
1801          <td>_XPATH_ATTR_NAME</td>
1802          <td>deprecated</td>
1803         </tr>
1804         <tr>
1805          <td>XPATH Attribute CData</td>
1806          <td>1015</td>
1807          <td>_XPATH_ATTR_CDATA</td>
1808          <td>deprecated</td>
1809         </tr>
1810        </tbody>
1811      </table>
1812
1813
1814      <para>
1815       See <filename>tab/idxpath.att</filename> for more information.
1816      </para>
1817      <para>
1818       Search for all documents starting with root element
1819       <literal>/root</literal> (either using the numeric or the string
1820       use attributes):
1821       <screen>
1822        Z> find @attrset idxpath @attr 1=1 @attr 4=3 root/
1823        Z> find @attr idxpath 1=1 @attr 4=3 root/
1824        Z> find @attr 1=_XPATH_BEGIN @attr 4=3 root/
1825       </screen>
1826      </para>
1827      <para>
1828       Search for all documents where specific nested XPATH
1829       <literal>/c1/c2/../cn</literal> exists. Notice the very
1830       counter-intuitive <emphasis>reverse</emphasis> notation!
1831       <screen>
1832        Z> find @attrset idxpath @attr 1=1 @attr 4=3 cn/cn-1/../c1/
1833        Z> find @attr 1=_XPATH_BEGIN @attr 4=3 cn/cn-1/../c1/
1834       </screen>
1835      </para>
1836      <para>
1837       Search for CDATA string <emphasis>text</emphasis> in any element:
1838       <screen>
1839        Z> find @attrset idxpath @attr 1=1016 text
1840        Z> find @attr 1=_XPATH_CDATA text
1841       </screen>
1842      </para>
1843      <para>
1844        Search for CDATA string <emphasis>anothertext</emphasis> in any
1845        attribute:
1846       <screen>
1847        Z> find @attrset idxpath @attr 1=1015 anothertext
1848        Z> find @attr 1=_XPATH_ATTR_CDATA anothertext
1849       </screen>
1850      </para>
1851      <para>
1852        Search for all documents which have an XML element node
1853        including an XML attribute named <emphasis>creator</emphasis>:
1854       <screen>
1855        Z> find @attrset idxpath @attr 1=3 @attr 4=3 creator
1856        Z> find @attr 1=_XPATH_ATTR_NAME @attr 4=3 creator
1857       </screen>
1858      </para>
1859      <para>
1860       Combining usual <literal>bib-1</literal> attribute set searches
1861       with <literal>idxpath</literal> attribute set searches:
1862       <screen>
1863        Z> find @and @attr idxpath 1=1 @attr 4=3 link/ @attr 1=4 mozart
1864        Z> find @and @attr 1=_XPATH_BEGIN @attr 4=3 link/ @attr 1=_XPATH_CDATA mozart
1865       </screen>
1866      </para>
1867      <para>
1868       Scanning is supported on all <literal>idxpath</literal>
1869       indexes, whether specified as numeric use attributes or as string
1870       index names:
1871       <screen>
1872        Z> scan  @attrset idxpath @attr 1=1016 text
1873        Z> scan  @attr 1=_XPATH_ATTR_CDATA anothertext
1874        Z> scan  @attrset idxpath @attr 1=3 @attr 4=3 ''
1875       </screen>
1876      </para>
1877
1878     </sect3>
1879    </sect2>
1880
1881
1882    <sect2 id="querymodel-pqf-apt-mapping">
1883     <title>Mapping from PQF atomic APT queries to Zebra internal
1884      register indexes</title>
1885     <para>
1886      The rules for PQF APT mapping are rather tricky to grasp at
1887      first. We deal first with the rules for deciding which
1888      internal register or string index to use, according to the use
1889      attribute or access point specified in the query. Thereafter we
1890      deal with the rules for determining the correct structure type of
1891      the named register.
1892     </para>
1893
1894    <sect3 id="querymodel-pqf-apt-mapping-accesspoint">
1895     <title>Mapping of PQF APT access points</title>
1896     <para>
1897       Zebra understands four fundamentally different types of access
1898       points, of which only the
1899       <emphasis>numeric use attribute</emphasis> type access points
1900       are defined by the  <ulink url="&url.z39.50;">Z39.50</ulink>
1901       standard.
1902       All other access point types are Zebra specific, and non-portable.
1903     </para>
1904
1905      <table id="querymodel-zebra-mapping-accesspoint-types"
1906       frame="all" rowsep="1" colsep="1" align="center">
1907
1908       <caption>Access point name mapping</caption>
1909        <thead>
1910         <tr>
1911          <td>Access Point</td>
1912          <td>Type</td>
1913          <td>Grammar</td>
1914          <td>Notes</td>
1915         </tr>
1916       </thead>
1917       <tbody>
1918        <tr>
1919         <td>Use attribute</td>
1920         <td>numeric</td>
1921         <td>[1-9][0-9]*</td>
1922         <td>directly mapped to string index name</td>
1923        </tr>
1924        <tr>
1925         <td>String index name</td>
1926         <td>string</td>
1927         <td>[a-zA-Z](\-?[a-zA-Z0-9])*</td>
1928         <td>normalized name is used as internal string index name</td>
1929        </tr>
1930        <tr>
1931         <td>Zebra internal index name</td>
1932         <td>zebra</td>
1933         <td>_[a-zA-Z](_?[a-zA-Z0-9])*</td>
1934         <td>hardwired internal string index name</td>
1935        </tr>
1936        <tr>
1937         <td>XPATH special index</td>
1938         <td>XPath</td>
1939         <td>/.*</td>
1940         <td>special xpath search for GRS indexed records</td>
1941        </tr>
1942       </tbody>
1943     </table>
1944
1945     <para>
1946      <literal>Attribute set names</literal> and
1947      <literal>string index names</literal> are normalized
1948      according to the following rules: all <emphasis>single</emphasis>
1949      hyphens <literal>'-'</literal> are stripped, and all upper case
1950      letters are folded to lower case.
1951      </para>
1952
1953      <para>
1954       <emphasis>Numeric use attributes</emphasis> are mapped
1955       to the Zebra internal
1956       string index according to the attribute set definition in use.
1957       The default attribute set is <literal>Bib-1</literal>, and may be
1958       omitted in the PQF query.
1959      </para>
1960
1961      <para>
1962       Given these normalization and numeric
1963       use attribute mapping rules, the following
1964       PQF queries are considered equivalent (assuming the default
1965       configuration has not been altered):
1966       <screen>
1967       Z> find  @attr 1=Body-of-text serenade
1968       Z> find  @attr 1=bodyoftext serenade
1969       Z> find  @attr 1=BodyOfText serenade
1970       Z> find  @attr 1=bO-d-Y-of-tE-x-t serenade
1971       Z> find  @attr 1=1010 serenade
1972       Z> find  @attrset Bib-1 @attr 1=1010 serenade
1973       Z> find  @attrset bib1 @attr 1=1010 serenade
1974       Z> find  @attrset Bib1 @attr 1=1010 serenade
1975       Z> find  @attrset b-I-b-1 @attr 1=1010 serenade
1976      </screen>
1977     </para>
1978
1979     <para>
1980       The <emphasis>numerical</emphasis>
1981       <literal>use attributes (type 1)</literal>
1982       are interpreted according to the
1983       attribute sets which have been loaded in the
1984       <literal>zebra.cfg</literal> file, and are matched against specific
1985       fields as specified in the <literal>.abs</literal> file which
1986       describes the profile of the records which have been loaded.
1987       If no use attribute is provided, a default of
1988       <literal>Bib-1 Use Any (1016)</literal> is
1989       assumed.
1990       The predefined <literal>use attribute sets</literal>
1991       can be reconfigured by  tweaking the configuration files
1992       <filename>tab/*.att</filename>, and
1993       new attribute sets can be defined by adding similar files in the
1994       configuration path <literal>profilePath</literal> of the server.
1995     </para>
1996
1997      <para>
1998       <literal>String indexes</literal> can be accessed directly,
1999       independently of which attribute set is in use; the attribute set is
2000       simply ignored. The above-mentioned name normalization applies.
2001       <literal>String index names</literal> are defined in the
2002       configuration files of the indexing filter in use, for example in the
2003       <literal>GRS</literal>
2004       <filename>*.abs</filename> configuration files, or in the
2005       <literal>alvis</literal> filter XSLT indexing stylesheets.
2006      </para>
2007
2008      <para>
2009       <literal>Zebra internal indexes</literal> can be accessed directly,
2010       according to the same rules as the user defined
2011       <literal>string indexes</literal>. The only difference is that
2012       <literal>Zebra internal index names</literal> are hardwired,
2013       all uppercase and
2014       must start with the character <literal>'_'</literal>.
2015      </para>
2016
2017      <para>
2018       Finally, <literal>XPATH</literal> access points are only
2019       available using the <literal>GRS</literal> filter for indexing.
2020       These access point names must start with the character
2021       <literal>'/'</literal>, they are <emphasis>not
2022       normalized</emphasis>, but passed unaltered to the Zebra internal
2023       XPATH engine. See <xref linkend="querymodel-use-xpath"/>.
2024
2025      </para>
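         <para>
          As a hedged illustration, assuming GRS indexed records whose
          hypothetical root element <literal>book</literal> contains a
          <literal>title</literal> element, an XPATH access point can be
          given directly as the value of the use attribute. The element
          names and the search term are made up for this sketch:
          <screen>
           Z> find @attr 1=/book/title mozart
          </screen>
         </para>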
2026
2027
2028     </sect3>
2029
2030
2031    <sect3 id="querymodel-pqf-apt-mapping-structuretype">
2032      <title>Mapping of PQF APT structure and completeness to
2033       register type</title>
2034     <para>
2035       In its default configuration, Zebra internally has several
2036      different types of registers or indexes, whose tokenization and
2037       character normalization rules differ. This reflects the fact that
2038       searching fundamentally different tokens like dates, numbers,
2039       bitfields and string-based text requires different rule sets.
2040      </para>
2041
2042      <table id="querymodel-zebra-mapping-structure-types"
2043       frame="all" rowsep="1" colsep="1" align="center">
2044
2045       <caption>Structure and completeness mapping to register types</caption>
2046        <thead>
2047         <tr>
2048          <td>Structure</td>
2049          <td>Completeness</td>
2050          <td>Register type</td>
2051          <td>Notes</td>
2052         </tr>
2053       </thead>
2054       <tbody>
2055        <tr>
2056         <td>
2057           phrase (@attr 4=1), word (@attr 4=2),
2058           word-list (@attr 4=6),
2059           free-form-text  (@attr 4=105), or document-text (@attr 4=106)
2060          </td>
2061         <td>Incomplete field (@attr 6=1)</td>
2062         <td>Word ('w')</td>
2063         <td>Traditional tokenized and character normalized word index</td>
2064        </tr>
2065        <tr>
2066         <td>
2067           phrase (@attr 4=1), word (@attr 4=2),
2068           word-list (@attr 4=6),
2069           free-form-text  (@attr 4=105), or document-text (@attr 4=106)
2070          </td>
2071         <td>Complete field (@attr 6=3)</td>
2072         <td>Phrase ('p')</td>
2073         <td>Character normalized, but not tokenized index for phrase
2074           matches
2075          </td>
2076        </tr>
2077        <tr>
2078         <td>urx (@attr 4=104)</td>
2079         <td>ignored</td>
2080         <td>URX/URL ('u')</td>
2081         <td>Special index for URL web addresses</td>
2082        </tr>
2083        <tr>
2084         <td>numeric (@attr 4=109)</td>
2085         <td>ignored</td>
2086         <td>Numeric ('n')</td>
2087         <td>Special index for digital numbers</td>
2088        </tr>
2089        <tr>
2090         <td>key (@attr 4=3)</td>
2091         <td>ignored</td>
2092         <td>Null bitmap ('0')</td>
2093         <td>Used for non-tokenized and non-normalized bit sequences</td>
2094        </tr>
2095        <tr>
2096         <td>year (@attr 4=4)</td>
2097         <td>ignored</td>
2098         <td>Year ('y')</td>
2099         <td>Non-tokenized and non-normalized 4 digit numbers</td>
2100        </tr>
2101        <tr>
2102         <td>date (@attr 4=5)</td>
2103         <td>ignored</td>
2104         <td>Date ('d')</td>
2105         <td>Non-tokenized and non-normalized ISO date strings</td>
2106        </tr>
2107        <tr>
2108         <td>ignored</td>
2109         <td>ignored</td>
2110         <td>Sort ('s')</td>
2111         <td>Used with special sort attribute set (@attr 7=1, @attr 7=2)</td>
2112        </tr>
2113        <tr>
2114         <td>overruled</td>
2115         <td>overruled</td>
2116         <td>special</td>
2117         <td>Internal record ID register, used whenever
2118          Relation Always Matches (@attr 2=103) is specified</td>
2119        </tr>
2120       </tbody>
2121     </table>
2122
2123      <!-- see in util/zebramap.c -->
2124
2125     <para>
2126      If a <emphasis>Structure</emphasis> attribute of
2127      <emphasis>Phrase</emphasis> is used in conjunction with a
2128      <emphasis>Completeness</emphasis> attribute of
2129      <emphasis>Complete (Sub)field</emphasis>, the term is matched
2130      against the contents of the phrase (long word) register, if one
2131      exists for the given <emphasis>Use</emphasis> attribute.
2132      A phrase register is created for those fields in the
2133      GRS <filename>*.abs</filename> file that contain a
2134      <literal>p</literal>-specifier.
2135       <screen>
2136        Z>  scan @attr 1=Title @attr 4=1 @attr 6=3 beethoven
2137        ...
2138        bayreuther festspiele (1)
2139        * beethoven bibliography database (1)
2140        benny carter (1)
2141        ...
2142        Z> find @attr 1=Title @attr 4=1 @attr 6=3 "beethoven bibliography"
2143        ...
2144        Number of hits: 0, setno 5
2145        ...
2146        Z> find @attr 1=Title @attr 4=1 @attr 6=3 "beethoven bibliography database"
2147        ...
2148        Number of hits: 1, setno 6
2149        </screen>
2150     </para>
2151
2152     <para>
2153      If <emphasis>Structure</emphasis>=<emphasis>Phrase</emphasis> is
2154      used in conjunction with <emphasis>Incomplete Field</emphasis> (the
2155      default value for <emphasis>Completeness</emphasis>), the
2156      search is directed against the normal word registers, but if the term
2157      contains multiple words, the term will only match if all of the words
2158      are found immediately adjacent, and in the given order.
2159      The word search is performed on those fields that are indexed as
2160      type <literal>w</literal> in the GRS <filename>*.abs</filename> file.
2161       <screen>
2162        Z>  scan @attr 1=Title @attr 4=1 @attr 6=1 beethoven
2163        ...
2164          beefheart (1)
2165        * beethoven (18)
2166          beethovens (7)
2167        ...
2168        Z> find @attr 1=Title @attr 4=1 @attr 6=1 beethoven
2169        ...
2170        Number of hits: 18, setno 1
2171        ...
2172        Z> find @attr 1=Title @attr 4=1 @attr 6=1 "beethoven  bibliography"
2173        ...
2174        Number of hits: 2, setno 2
2175        ...
2176      </screen>
2177     </para>
2178
2179     <para>
2180      If the <emphasis>Structure</emphasis> attribute is
2181      <emphasis>Word List</emphasis>,
2182      <emphasis>Free-form Text</emphasis>, or
2183      <emphasis>Document Text</emphasis>, the term is treated as a
2184      natural-language, relevance-ranked query.
2185      This search type uses the word register, i.e. those fields
2186      that are indexed as type <literal>w</literal> in the
2187      GRS <filename>*.abs</filename> file.
2188     </para>
2189
2190     <para>
2191      If the <emphasis>Structure</emphasis> attribute is
2192      <emphasis>Numeric String</emphasis> the term is treated as an integer.
2193      The search is performed on those fields that are indexed
2194      as type <literal>n</literal> in the GRS
2195       <filename>*.abs</filename> file.
2196     </para>
2197
2198     <para>
2199      If the <emphasis>Structure</emphasis> attribute is
2200      <emphasis>URX</emphasis> the term is treated as a URX (URL) entity.
2201      The search is performed on those fields that are indexed as type
2202      <literal>u</literal> in the <filename>*.abs</filename> file.
2203     </para>
2204
2205     <para>
2206      If the <emphasis>Structure</emphasis> attribute is
2207      <emphasis>Local Number</emphasis> the term is treated as a
2208      native Zebra Record Identifier.
2209     </para>
2210
2211     <para>
2212      If the <emphasis>Relation</emphasis> attribute is
2213      <emphasis>Equals</emphasis> (default), the term is matched
2214      in a normal fashion (modulo truncation and processing of
2215      individual words, if required).
2216      If <emphasis>Relation</emphasis> is <emphasis>Less Than</emphasis>,
2217      <emphasis>Less Than or Equal</emphasis>,
2218      <emphasis>Greater Than</emphasis>, or <emphasis>Greater Than or
2219       Equal</emphasis>, the term is assumed to be numerical, and a
2220      standard regular expression is constructed to match the given
2221      expression.
2222      If <emphasis>Relation</emphasis> is <emphasis>Relevance</emphasis>,
2223      the standard natural-language query processor is invoked.
2224     </para>
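         <para>
          As a hedged illustration of the relation attribute, assume a
          hypothetical string index <literal>pages</literal> that is
          indexed as type <literal>n</literal>. A range query using
          <emphasis>Greater Than or Equal</emphasis> (Bib-1 relation
          value 4) together with the numeric structure might then look
          like this; the index name and the term are made up for the
          sketch:
          <screen>
           Z> find @attr 1=pages @attr 4=109 @attr 2=4 250
          </screen>
         </para>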
2225
2226     <para>
2227      For the <emphasis>Truncation</emphasis> attribute,
2228      <emphasis>No Truncation</emphasis> is the default.
2229      <emphasis>Left Truncation</emphasis> is not supported.
2230      <emphasis>Process # in search term</emphasis> is supported, as is
2231      <emphasis>Regxp-1</emphasis>.
2232      <emphasis>Regxp-2</emphasis> enables the fault-tolerant (fuzzy)
2233      search. As a default, a single error (deletion, insertion,
2234      replacement) is accepted when terms are matched against the register
2235      contents.
2236     </para>
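         <para>
          For illustration, a query using <emphasis>Process # in search
          term</emphasis> (Bib-1 truncation value 101) against the title
          register might look like this. The term is made up, and the
          reading of <literal>#</literal> as a stand-in for an arbitrary
          sequence of characters is an assumption that should be checked
          against the Zebra version in use:
          <screen>
           Z> find @attr 1=4 @attr 5=101 "comput#"
          </screen>
         </para>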
2237
2238      </sect3>
2239    </sect2>
2240
2241    <sect2  id="querymodel-regular">
2242     <title>Zebra Regular Expressions in Truncation Attribute (type = 5)</title>
2243
2244     <para>
2245      Each term in a query is interpreted as a regular expression if
2246      the truncation value is either <emphasis>Regxp-1 (@attr 5=102)</emphasis>
2247      or <emphasis>Regxp-2 (@attr 5=103)</emphasis>.
2248      Both query types follow the same syntax with the operands:
2249     </para>
2250
2251      <table id="querymodel-regular-operands-table"
2252       frame="all" rowsep="1" colsep="1" align="center">
2253
2254       <caption>Regular Expression Operands</caption>
2255        <!--
2256        <thead>
2257        <tr><td>one</td><td>two</td></tr>
2258       </thead>
2259        -->
2260        <tbody>
2261         <tr>
2262          <td><literal>x</literal></td>
2263          <td>Matches the character <literal>x</literal>.</td>
2264         </tr>
2265         <tr>
2266          <td><literal>.</literal></td>
2267          <td>Matches any character.</td>
2268         </tr>
2269         <tr>
2270          <td><literal>[ .. ]</literal></td>
2271          <td>Matches the set of characters specified;
2272          such as <literal>[abc]</literal> or <literal>[a-c]</literal>.</td>
2273         </tr>
2274        </tbody>
2275       </table>
2276
2277     <para>
2278      The above operands can be combined with the following operators:
2279     </para>
2280
2281      <table id="querymodel-regular-operators-table"
2282       frame="all" rowsep="1" colsep="1" align="center">
2283       <caption>Regular Expression Operators</caption>
2284        <!--
2285        <thead>
2286        <tr><td>one</td><td>two</td></tr>
2287       </thead>
2288        -->
2289        <tbody>
2290         <tr>
2291          <td><literal>x*</literal></td>
2292          <td>Matches <literal>x</literal> zero or more times.
2293           Priority: high.</td>
2294         </tr>
2295         <tr>
2296          <td><literal>x+</literal></td>
2297          <td>Matches <literal>x</literal> one or more times.
2298           Priority: high.</td>
2299         </tr>
2300         <tr>
2301          <td><literal>x?</literal></td>
2302          <td> Matches <literal>x</literal> zero or one time.
2303           Priority: high.</td>
2304         </tr>
2305         <tr>
2306          <td><literal>xy</literal></td>
2307          <td> Matches <literal>x</literal>, then <literal>y</literal>.
2308          Priority: medium.</td>
2309         </tr>
2310         <tr>
2311          <td><literal>x|y</literal></td>
2312          <td> Matches either <literal>x</literal> or <literal>y</literal>.
2313          Priority: low.</td>
2314         </tr>
2315         <tr>
2316          <td><literal>( )</literal></td>
2317          <td>The order of evaluation may be changed by using parentheses.</td>
2318         </tr>
2319        </tbody>
2320       </table>
2321
2322     <para>
2323      If the first character of the <literal>Regxp-2</literal> query
2324      is a plus character (<literal>+</literal>) it marks the
2325      beginning of a section with non-standard specifiers.
2326      The next plus character marks the end of the section.
2327      Currently Zebra only supports one specifier, the error tolerance,
2328      which consists of one digit.
2329     </para>
2330
2331     <para>
2332      Since the plus operator is normally a suffix operator, the addition to
2333      the query syntax doesn't violate the syntax for standard regular
2334      expressions.
2335     </para>
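         <para>
          Following the description above, a
          <literal>Regxp-2</literal> query that sets the error tolerance
          to two might be written as shown here. This is a sketch only;
          the search term and the tolerance value are chosen for
          illustration:
          <screen>
           Z> find @attr 1=4 @attr 5=103 "+2+informat.*"
          </screen>
         </para>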
2336
2337     <para>
2338      For example, a phrase search with regular expressions  in
2339      the title-register is performed like this:
2340      <screen>
2341       Z> find @attr 1=4 @attr 5=102 "informat.* retrieval"
2342      </screen>
2343     </para>
2344
2345     <para>
2346      Combinations with other attributes are possible. For example, a
2347      ranked search with a regular expression:
2348      <screen>
2349       Z> find @attr 1=4 @attr 5=102 @attr 2=102 "informat.* retrieval"
2350      </screen>
2351     </para>
2352    </sect2>
2353
2354
2355    <!--
2356    <para>
2357     The RecordType parameter in the <literal>zebra.cfg</literal> file, or
2358     the <literal>-t</literal> option to the indexer tells Zebra how to
2359     process input records.
2360     Two basic types of processing are available - raw text and structured
2361     data. Raw text is just that, and it is selected by providing the
2362     argument <literal>text</literal> to Zebra. Structured records are
2363     all handled internally using the basic mechanisms described in the
2364     subsequent sections.
2365     Zebra can read structured records in many different formats.
2366    </para>
2367    -->
2368   </sect1>
2369
2370
2371   <sect1 id="querymodel-cql-to-pqf">
2372    <title>Server Side CQL to PQF Query Translation</title>
2373    <para>
2374     Using the
2375     <literal>&lt;cql2rpn&gt;l2rpn.txt&lt;/cql2rpn&gt;</literal>
2376       YAZ Frontend Virtual
2377     Hosts option, one can configure
2378     the YAZ Frontend CQL-to-PQF
2379     converter, specifying the interpretation of various
2380     <ulink url="&url.cql;">CQL</ulink>
2381     indexes, relations, etc. in terms of Type-1 query attributes.
2382     <!-- The  yaz-client config file -->
2383    </para>
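        <para>
         As a rough sketch, the relevant part of a YAZ Frontend Virtual
         Hosts file and a few lines of a CQL-to-PQF mapping file might
         look like this. The file name <filename>cql2pqf.txt</filename>,
         the listener address, and the selection of mappings are
         assumptions for illustration; the YAZ manual remains the
         authoritative reference for both formats.
         <screen>
         <![CDATA[
          <yazgfs>
           <listen id="public">tcp:@:9999</listen>
           <server id="main" listenref="public">
            <config>zebra.cfg</config>
            <cql2rpn>cql2pqf.txt</cql2rpn>
           </server>
          </yazgfs>
          ]]>
         </screen>
         <screen>
          # cql2pqf.txt (excerpt)
          set.cql = info:srw/cql-context-set/1/cql-v1.1
          set.dc  = info:srw/cql-context-set/1/dc-v1.1
          index.cql.serverChoice    = 1=1016
          index.dc.title            = 1=4
          relation.eq               = 2=3
          relationModifier.relevant = 2=102
          structure.*               = 4=1
         </screen>
        </para>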
2384    <para>
2385     For example, using server-side CQL-to-PQF conversion, one might
2386     query a Zebra server like this:
2387     <screen>
2388     <![CDATA[
2389      yaz-client localhost:9999
2390      Z> querytype cql
2391      Z> find text=(plant and soil)
2392      ]]>
2393     </screen>
2394      and, if properly configured, even static relevance ranking can
2395      be performed using CQL query syntax:
2396     <screen>
2397     <![CDATA[
2398      Z> find text = /relevant (plant and soil)
2399      ]]>
2400      </screen>
2401    </para>
2402
2403    <para>
2404     The same configuration can also be used to
2405     search using client-side CQL-to-PQF conversion.
2406     The only differences are <literal>querytype cql2rpn</literal>
2407     instead of
2408     <literal>querytype cql</literal>, and the <literal>-q</literal> option
2409     specifying a local conversion file:
2410     <screen>
2411     <![CDATA[
2412      yaz-client -q local/cql2pqf.txt localhost:9999
2413      Z> querytype cql2rpn
2414      Z> find text=(plant and soil)
2415      ]]>
2416      </screen>
2417    </para>
2418
2419    <para>
2420     Exhaustive information can be found in the
2421     section "Specification of CQL to RPN mappings" of the YAZ manual,
2422     <ulink url="http://www.indexdata.dk/yaz/doc/tools.tkl#tools.cql.map">
2423      http://www.indexdata.dk/yaz/doc/tools.tkl#tools.cql.map</ulink>,
2424     and is therefore not repeated here.
2425    </para>
2426   <!--
2427   <para>
2428     See
2429       <ulink url="http://www.loc.gov/z3950/agency/zing/cql/dc-indexes.html">
2430       http://www.loc.gov/z3950/agency/zing/cql/dc-indexes.html</ulink>
2431     for the Maintenance Agency's work-in-progress mapping of Dublin Core
2432     indexes to Attribute Architecture (util, XD and BIB-2)
2433     attributes.
2434    </para>
2435    -->
2436  </sect1>
2437
2438
2439
2440 </chapter>
2441
2442  <!-- Keep this comment at the end of the file
2443  Local variables:
2444  mode: sgml
2445  sgml-omittag:t
2446  sgml-shorttag:t
2447  sgml-minimize-attributes:nil
2448  sgml-always-quote-attributes:t
2449  sgml-indent-step:1
2450  sgml-indent-data:t
2451  sgml-parent-document: "zebra.xml"
2452  sgml-local-catalogs: nil
2453  sgml-namecase-general:t
2454  End:
2455  -->