1 <chapter id="querymodel">
2 <!-- $Id: querymodel.xml,v 1.4 2006-06-14 13:44:15 adam Exp $ -->
3 <title>Query Model</title>
5 <sect1 id="querymodel-overview">
6 <title>Query Model Overview</title>
Zebra was born as a networked Information Retrieval engine adhering
to the international standards
<ulink url="&url.z39.50;">Z39.50</ulink> and
<ulink url="&url.sru;">SRU</ulink>,
and it implements the query model defined there.
Unfortunately, the Z39.50 query model defines only a binary
encoded representation, which is used as transport packaging in
the Z39.50 protocol layer. This representation is not human
readable, nor does it offer any convenient way to specify queries.
<!-- tell about RPN - include link to YAZ -->
Therefore, Index Data has defined a textual representation of the
RPN query, the <literal>Prefix Query Format</literal>
(<literal>PQF</literal>), which has since been adopted by other
parties developing Z39.50 software. It is also often referred to as
<literal>Prefix Query Notation</literal>
(<literal>PQN</literal>), and is explained in detail in
<xref linkend="querymodel-pqf"/>.
31 <!-- PQF/RPN is natively supported. CQL is NOT . So we need a map -->
33 In addition, Zebra can be configured to understand and map the
34 <literal>Common Query Language</literal>
35 (<ulink url="&url.cql;">CQL</ulink>)
36 to PQF. See an introduction on the mapping to the internal query
38 <xref linkend="querymodel-cql-to-pqf"/>.
42 <sect1 id="querymodel-pqf">
43 <title>Prefix Query Format structure and syntax</title>
The <ulink url="&url.yaz.pqf;">PQF grammar</ulink>
is documented in the YAZ manual and is not
repeated here. During search, this textual PQF representation
is always mapped to the equivalent Zebra internal
52 <sect2 id="querymodel-pqf-tree">
53 <title>PQF tree structure</title>
55 The PQF parse tree - or the equivalent textual representation -
56 may start with one specification of the
57 <emphasis>attribute set</emphasis> used. Following is a query
consists of <emphasis>atomic query parts</emphasis>, optionally
paired by <emphasis>boolean binary operators</emphasis>, and
finally <emphasis>recursively combined</emphasis> into
65 <sect3 id="querymodel-attribute-sets">
66 <title>Attribute sets</title>
68 Attribute sets define the exact meaning and semantics of queries
issued. Zebra comes with some predefined attribute set
definitions; others can easily be defined and added to the
The Zebra internal query processing is modeled after
the <literal>Bib1</literal> attribute set, and the non-use
attribute types 2-9 are hard-wired in. It is therefore essential
76 to be familiar with <xref linkend="querymodel-bib1"/>.
80 <table id="querymodel-attribute-sets-table">
81 <caption>Attribute sets predefined in Zebra</caption>
<tr><td>Name</td><td>Attribute set</td><td>Description</td></tr>
89 <td><emphasis>exp-1</emphasis></td>
90 <td><literal>Explain</literal> attribute set</td>
91 <td>Special attribute set used on the special automagic
92 <literal>IR-Explain-1</literal> database to gain information on
93 server capabilities, database names, and database
97 <td><emphasis>bib-1</emphasis></td>
98 <td><literal>Bib1</literal> attribute set</td>
99 <td>Standard PQF query language attribute set which defines the
100 semantics of Z39.50 searching. In addition, all of the
101 non-use attributes (type 2-9) define the Zebra internal query
105 <td><emphasis>gils</emphasis></td>
106 <td><literal>GILS</literal> attribute set</td>
<td>Extension to the <literal>Bib1</literal> attribute set.</td>
113 <sect3 id="querymodel-boolean-operators">
114 <title>Boolean operators</title>
116 A pair of subquery trees, or of atomic queries, is combined
117 using the standard boolean operators into new query trees.
120 <table id="querymodel-boolean-operators-table">
121 <caption>Boolean operators</caption>
<tr><td>Operator</td><td>Name</td><td>Effect on hit sets</td></tr>
128 <tr><td><emphasis>@and</emphasis></td>
129 <td>binary <literal>AND</literal> operator</td>
<td>Set intersection of the hit sets of two atomic queries</td>
132 <tr><td><emphasis>@or</emphasis></td>
133 <td>binary <literal>OR</literal> operator</td>
<td>Set union of the hit sets of two atomic queries</td>
136 <tr><td><emphasis>@not</emphasis></td>
137 <td>binary <literal>AND NOT</literal> operator</td>
<td>Set difference: documents in the first hit set that are not in the second</td>
140 <tr><td><emphasis>@prox</emphasis></td>
<td>binary <literal>PROXIMITY</literal> operator</td>
<td>Set intersection of the hit sets of two atomic queries. In
addition, the intersection set is purged of all
144 documents which do not satisfy the requested query
145 term proximity. Usually a proper subset of the AND
152 For example, we can combine the terms
153 <emphasis>information</emphasis> and <emphasis>retrieval</emphasis>
154 into different searches in the default index of the default
155 attribute set as follows.
156 Querying for the union of all documents containing the
157 terms <emphasis>information</emphasis> OR
158 <emphasis>retrieval</emphasis>:
160 Z> find @or information retrieval
164 Querying for the intersection of all documents containing the
165 terms <emphasis>information</emphasis> AND
166 <emphasis>retrieval</emphasis>:
The hit set is a subset of the corresponding
170 Z> find @and information retrieval
174 Querying for the intersection of all documents containing the
175 terms <emphasis>information</emphasis> AND
176 <emphasis>retrieval</emphasis>, taking proximity into account:
The hit set is a subset of the corresponding
180 Z> find @prox information retrieval
184 Querying for the intersection of all documents containing the
185 terms <emphasis>information</emphasis> AND
186 <emphasis>retrieval</emphasis>, in the same order and near each
other as described in the term list.
The hit set is a subset of the corresponding
191 Z> find "information retrieval"
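Because PQF is a prefix notation, nested boolean queries can be composed mechanically from smaller ones. The following Python sketch is an illustration only and not part of Zebra or YAZ; the helper names pqf_term and pqf_binary are hypothetical. It covers the plain binary operators @and, @or and @not from the table above and leaves out @prox.

```python
ALLOWED_OPERATORS = ("@and", "@or", "@not")


def pqf_term(term):
    """Return a term, quoted when it is empty or contains whitespace."""
    if '"' in term:
        raise ValueError("double quotes are not handled in this sketch")
    if term == "" or any(ch.isspace() for ch in term):
        return '"' + term + '"'
    return term


def pqf_binary(operator, left, right):
    """Combine two PQF sub-queries in prefix order: operator left right."""
    if operator not in ALLOWED_OPERATORS:
        raise ValueError("unsupported operator: " + operator)
    return " ".join([operator, left, right])


# Union of two single terms, as in the first example above.
union = pqf_binary("@or", pqf_term("information"), pqf_term("retrieval"))

# A nested tree: information AND (retrieval OR the term list "data mining").
nested = pqf_binary(
    "@and",
    pqf_term("information"),
    pqf_binary("@or", pqf_term("retrieval"), pqf_term("data mining")),
)

print(union)
print(nested)
```

Each call returns a complete sub-query string, so the result of one call can be passed as an operand to the next, which mirrors the recursive structure of the query tree.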
197 <sect3 id="querymodel-atomic-queries">
198 <title>Atomic queries</title>
Atomic queries are the query parts which work on one access point
201 only. These consist of <literal>an attribute list</literal>
202 followed by a <literal>single term</literal> or a
203 <literal>quoted term list</literal>.
206 Unsupplied non-use attributes type 2-9 are either inherited from
207 higher nodes in the query tree, or are set to Zebra's default values.
208 See <xref linkend="querymodel-bib1"/> for details.
211 <table id="querymodel-atomic-queries-table">
212 <caption>Atomic queries</caption>
<tr><td>Part</td><td>Form</td><td>Description</td></tr>
219 <tr><td><emphasis>attribute list</emphasis></td>
220 <td>List of <literal>orthogonal</literal> attributes</td>
<td>Any of the orthogonal attribute types may be omitted.
Omitted types are inherited from higher query tree nodes or, if not
inherited, are set to the default Zebra configuration values.
226 <tr><td><emphasis>term</emphasis></td>
227 <td>single <literal>term</literal>
228 or <literal>quoted term list</literal> </td>
229 <td>Here the search terms or list of search terms is added
235 Querying for the term <emphasis>information</emphasis> in the
default index using the default attribute set, the server choice
of access point/index, and the default non-use attributes:
239 Z> find "information"
243 Equivalent query fully specified:
245 Z> find @attrset bib-1 @attr 1=1017 @attr 2=3 @attr 3=3 @attr 4=1 @attr 5=100 @attr 6=1 "information"
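The relation between the short and the fully specified form can be sketched in Python. This is an illustration only: the function name atomic_query and the DEFAULT_ATTRS table are hypothetical, and the default values are simply copied from the fully specified query above rather than read from a Zebra configuration.

```python
# Defaults copied from the fully specified example query above.
DEFAULT_ATTRS = {1: 1017, 2: 3, 3: 3, 4: 1, 5: 100, 6: 1}


def atomic_query(term, attrs=None, attrset="bib-1"):
    """Build a fully specified atomic PQF query.

    attrs maps attribute types to values; types that are not supplied
    fall back to DEFAULT_ATTRS.
    """
    merged = dict(DEFAULT_ATTRS)
    merged.update(attrs or {})
    parts = ["@attrset", attrset]
    for attr_type in sorted(merged):
        parts.append("@attr {}={}".format(attr_type, merged[attr_type]))
    parts.append('"{}"'.format(term))
    return " ".join(parts)


full = atomic_query("information")
title = atomic_query("information", {1: 4})

print(full)
print(title)
```

Supplying only the use attribute, as in the second call, changes the access point while every other attribute type keeps its default.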
250 Finding all documents which have empty titles. Notice that the
251 empty term must be quoted, but is otherwise legal.
259 <sect3 id="querymodel-use-string">
260 <title>Zebra's special use attribute type 1 of form 'string'</title>
The numeric <literal>use (type 1)</literal> attribute is usually
referred to from a given
attribute set. In addition, Zebra lets you use
<emphasis>any internal index
name defined in your configuration</emphasis>
as use attribute value. This is useful for
debugging, and when you do
not need the complexity of defined use attribute values. It is
the preferred way of accessing Zebra indexes directly.
Finding all documents which have the term list "information
retrieval" in a Zebra index, using its internal full string name:
276 Z> find @attr 1=sometext "information retrieval"
Searching the bib-1 use attribute 54 using its string name:
282 Z> find @attr 1=Code-language eng
Searching in any silly string index, provided it is defined in your
indexing rules and can be parsed by the PQF parser.
288 This is definitely not the recommended use of
289 this facility, as it might confuse your users with some very
292 Z> find @attr 1=silly/xpath/alike[@index]/name "information retrieval"
296 See <xref linkend="querymodel-bib1-mapping"/> for details, and
297 <xref linkend="server-sru"/>
for the SRU PQF query extension using string names as a fast
303 <sect3 id="querymodel-use-xpath">
304 <title>Zebra's special use attribute type 1 of form 'XPath'
305 for GRS filters</title>
307 As we have seen above, it is possible (albeit seldom a great
309 <ulink url="http://www.w3.org/TR/xpath">XPath 1.0</ulink> based
search by defining <literal>use (type 1)</literal>
<emphasis>string</emphasis> attributes which in appearance
<emphasis>resemble XPath queries</emphasis>. There are two
problems with this approach: first, the XPath look-alike has to
be defined at indexing time, so no new, undefined
XPath queries can be entered at search time; and second, it can
confuse users that an XPath-like index name may in fact
be populated from an entirely different XML element
than the one it appears to access.
321 When using the <literal>GRS Record Model</literal>
322 (see <xref linkend="record-model-grs"/>), we have the
possibility to embed <emphasis>live</emphasis>
325 in the PQF queries, which are here called
326 <literal>use (type 1)</literal> <emphasis>xpath</emphasis>
attributes. You must set the
<literal>xpath enable</literal> directive in your
<literal>.abs</literal> configuration files.
332 Only a <emphasis>very</emphasis> restricted subset of the
333 <ulink url="http://www.w3.org/TR/xpath">XPath 1.0</ulink>
334 standard is supported as the GRS record model is simpler than
335 a full XML DOM structure. See the following examples for
339 Finding all documents which have the term "content"
340 inside a text node found in a specific XML DOM
341 <emphasis>subtree</emphasis>, whose starting element is
344 Z> find @attr 1=/root content
345 Z> find @attr 1=/root/first content
347 <emphasis>Notice that the
348 XPath must be absolute, i.e., must start with '/', and that the
XPath <literal>descendant-or-self</literal> axis followed by a
350 text node selection <literal>text()</literal> is implicitly
351 appended to the stated XPath.
353 It follows that the above searches are interpreted as:
355 Z> find @attr 1=/root//text() content
356 Z> find @attr 1=/root/first//text() content
Filtering the addressing XPath by a predicate working on exact
attributes (in the XML sense) can be done: return all those documents which
have the term "english" contained in one of the text subnodes of
the subtree defined by the XPath
366 <literal>/record/title[@lang='en']</literal>
368 Z> find @attr 1=/record/title[@lang='en'] english
373 Combining numeric indexes, boolean expressions,
374 and xpath based searches is possible:
376 Z> find @attr 1=/record/title @and foo bar
377 Z> find @and @attr 1=/record/title foo @attr 1=4 bar
381 Escaping PQF keywords and other non-parseable XPath constructs
382 with <literal>'{ }'</literal> to prevent syntax errors:
384 Z> find @attr {1=/root/first[@attr='danish']} content
385 Z> find @attr {1=/root/second[@attr='danish lake']}
386 Z> find @attr {1=/root/third[@attr='dansk s\xc3\xb8']}
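A client that generates such queries can apply the two rules stated above, absolute paths only and brace escaping, before sending the query. The following Python sketch is an illustration under those assumptions; the helper names xpath_use_attr and xpath_query are hypothetical, and XPath expressions that themselves contain braces are not handled.

```python
def xpath_use_attr(xpath):
    """Return a brace-escaped use (type 1) attribute for an XPath.

    The XPath must be absolute, i.e. start with '/'.
    """
    if not xpath.startswith("/"):
        raise ValueError("XPath must be absolute: " + xpath)
    if "{" in xpath or "}" in xpath:
        raise ValueError("braces inside the XPath are not handled here")
    return "@attr {1=" + xpath + "}"


def xpath_query(xpath, term):
    """Build an XPath based atomic query for a single term."""
    return xpath_use_attr(xpath) + " " + term


query = xpath_query("/root/first[@attr='danish']", "content")
print(query)
```

Always adding the braces is a deliberate simplification: it is harmless for simple paths and avoids having to decide which predicates the PQF parser would otherwise reject.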
Note that these dynamically evaluated XPath
queries are a performance bottleneck, as no optimized
specialized indexes can be used. Avoid
this facility when speed is essential and the database
is medium-sized or large.
400 <sect2 id="querymodel-exp1">
401 <title>Explain Attribute Set</title>
The Z39.50 standard defines the
<ulink url="&url.z39.50.explain;">Explain</ulink> attribute set
<literal>exp-1</literal>, which is used to discover information
about a server's search semantics and functional capabilities.
Zebra exposes a "classic"
Explain database under the base name <literal>IR-Explain-1</literal>, which
is populated with system internal information.
412 The attribute-set <literal>exp-1</literal> consists of a single
413 <literal>Use (type 1)</literal> attribute.
416 In addition, the non-Use
417 <literal>bib-1</literal> attributes, that is, the types
418 <literal>Relation</literal>, <literal>Position</literal>,
419 <literal>Structure</literal>, <literal>Truncation</literal>,
420 and <literal>Completeness</literal> are imported from
421 the <literal>bib-1</literal> attribute set, and may be used
422 within any explain query.
425 <sect3 id="querymodel-exp1-use">
426 <title>Use Attributes (type = 1)</title>
The following Explain search attributes are supported:
<literal>ExplainCategory</literal> (@attr 1=1),
<literal>DatabaseName</literal> (@attr 1=3),
<literal>DateAdded</literal> (@attr 1=9),
<literal>DateChanged</literal> (@attr 1=10).
435 A search in the use attribute <literal>ExplainCategory</literal>
436 supports only these predefined values:
437 <literal>CategoryList</literal>, <literal>TargetInfo</literal>,
438 <literal>DatabaseInfo</literal>, <literal>AttributeDetails</literal>.
441 See <filename>tab/explain.att</filename> and the
442 <ulink url="&url.z39.50;">Z39.50</ulink> standard
443 for more information.
448 <title>Explain searches with yaz-client</title>
Classic Explain only defines retrieval of Explain information
via ASN.1. Practically no Z39.50 clients support this. Fortunately
they don't have to: Zebra allows retrieval of this information
454 <literal>SUTRS</literal>, <literal>XML</literal>,
455 <literal>GRS-1</literal> and <literal>ASN.1</literal> Explain.
459 List supported categories to find out which explain commands are
463 Z> find @attr exp1 1=1 categorylist
470 Get target info, that is, investigate which databases exist at
471 this server endpoint:
474 Z> find @attr exp1 1=1 targetinfo
List all supported databases. The number of hits
486 is the number of databases found, which most commonly are the
488 the <literal>Default</literal> and the
489 <literal>IR-Explain-1</literal> databases.
492 Z> find @attr exp1 1=1 databaseinfo
499 Get database info record for database <literal>Default</literal>.
502 Z> find @and @attr exp1 1=1 databaseinfo @attr exp1 1=3 Default
504 Identical query with explicitly specified attribute set:
507 Z> find @attrset exp1 @and @attr 1=1 databaseinfo @attr 1=3 Default
512 Get attribute details record for database
513 <literal>Default</literal>.
514 This query is very useful to study the internal Zebra indexes.
515 If records have been indexed using the <literal>alvis</literal>
516 XSLT filter, the string representation names of the known indexes can be
520 Z> find @and @attr exp1 1=1 attributedetails @attr exp1 1=3 Default
522 Identical query with explicitly specified attribute set:
525 Z> find @attrset exp1 @and @attr 1=1 attributedetails @attr 1=3 Default
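The Explain queries above all follow one pattern: an ExplainCategory search, optionally combined with a DatabaseName search. The Python sketch below generates them; it is an illustration only, the function name explain_query is hypothetical, and the category list is taken from the predefined values named in this section.

```python
EXPLAIN_CATEGORIES = (
    "categorylist",
    "targetinfo",
    "databaseinfo",
    "attributedetails",
)


def explain_query(category, database=None):
    """Build a PQF Explain query for a category and an optional database."""
    key = category.lower()
    if key not in EXPLAIN_CATEGORIES:
        raise ValueError("unsupported ExplainCategory: " + category)
    query = "@attr exp1 1=1 " + key
    if database is None:
        return query
    # Restrict the category search to one database name (use attribute 3).
    return "@and " + query + " @attr exp1 1=3 " + database


all_databases = explain_query("DatabaseInfo")
default_details = explain_query("AttributeDetails", "Default")

print(all_databases)
print(default_details)
```

The generated strings match the hand-written queries above and can be issued with the find command after selecting the IR-Explain-1 database.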
532 <sect2 id="querymodel-bib1">
533 <title>Bib1 Attribute Set</title>
535 Something about querying to be written ..
538 Most of the information contained in this section is an excerpt of
539 the <literal>ATTRIBUTE SET BIB-1 (Z39.50-1995)
found at <ulink url="&url.z39.50.attset.bib1.1995;">The BIB-1
542 Attribute Set Semantics</ulink> from 1995, also in an updated
543 <ulink url="&url.z39.50.attset.bib1;">Bib-1
544 Attribute Set</ulink>
545 version from 2003. Index Data is not the copyright holder of this
550 <sect3 id="querymodel-bib1-use">
551 <title>Use Attributes (type = 1)</title>
555 Phrase search for <emphasis>information retrieval</emphasis> in
558 Z> find @attr 1=4 "information retrieval"
563 <sect3 id="querymodel-bib1-relation">
564 <title>Relation Attributes (type = 2)</title>
Supported operations: = (the default if omitted), &lt;, &gt;, &lt;=, &gt;=.
Unsupported: not equal.
570 The following relation attributes are also supported: relevance (102).
571 <!-- always-matches (103) not supported for all indexes -->
All operations are based on lexicographical ordering,
<emphasis>except</emphasis> for the
following structure attribute: numeric (109).
581 Ranked search for <emphasis>information retrieval</emphasis> in
(see <xref linkend="administration-ranking"/> for the gory details):
585 Z> find @attr 1=4 @attr 2=102 "information retrieval"
589 <sect3 id="querymodel-bib1-position">
590 <title>Position Attributes (type = 3)</title>
Only the value any position (3) is supported. First in field (1)
and first in subfield (2) are unsupported, but using them
does not trigger an error.
598 <sect3 id="querymodel-bib1-structure">
599 <title>Structure Attributes (type = 4)</title>
600 <!-- See tab/default.idx -->
605 the GILS schema (<literal>gils.abs</literal>), the
606 west-bounding-coordinate is indexed as type <literal>n</literal>,
607 and is therefore searched by specifying
608 <emphasis>structure</emphasis>=<emphasis>Numeric String</emphasis>.
609 To match all those records with west-bounding-coordinate greater
610 than -114 we use the following query:
612 Z> find @attr 4=109 @attr 2=5 @attr gils 1=2038 -114
616 <sect3 id="querymodel-bib1-truncation">
617 <title>Truncation Attributes (type = 5)</title>
Supported are: No truncation (100), which is the default,
Right truncation (1), Left truncation (2),
Left and right truncation (3),
Process <literal>#</literal> in term (101), which maps
each <literal>#</literal> to <literal>.*</literal>,
Regexp-1 (102) (normal regular expressions), Regexp-2 (103) (regular expressions with fuzzy matching).
<!-- Special 104, 105, 106 are deprecated and will be removed! -->
630 <sect3 id="querymodel-bib1-completeness">
631 <title>Completeness Attributes (type = 6)</title>
This attribute is only used if structure w or p is to be
chosen. Completeness is ignored if neither w nor p is to be
636 Incomplete field(1) is the default and makes Zebra use
complete subfield(2) and complete field(3) both trigger
644 <sect2 id="querymodel-zebra-attr-search">
<title>Zebra specific Search Extensions to all Attribute Sets</title>
Zebra extends the Bib1 attribute types, and these extensions are
648 recognized regardless of attribute
649 set used in a <literal>search</literal> operation query.
652 <table id="querymodel-zebra-attr-search-table">
<caption>Zebra Search Attribute Extensions</caption>
656 <td><emphasis>Name and Type</emphasis></td>
658 <td>Zebra version</td>
663 <td><emphasis>Embedded Sort (type 7)</emphasis></td>
668 <td><emphasis>Term Set (type 8)</emphasis></td>
673 <td><emphasis>Rank weight (type 9)</emphasis></td>
678 <td><emphasis>Approx Limit (type 9)</emphasis></td>
683 <td><emphasis>Term Reference (type 10)</emphasis></td>
690 <sect3 id="querymodel-zebra-attr-sorting">
<title>Zebra Extension Embedded Sort Attribute (type 7)</title>
The embedded sort is a way to specify sorting within a query, thus
removing the need to send a Sort Request separately. It is
faster and does not require clients to deal with the Sort
700 The possible values after attribute <literal>type 7</literal> are
701 <literal>1</literal> ascending and
702 <literal>2</literal> descending.
703 The attributes+term (APT) node is separate from the
704 rest and must be <literal>@or</literal>'ed.
705 The term associated with APT is the sorting level in integers,
706 where <literal>0</literal> means primary sort,
707 <literal>1</literal> means secondary sort, and so forth.
708 See also <xref linkend="administration-ranking"/>.
For example, searching for water, sorted by title (ascending):
713 Z> find @or @attr 1=1016 water @attr 7=1 @attr 1=4 0
Or, searching for water, sorted by title ascending, then date descending:
719 Z> find @or @or @attr 1=1016 water @attr 7=1 @attr 1=4 0 @attr 7=2 @attr 1=30 1
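The pattern behind the two queries above is regular: every sort key adds one @or in front of the query and one sort APT behind it, with the sort level as its term. The following Python sketch shows this; it is an illustration only, and the function name with_embedded_sort and the direction labels asc and desc are hypothetical.

```python
DIRECTIONS = {"asc": 1, "desc": 2}


def with_embedded_sort(query, sort_keys):
    """Append embedded sort APTs to a PQF query.

    sort_keys is a list of (use_attribute, direction) pairs in order of
    priority; the list index becomes the sort level (0 is primary).
    """
    result = query
    for level, (use, direction) in enumerate(sort_keys):
        apt = "@attr 7={} @attr 1={} {}".format(DIRECTIONS[direction], use, level)
        result = "@or " + result + " " + apt
    return result


by_title = with_embedded_sort("@attr 1=1016 water", [(4, "asc")])
by_title_then_date = with_embedded_sort(
    "@attr 1=1016 water", [(4, "asc"), (30, "desc")]
)

print(by_title)
print(by_title_then_date)
```

The two results reproduce the hand-written queries above, which is a convenient check that the sort levels and the number of @or operators stay in step.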
723 <sect3 id="querymodel-zebra-attr-estimation">
<title>Zebra Extension Term Set Attribute (type 8)</title>
The Term Set feature allows a search to store the
matching terms in a "pseudo" result set, thus combining a normal search with
a scan-like facility. It requires a client that supports named result
sets, since the search generates two result sets. The value for
attribute 8 is the name of a result set (string). The terms in
the named term set are returned as SUTRS records.
For example, searching for u in title, right truncated, and
storing the result in a term set named 'aset':
738 Z> find @attr 5=1 @attr 1=4 @attr 8=aset u
The model has one serious flaw: the size of the term
set is not known. This feature is experimental; do not use it in production code.
746 <sect3 id="querymodel-zebra-attr-weight">
<title>Zebra Extension Rank Weight Attribute (type 9)</title>
Rank weight is a way to pass a value to a ranking algorithm, so
that one APT has one value while another has a different one.
752 See also <xref linkend="administration-ranking"/>.
For example, searching for utah in title with weight 30 as well
as in any index with weight 20:
758 Z> find @attr 2=102 @or @attr 9=30 @attr 1=4 utah @attr 9=20 utah
762 <sect3 id="querymodel-zebra-attr-limit">
<title>Zebra Extension Approximative Limit Attribute (type 9)</title>
Newer Zebra versions normally estimate the hit count for every APT
767 (leaf) in the query tree. These hit counts are returned as part of
768 the searchResult-1 facility in the binary encoded Z39.50 search
By setting a limit for the APT, we can make Zebra switch to
approximate hit counts once a certain hit count limit is
774 reached. A value of zero means exact hit count.
For example, we might be interested in the exact hit count for a, but
778 for b we allow hit count estimates for 1000 and higher.
780 Z> find @and a @attr 9=1000 b
The estimated hit count facility makes searches faster, as
large hit lists need only be processed partially.
788 This facility clashes with rank weight, because there all
789 documents in the hit lists need to be examined for scoring and
791 It is an experimental
extension. Do not use in production code.
795 <sect3 id="querymodel-zebra-attr-termref">
<title>Zebra Extension Term Reference Attribute (type 10)</title>
Zebra supports the searchResult-1 facility. If attribute 10 is
given, it specifies a subqueryId value returned as part of the
801 search result. It is a way for a client to name an APT part of a
811 Experimental. Do not use in production code.
818 <sect2 id="querymodel-zebra-attr-scan">
<title>Zebra specific Scan Extensions to all Attribute Sets</title>
Zebra extends the Bib1 attribute types, and these extensions are
822 recognized regardless of attribute
823 set used in a <literal>scan</literal> operation query.
825 <table id="querymodel-zebra-attr-scan-table">
<caption>Zebra Scan Attribute Extensions</caption>
829 <td><emphasis>Name and Type</emphasis></td>
831 <td>Zebra version</td>
836 <td><emphasis>Result Set Narrow (type 8)</emphasis></td>
841 <td><emphasis>Approximative Limit (type 9)</emphasis></td>
848 <sect3 id="querymodel-zebra-attr-xyz">
<title>Zebra Extension Result Set Narrow (type 8)</title>
852 If attribute 8 is given for scan, the value is the name of a
853 result set. Each hit count in scan is @and'ed with the result set
863 Experimental and buggy. Definitely not to be used in production code.
866 <sect3 id="querymodel-zebra-attr-xyz">
<title>Zebra Extension Approximative Limit (type 9)</title>
The approximative limit (as for search) is a way to enable approximate
hit counts for scan.
880 Experimental. Do not use in production code.
887 <sect2 id="querymodel-bib1-mapping">
888 <title>Mapping from Bib1 Attributes to Zebra internal
889 register indexes</title>
895 <!-- see in util/zebramap.c
898 if (completeness_value == 2 || completeness_value == 3)
904 *sort_flag =(sort_relation_value > 0) ? 1 : 0;
905 *search_type = "phrase";
906 strcpy(rank_type, "void");
907 if (relation_value == 102)
909 if (weight_value == -1)
911 sprintf(rank_type, "rank,w=%d,u=%d", weight_value, use_value);
913 if (relation_value == 103)
915 *search_type = "always";
923 switch (structure_value)
925 case 6: /* word list */
926 *search_type = "and-list";
928 case 105: /* free-form-text */
929 *search_type = "or-list";
931 case 106: /* document-text */
932 *search_type = "or-list";
937 case 108: /* string */
938 *search_type = "phrase";
940 case 107: /* local-number */
941 *search_type = "local";
944 case 109: /* numeric string */
946 *search_type = "numeric";
950 *search_type = "phrase";
954 *search_type = "phrase";
958 *search_type = "phrase";
*search_type = "phrase";
-->
973 <emphasis>Use</emphasis> attributes are interpreted according to the
974 attribute sets which have been loaded in the
975 <literal>zebra.cfg</literal> file, and are matched against specific
976 fields as specified in the <literal>.abs</literal> file which
977 describes the profile of the records which have been loaded.
978 If no Use attribute is provided, a default of Bib-1 Any is assumed.
982 If a <emphasis>Structure</emphasis> attribute of
983 <emphasis>Phrase</emphasis> is used in conjunction with a
984 <emphasis>Completeness</emphasis> attribute of
985 <emphasis>Complete (Sub)field</emphasis>, the term is matched
986 against the contents of the phrase (long word) register, if one
987 exists for the given <emphasis>Use</emphasis> attribute.
988 A phrase register is created for those fields in the
<literal>.abs</literal> file that contain a
990 <literal>p</literal>-specifier.
991 <!-- ### whatever the hell _that_ is -->
995 If <emphasis>Structure</emphasis>=<emphasis>Phrase</emphasis> is
996 used in conjunction with <emphasis>Incomplete Field</emphasis> - the
997 default value for <emphasis>Completeness</emphasis>, the
998 search is directed against the normal word registers, but if the term
999 contains multiple words, the term will only match if all of the words
1000 are found immediately adjacent, and in the given order.
1001 The word search is performed on those fields that are indexed as
1002 type <literal>w</literal> in the <literal>.abs</literal> file.
1006 If the <emphasis>Structure</emphasis> attribute is
1007 <emphasis>Word List</emphasis>,
1008 <emphasis>Free-form Text</emphasis>, or
1009 <emphasis>Document Text</emphasis>, the term is treated as a
1010 natural-language, relevance-ranked query.
1011 This search type uses the word register, i.e. those fields
1012 that are indexed as type <literal>w</literal> in the
1013 <literal>.abs</literal> file.
1017 If the <emphasis>Structure</emphasis> attribute is
1018 <emphasis>Numeric String</emphasis> the term is treated as an integer.
1019 The search is performed on those fields that are indexed
1020 as type <literal>n</literal> in the <literal>.abs</literal> file.
1024 If the <emphasis>Structure</emphasis> attribute is
1025 <emphasis>URx</emphasis> the term is treated as a URX (URL) entity.
1026 The search is performed on those fields that are indexed as type
1027 <literal>u</literal> in the <literal>.abs</literal> file.
If the <emphasis>Structure</emphasis> attribute is
<emphasis>Local Number</emphasis>, the term is treated as
a native Zebra Record Identifier.
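The structure handling described above can be summarized as a lookup from the structure attribute value to a search type. The Python sketch below transcribes the case labels visible in the commented-out excerpt from util/zebramap.c earlier in this section. It is an illustration only, not the Zebra implementation; the names STRUCTURE_SEARCH_TYPE and search_type_for are hypothetical, and the fallback to phrase for values not listed is an assumption of this sketch.

```python
# Structure attribute value mapped to search type, as far as it is
# visible in the util/zebramap.c excerpt quoted in this section.
STRUCTURE_SEARCH_TYPE = {
    6: "and-list",    # word list
    105: "or-list",   # free-form-text
    106: "or-list",   # document-text
    107: "local",     # local-number
    108: "phrase",    # string
    109: "numeric",   # numeric string
}


def search_type_for(structure_value):
    """Return the search type for a structure attribute value.

    Values that are not in the table fall back to "phrase"; this
    fallback is an assumption made for the sketch.
    """
    return STRUCTURE_SEARCH_TYPE.get(structure_value, "phrase")


for value in (1, 6, 105, 107, 109):
    print(value, search_type_for(value))
```

A table keeps the mapping easy to compare with the prose rules: the list-like structures select list searches, Local Number selects the record identifier search, and Numeric String selects the numeric search.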
1037 If the <emphasis>Relation</emphasis> attribute is
1038 <emphasis>Equals</emphasis> (default), the term is matched
1039 in a normal fashion (modulo truncation and processing of
1040 individual words, if required).
1041 If <emphasis>Relation</emphasis> is <emphasis>Less Than</emphasis>,
1042 <emphasis>Less Than or Equal</emphasis>,
1043 <emphasis>Greater than</emphasis>, or <emphasis>Greater than or
1044 Equal</emphasis>, the term is assumed to be numerical, and a
1045 standard regular expression is constructed to match the given
1047 If <emphasis>Relation</emphasis> is <emphasis>Relevance</emphasis>,
1048 the standard natural-language query processor is invoked.
1052 For the <emphasis>Truncation</emphasis> attribute,
1053 <emphasis>No Truncation</emphasis> is the default.
1054 <emphasis>Left Truncation</emphasis> is not supported.
1055 <emphasis>Process # in search term</emphasis> is supported, as is
1056 <emphasis>Regxp-1</emphasis>.
1057 <emphasis>Regxp-2</emphasis> enables the fault-tolerant (fuzzy)
1058 search. As a default, a single error (deletion, insertion,
1059 replacement) is accepted when terms are matched against the register
1064 <sect2 id="querymodel-regular">
1065 <title>Zebra Regular Expressions in Truncation Attribute (type = 5)</title>
1068 Each term in a query is interpreted as a regular expression if
1069 the truncation value is either <emphasis>Regxp-1 (@attr 5=102)</emphasis>
1070 or <emphasis>Regxp-2 (@attr 5=103)</emphasis>.
1071 Both query types follow the same syntax with the operands:
1074 <table id="querymodel-regular-operands-table">
1075 <caption>Regular Expression Operands</caption>
<tr><td>Operand</td><td>Matches</td></tr>
1083 <td><emphasis>x</emphasis></td>
1084 <td>Matches the character <emphasis>x</emphasis>.</td>
1087 <td><emphasis>.</emphasis></td>
1088 <td>Matches any character.</td>
1091 <td><emphasis>[ .. ]</emphasis></td>
1092 <td>Matches the set of characters specified;
1093 such as <literal>[abc]</literal> or <literal>[a-c]</literal>.</td>
1099 The above operands can be combined with the following operators:
1103 <table id="querymodel-regular-operators-table">
1104 <caption>Regular Expression Operators</caption>
<tr><td>Operator</td><td>Meaning</td></tr>
1112 <td><emphasis>x*</emphasis></td>
1113 <td>Matches <emphasis>x</emphasis> zero or more times.
1114 Priority: high.</td>
1117 <td><emphasis>x+</emphasis></td>
1118 <td>Matches <emphasis>x</emphasis> one or more times.
1119 Priority: high.</td>
1122 <td><emphasis>x?</emphasis></td>
<td>Matches <emphasis>x</emphasis> zero or one time.
1124 Priority: high.</td>
1127 <td><emphasis>xy</emphasis></td>
1128 <td> Matches <emphasis>x</emphasis>, then <emphasis>y</emphasis>.
1129 Priority: medium.</td>
1132 <td><emphasis>x|y</emphasis></td>
1133 <td> Matches either <emphasis>x</emphasis> or <emphasis>y</emphasis>.
1137 <td><emphasis>( )</emphasis></td>
1138 <td>The order of evaluation may be changed by using parentheses.</td>
1144 If the first character of the <emphasis>Regxp-2</emphasis> query
1145 is a plus character (<literal>+</literal>) it marks the
1146 beginning of a section with non-standard specifiers.
1147 The next plus character marks the end of the section.
Currently Zebra only supports one specifier, the error tolerance,
which consists of one digit.
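The specifier section can be separated from the rest of the term with a few lines of code. The Python sketch below follows the description above. It is an illustration only and not Zebra's parser; the function name split_regxp2 is hypothetical, and the default tolerance of one error is taken from the description of Regxp-2 in the mapping section.

```python
def split_regxp2(term, default_tolerance=1):
    """Split a Regxp-2 term into (error_tolerance, regular_expression).

    A leading '+' opens the specifier section and the next '+' closes
    it. The only specifier handled is a single-digit error tolerance.
    """
    if not term.startswith("+"):
        return default_tolerance, term
    end = term.find("+", 1)
    if end == -1:
        raise ValueError("specifier section is not terminated by '+'")
    spec = term[1:end]
    if len(spec) != 1 or spec not in "0123456789":
        raise ValueError("expected a single digit, got: " + repr(spec))
    return int(spec), term[end + 1:]


print(split_regxp2("+2+informat.*"))
print(split_regxp2("retrieval"))
```

Only a plus sign in the first position is treated as the start of the specifier section, so a suffix plus later in the expression, as in a+b, is left untouched.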
1153 Since the plus operator is normally a suffix operator the addition to
1154 the query syntax doesn't violate the syntax for standard regular
1159 For example, a phrase search with regular expressions in
1160 the title-register is performed like this:
1162 Z> find @attr 1=4 @attr 5=102 "informat.* retrieval"
1167 Combinations with other attributes are possible. For example, a
1168 ranked search with a regular expression
(see <xref linkend="administration-ranking"/> for the gory details):
1171 Z> find @attr 1=4 @attr 5=102 @attr 2=102 "informat.* retrieval"
1179 The RecordType parameter in the <literal>zebra.cfg</literal> file, or
1180 the <literal>-t</literal> option to the indexer tells Zebra how to
1181 process input records.
1182 Two basic types of processing are available - raw text and structured
1183 data. Raw text is just that, and it is selected by providing the
1184 argument <emphasis>text</emphasis> to Zebra. Structured records are
1185 all handled internally using the basic mechanisms described in the
1186 subsequent sections.
1187 Zebra can read structured records in many different formats.
1193 <sect1 id="querymodel-cql-to-pqf">
1194 <title>Server Side CQL to PQF Query Translation</title>
<literal>&lt;cql2rpn&gt;l2rpn.txt&lt;/cql2rpn&gt;</literal>
1198 YAZ Frontend Virtual
1199 Hosts option, one can configure
1200 the YAZ Frontend CQL-to-PQF
1201 converter, specifying the interpretation of various
1202 <ulink url="&url.cql;">CQL</ulink>
1203 indexes, relations, etc. in terms of Type-1 query attributes.
1204 <!-- The yaz-client config file -->
1207 For example, using server-side CQL-to-PQF conversion, one might
query a Zebra server like this:
1211 yaz-client localhost:9999
1213 Z> find text=(plant and soil)
1216 and - if properly configured - even static relevance ranking can
1217 be performed using CQL query syntax:
1220 Z> find text = /relevant (plant and soil)
1226 By the way, the same configuration can be used to
1227 search using client-side CQL-to-PQF conversion:
1228 (the only difference is <literal>querytype cql2rpn</literal>
1230 <literal>querytype cql</literal>, and the call specifying a local
1234 yaz-client -q local/cql2pqf.txt localhost:9999
1235 Z> querytype cql2rpn
1236 Z> find text=(plant and soil)
Exhaustive information can be found in the
section "Specification of CQL to RPN mappings" in the YAZ manual,
<ulink url="http://www.indexdata.dk/yaz/doc/tools.tkl#tools.cql.map">
http://www.indexdata.dk/yaz/doc/tools.tkl#tools.cql.map</ulink>,
and is therefore not repeated here.
1251 <ulink url="http://www.loc.gov/z3950/agency/zing/cql/dc-indexes.html">
1252 http://www.loc.gov/z3950/agency/zing/cql/dc-indexes.html</ulink>
1253 for the Maintenance Agency's work-in-progress mapping of Dublin Core
1254 indexes to Attribute Architecture (util, XD and BIB-2)