1 <chapter id="querymodel">
2 <!-- $Id: querymodel.xml,v 1.3 2006-06-14 12:20:06 marc Exp $ -->
3 <title>Query Model</title>
5 <sect1 id="querymodel-overview">
6 <title>Query Model Overview</title>
<para>
 Zebra was designed as a networked information retrieval engine adhering
 to the international standards
 <ulink url="&url.z39.50;">Z39.50</ulink> and
 <ulink url="&url.sru;">SRU</ulink>,
 and it implements the query model defined there.
 Unfortunately, the Z39.50 query model defines only a binary
 encoded representation, which is used as transport packaging in
 the Z39.50 protocol layer. This representation is not human
 readable, nor does it offer any convenient way to specify queries.
</para>
<para>
 Therefore, Index Data has defined a textual representation, the
 <literal>Prefix Query Format</literal>, or
 <literal>PQF</literal> for short, which has since been adopted by other
 parties developing Z39.50 software. It is also often referred to as
 <literal>Prefix Query Notation</literal>, or
 <literal>PQN</literal> for short, and is thoroughly explained in
 <xref linkend="querymodel-pqf"/>.
</para>
<para>
 In addition, Zebra can be configured to understand the
 <literal>Common Query Language</literal>
 (<ulink url="&url.cql;">CQL</ulink>)
 and map it to PQF. See
 <xref linkend="querymodel-cql-to-pqf"/>
 for an introduction to this mapping.
</para>
39 <sect1 id="querymodel-pqf">
40 <title>Prefix Query Format structure and syntax</title>
<para>
 The <ulink url="&url.yaz.pqf;">PQF grammar</ulink>
 is documented in the YAZ manual, and shall not be
 repeated here. During search, this textual PQF representation
 is always mapped to its Zebra-internal equivalent.
</para>
49 <sect2 id="querymodel-pqf-tree">
50 <title>PQF tree structure</title>
<para>
 The PQF parse tree, or the equivalent textual representation,
 may start with one specification of the
 <emphasis>attribute set</emphasis> used. It is followed by a query
 tree, which consists of <emphasis>atomic query parts</emphasis>, possibly
 paired by <emphasis>boolean binary operators</emphasis>, and
 finally <emphasis>recursively combined</emphasis> into
 larger query trees.
</para>
62 <sect3 id="querymodel-attribute-sets">
63 <title>Attribute sets</title>
<para>
 Attribute sets define the exact meaning and semantics of the queries
 issued. Zebra comes with some predefined attribute set
 definitions; others can easily be defined and added to the
 configuration.
</para>
<para>
 The Zebra internal query processing is modeled after
 the <literal>Bib1</literal> attribute set, and the non-use
 attributes of types 2-9 are hard-wired in. It is therefore essential
 to be familiar with <xref linkend="querymodel-bib1"/>.
</para>
<table id="querymodel-attribute-sets-table">
 <caption>Attribute sets predefined in Zebra</caption>
 <thead>
  <tr><td>Name</td><td>Attribute set</td><td>Description</td></tr>
 </thead>
 <tbody>
  <tr>
   <td><emphasis>exp-1</emphasis></td>
   <td><literal>Explain</literal> attribute set</td>
   <td>Special attribute set used on the special automagic
    <literal>IR-Explain-1</literal> database to gain information on
    server capabilities and database names.</td>
  </tr>
  <tr>
   <td><emphasis>bib-1</emphasis></td>
   <td><literal>Bib1</literal> attribute set</td>
   <td>Standard PQF query language attribute set which defines the
    semantics of Z39.50 searching. In addition, all of the
    non-use attributes (type 2-9) define the Zebra internal query
    processing.</td>
  </tr>
  <tr>
   <td><emphasis>gils</emphasis></td>
   <td><literal>GILS</literal> attribute set</td>
   <td>Extension to the <literal>Bib1</literal> attribute set.</td>
  </tr>
 </tbody>
</table>
110 <sect3 id="querymodel-boolean-operators">
111 <title>Boolean operators</title>
113 A pair of subquery trees, or of atomic queries, is combined
114 using the standard boolean operators into new query trees.
<table id="querymodel-boolean-operators-table">
 <caption>Boolean operators</caption>
 <thead>
  <tr><td>PQF keyword</td><td>Operator</td><td>Description</td></tr>
 </thead>
 <tbody>
  <tr><td><emphasis>@and</emphasis></td>
   <td>binary <literal>AND</literal> operator</td>
   <td>Set intersection of the two operand hit sets</td>
  </tr>
  <tr><td><emphasis>@or</emphasis></td>
   <td>binary <literal>OR</literal> operator</td>
   <td>Set union of the two operand hit sets</td>
  </tr>
  <tr><td><emphasis>@not</emphasis></td>
   <td>binary <literal>AND NOT</literal> operator</td>
   <td>Set difference of the two operand hit sets: documents in
    the first hit set which are not in the second</td>
  </tr>
  <tr><td><emphasis>@prox</emphasis></td>
   <td>binary <literal>PROXIMITY</literal> operator</td>
   <td>Set intersection of the two operand hit sets. In
    addition, the intersection set is purged of all
    documents which do not satisfy the requested query
    term proximity. The result is usually a proper subset of the
    corresponding <literal>@and</literal> hit set.</td>
  </tr>
 </tbody>
</table>
<para>
 For example, we can combine the terms
 <emphasis>information</emphasis> and <emphasis>retrieval</emphasis>
 into different searches in the default index of the default
 attribute set as follows.
</para>
<para>
 Querying for the union of all documents containing the
 terms <emphasis>information</emphasis> OR
 <emphasis>retrieval</emphasis>:
 <screen>
  Z> find @or information retrieval
 </screen>
</para>
<para>
 Querying for the intersection of all documents containing the
 terms <emphasis>information</emphasis> AND
 <emphasis>retrieval</emphasis>.
 The hit set is a subset of the corresponding
 <literal>@or</literal> query:
 <screen>
  Z> find @and information retrieval
 </screen>
</para>
<para>
 Querying for the intersection of all documents containing the
 terms <emphasis>information</emphasis> AND
 <emphasis>retrieval</emphasis>, taking proximity into account.
 The <literal>@prox</literal> operator takes six parameters, which are
 described in the PQF grammar; here they request terms in any order
 that are at most three words apart.
 The hit set is a subset of the corresponding
 <literal>@and</literal> query:
 <screen>
  Z> find @prox 0 3 0 2 k 2 information retrieval
 </screen>
</para>
<para>
 Querying for the intersection of all documents containing the
 terms <emphasis>information</emphasis> AND
 <emphasis>retrieval</emphasis>, in the same order and next to each
 other, as given in the term list.
 The hit set is a subset of the corresponding
 <literal>@prox</literal> query:
 <screen>
  Z> find "information retrieval"
 </screen>
</para>
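<para>
 The remaining operator, <literal>@not</literal>, and the recursive
 combination of subquery trees can be sketched in the same way.
 The queries below are illustrative only: they assume the same default
 index and attribute set as above, and the additional search terms are
 made up for the example. The first query finds documents containing
 <emphasis>information</emphasis> but not
 <emphasis>retrieval</emphasis>; the second combines two
 <literal>@and</literal> subtrees under a single
 <literal>@or</literal> node:
 <screen>
  Z> find @not information retrieval
  Z> find @or @and information retrieval @and data mining
 </screen>
 Because PQF is a prefix notation, each operator is followed by exactly
 two operands, each of which is either an atomic query or another
 operator with its own two operands.
</para>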
194 <sect3 id="querymodel-atomic-queries">
195 <title>Atomic queries</title>
<para>
 Atomic queries are the query parts which work on one access point
 only. These consist of <literal>an attribute list</literal>
 followed by a <literal>single term</literal> or a
 <literal>quoted term list</literal>.
</para>
203 Unsupplied non-use attributes type 2-9 are either inherited from
204 higher nodes in the query tree, or are set to Zebra's default values.
205 See <xref linkend="querymodel-bib1"/> for details.
<table id="querymodel-atomic-queries-table">
 <caption>Atomic queries</caption>
 <thead>
  <tr><td>Part</td><td>Form</td><td>Description</td></tr>
 </thead>
 <tbody>
  <tr><td><emphasis>attribute list</emphasis></td>
   <td>List of <literal>orthogonal</literal> attributes</td>
   <td>Any of the orthogonal attribute types may be omitted.
    Omitted types are inherited from higher query tree nodes or, if not
    inherited, set to the default Zebra configuration values.</td>
  </tr>
  <tr><td><emphasis>term</emphasis></td>
   <td>single <literal>term</literal>
    or <literal>quoted term list</literal></td>
   <td>The search term or list of search terms is given here.</td>
  </tr>
 </tbody>
</table>
<para>
 Querying for the term <emphasis>information</emphasis> in the
 default index using the default attribute set, the server choice
 of access point/index, and the default non-use attributes:
 <screen>
  Z> find "information"
 </screen>
</para>
<para>
 Equivalent query fully specified:
 <screen>
  Z> find @attrset bib-1 @attr 1=1017 @attr 2=3 @attr 3=3 @attr 4=1 @attr 5=100 @attr 6=1 "information"
 </screen>
</para>
<para>
 Finding all documents which have empty titles. Notice that the
 empty term must be quoted, but is otherwise legal.
</para>
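<para>
 A minimal sketch of such an empty-term query, assuming that titles are
 searched through the Bib-1 use attribute 4 (Title), as in the other
 title searches in this chapter:
 <screen>
  Z> find @attr 1=4 ""
 </screen>
</para>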
256 <sect3 id="querymodel-use-string">
257 <title>Zebra's special use attribute type 1 of form 'string'</title>
<para>
 The numeric <literal>use (type 1)</literal> attribute usually
 refers to an entry in a given
 attribute set. In addition, Zebra lets you use
 <emphasis>any internal index
 name defined in your configuration</emphasis>
 as use attribute value. This is a great feature for
 debugging, and for situations where you do
 not need the complexity of defined use attribute values. It is
 the preferred way of accessing Zebra indexes directly.
</para>
<para>
 Finding all documents which have the term list "information
 retrieval" in a Zebra index, using its internal full string name:
 <screen>
  Z> find @attr 1=sometext "information retrieval"
 </screen>
</para>
<para>
 Searching the bib-1 use attribute 54 using its string name:
 <screen>
  Z> find @attr 1=Code-language eng
 </screen>
</para>
<para>
 Searching in any silly string index, provided it is defined in your
 indexation rules and can be parsed by the PQF parser.
 This is definitely not the recommended use of
 this facility, as it is likely to confuse your users:
 <screen>
  Z> find @attr 1=silly/xpath/alike[@index]/name "information retrieval"
 </screen>
</para>
<para>
 See <xref linkend="querymodel-bib1-mapping"/> for details, and
 <xref linkend="server-sru"/>
 for the SRU PQF query extension using string names.
</para>
300 <sect3 id="querymodel-use-xpath">
301 <title>Zebra's special use attribute type 1 of form 'XPath'
302 for GRS filters</title>
<para>
 As we have seen above, it is possible (albeit seldom a great
 idea) to emulate
 <ulink url="http://www.w3.org/TR/xpath">XPath 1.0</ulink> based
 search by defining <literal>use (type 1)</literal>
 <emphasis>string</emphasis> attributes which in appearance
 <emphasis>resemble XPath queries</emphasis>. There are two
 problems with this approach. First, the XPath look-alike has to
 be defined at indexation time; no new, undefined
 XPath queries can be entered at search time. Second, it may
 confuse users that an XPath-like index name may in fact
 be populated from an entirely different XML element
 than the one it pretends to access.
</para>
<para>
 When using the <literal>GRS Record Model</literal>
 (see <xref linkend="record-model-grs"/>), we have the
 possibility to embed <emphasis>live</emphasis>
 XPath expressions
 in the PQF queries, which are here called
 <literal>use (type 1)</literal> <emphasis>xpath</emphasis>
 attributes. You must enable the
 <literal>xpath enable</literal> directive in your
 <literal>.abs</literal> config files.
</para>
<para>
 Only a <emphasis>very</emphasis> restricted subset of the
 <ulink url="http://www.w3.org/TR/xpath">XPath 1.0</ulink>
 standard is supported, as the GRS record model is simpler than
 a full XML DOM structure. The following examples show what is
 possible.
</para>
<para>
 Finding all documents which have the term "content"
 inside a text node found in a specific XML DOM
 <emphasis>subtree</emphasis>, whose starting element is
 addressed by the XPath:
 <screen>
  Z> find @attr 1=/root content
  Z> find @attr 1=/root/first content
 </screen>
 <emphasis>Notice that the
 XPath must be absolute, i.e., must start with '/', and that the
 XPath <literal>descendant-or-self</literal> axis followed by a
 text node selection <literal>text()</literal> is implicitly
 appended to the stated XPath.</emphasis>
 It follows that the above searches are interpreted as:
 <screen>
  Z> find @attr 1=/root//text() content
  Z> find @attr 1=/root/first//text() content
 </screen>
</para>
<para>
 The addressing XPath can be filtered by a predicate on exact
 attribute values (attributes in the XML sense). The following query
 returns all documents which
 have the term "english" in one of the text subnodes of
 the subtree defined by the XPath
 <literal>/record/title[@lang='en']</literal>:
 <screen>
  Z> find @attr 1=/record/title[@lang='en'] english
 </screen>
</para>
370 Combining numeric indexes, boolean expressions,
371 and xpath based searches is possible:
373 Z> find @attr 1=/record/title @and foo bar
374 Z> find @and @attr 1=/record/title foo @attr 1=4 bar
<para>
 Escaping PQF keywords and other XPath constructs that the PQF parser
 cannot handle with <literal>'{ }'</literal> to prevent syntax errors:
 <screen>
  Z> find @attr {1=/root/first[@attr='danish']} content
  Z> find @attr {1=/root/second[@attr='danish lake']} content
  Z> find @attr {1=/root/third[@attr='dansk sø']} content
 </screen>
</para>
<para>
 It is worth mentioning that these dynamically performed XPath
 queries are a performance bottleneck, as no optimized
 specialized indexes can be used. Therefore, avoid the use of
 this facility when speed is essential and the database content
 size is medium to large.
</para>
397 <sect2 id="querymodel-exp1">
398 <title>Explain Attribute Set</title>
<para>
 The Z39.50 standard defines the
 <ulink url="&url.z39.50.explain;">Explain</ulink> attribute set
 <literal>exp-1</literal>, which is used to discover information
 about a server's search semantics and functional capabilities.
 Zebra exposes a "classic"
 Explain database under the base name <literal>IR-Explain-1</literal>, which
 is populated with system internal information.
</para>
409 The attribute-set <literal>exp-1</literal> consists of a single
410 <literal>Use (type 1)</literal> attribute.
413 In addition, the non-Use
414 <literal>bib-1</literal> attributes, that is, the types
415 <literal>Relation</literal>, <literal>Position</literal>,
416 <literal>Structure</literal>, <literal>Truncation</literal>,
417 and <literal>Completeness</literal> are imported from
418 the <literal>bib-1</literal> attribute set, and may be used
419 within any explain query.
422 <sect3 id="querymodel-exp1-use">
423 <title>Use Attributes (type = 1)</title>
<para>
 The following Explain search attributes are supported:
 <literal>ExplainCategory</literal> (@attr 1=1),
 <literal>DatabaseName</literal> (@attr 1=3),
 <literal>DateAdded</literal> (@attr 1=9),
 <literal>DateChanged</literal> (@attr 1=10).
</para>
432 A search in the use attribute <literal>ExplainCategory</literal>
433 supports only these predefined values:
434 <literal>CategoryList</literal>, <literal>TargetInfo</literal>,
435 <literal>DatabaseInfo</literal>, <literal>AttributeDetails</literal>.
438 See <filename>tab/explain.att</filename> and the
439 <ulink url="&url.z39.50;">Z39.50</ulink> standard
440 for more information.
445 <title>Explain searches with yaz-client</title>
<para>
 Classic Explain only defines retrieval of Explain information
 via ASN.1. Practically no Z39.50 clients support this. Fortunately
 they don't have to: Zebra allows retrieval of this information
 in several formats:
 <literal>SUTRS</literal>, <literal>XML</literal>,
 <literal>GRS-1</literal> and <literal>ASN.1</literal> Explain.
</para>
<para>
 List supported categories to find out which explain commands are
 supported:
 <screen>
  Z> find @attr exp1 1=1 categorylist
 </screen>
</para>
<para>
 Get target info, that is, investigate which databases exist at
 this server endpoint:
 <screen>
  Z> find @attr exp1 1=1 targetinfo
 </screen>
</para>
<para>
 List all supported databases. The number of hits
 is the number of databases found, most commonly
 the <literal>Default</literal> and the
 <literal>IR-Explain-1</literal> databases:
 <screen>
  Z> find @attr exp1 1=1 databaseinfo
 </screen>
</para>
<para>
 Get database info record for database <literal>Default</literal>:
 <screen>
  Z> find @and @attr exp1 1=1 databaseinfo @attr exp1 1=3 Default
 </screen>
 Identical query with explicitly specified attribute set:
 <screen>
  Z> find @attrset exp1 @and @attr 1=1 databaseinfo @attr 1=3 Default
 </screen>
</para>
<para>
 Get attribute details record for database
 <literal>Default</literal>.
 This query is very useful to study the internal Zebra indexes.
 If records have been indexed using the <literal>alvis</literal>
 XSLT filter, the record also shows the string names of the known
 indexes:
 <screen>
  Z> find @and @attr exp1 1=1 attributedetails @attr exp1 1=3 Default
 </screen>
 Identical query with explicitly specified attribute set:
 <screen>
  Z> find @attrset exp1 @and @attr 1=1 attributedetails @attr 1=3 Default
 </screen>
</para>
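<para>
 The Explain queries in this section are meant to be run against the
 <literal>IR-Explain-1</literal> database. A complete
 <literal>yaz-client</literal> session might look like the following
 sketch. It assumes a Zebra server listening on
 <literal>localhost:9999</literal> (the address used in the CQL
 examples of this chapter), and it uses the yaz-client commands
 <literal>base</literal>, <literal>format</literal> and
 <literal>show</literal> to select the database, request
 <literal>SUTRS</literal> records and display the first hit:
 <screen>
  yaz-client localhost:9999
  Z> base IR-Explain-1
  Z> format sutrs
  Z> find @attr exp1 1=1 categorylist
  Z> show 1
 </screen>
</para>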
529 <sect2 id="querymodel-bib1">
530 <title>Bib1 Attribute Set</title>
532 Something about querying to be written ..
<para>
 Most of the information contained in this section is an excerpt of
 the <literal>ATTRIBUTE SET BIB-1 (Z39.50-1995)</literal> semantics
 found at <ulink url="&url.z39.50.attset.bib1.1995;">The BIB-1
 Attribute Set Semantics</ulink> from 1995, also available in an updated
 <ulink url="&url.z39.50.attset.bib1;">Bib-1
 Attribute Set</ulink>
 version from 2003. Index Data is not the copyright holder of this
 information.
</para>
547 <sect3 id="querymodel-bib1-use">
548 <title>Use Attributes (type = 1)</title>
<para>
 Phrase search for <emphasis>information retrieval</emphasis> in
 the title register:
 <screen>
  Z> find @attr 1=4 "information retrieval"
 </screen>
</para>
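<para>
 Other access points are searched in the same way by changing the use
 attribute value. The sketch below assumes the standard Bib-1 values
 1003 (Author) and 21 (Subject heading), and that your configuration
 actually indexes these access points; the search terms are made up
 for illustration:
 <screen>
  Z> find @attr 1=1003 knuth
  Z> find @attr 1=21 "information retrieval"
 </screen>
</para>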
560 <sect3 id="querymodel-bib1-relation">
561 <title>Relation Attributes (type = 2)</title>
<para>
 Ranked search for <emphasis>information retrieval</emphasis> in
 the title register
 (see <xref linkend="administration-ranking"/> for the gory details):
 <screen>
  Z> find @attr 1=4 @attr 2=102 "information retrieval"
 </screen>
</para>
575 <sect3 id="querymodel-bib1-position">
576 <title>Position Attributes (type = 3)</title>
579 <sect3 id="querymodel-bib1-structure">
580 <title>Structure Attributes (type = 4)</title>
<para>
 In the GILS schema (<literal>gils.abs</literal>), the
 west-bounding-coordinate is indexed as type <literal>n</literal>,
 and is therefore searched by specifying
 <emphasis>structure</emphasis>=<emphasis>Numeric String</emphasis>.
 To match all those records with west-bounding-coordinate greater
 than -114 we use the following query:
 <screen>
  Z> find @attr 4=109 @attr 2=5 @attr gils 1=2038 -114
 </screen>
</para>
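<para>
 Two such relational atomic queries can be combined with
 <literal>@and</literal> to express a range. The sketch below reuses
 the GILS use attribute 2038 from the query above and assumes the
 standard Bib-1 relation values 4 (greater than or equal) and 2 (less
 than or equal); the bounds are made up for illustration:
 <screen>
  Z> find @and @attr 4=109 @attr 2=4 @attr gils 1=2038 -120 @attr 4=109 @attr 2=2 @attr gils 1=2038 -114
 </screen>
</para>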
597 <sect3 id="querymodel-bib1-truncation">
598 <title>Truncation Attributes (type = 5)</title>
601 <sect3 id="querymodel-bib1-completeness">
602 <title>Completeness Attributes (type = 6)</title>
607 <sect2 id="querymodel-zebra-attr-search">
<title>Zebra specific Search Extensions to all Attribute Sets</title>
<para>
 Zebra extends the Bib1 attribute types, and these extensions are
 recognized regardless of the attribute
 set used in a <literal>search</literal> operation query.
</para>
<table id="querymodel-zebra-attr-search-table">
<caption>Zebra Search Attribute Extensions</caption>
619 <td><emphasis>Name and Type</emphasis></td>
621 <td>Zebra version</td>
626 <td><emphasis>Embedded Sort (type 7)</emphasis></td>
631 <td><emphasis>Term Set (type 8)</emphasis></td>
636 <td><emphasis>Rank weight (type 9)</emphasis></td>
641 <td><emphasis>Approx Limit (type 9)</emphasis></td>
646 <td><emphasis>Term Reference (type 10)</emphasis></td>
653 <sect3 id="querymodel-zebra-attr-sorting">
<title>Zebra Extension Embedded Sort Attribute (type 7)</title>
<para>
 The embedded sort is a way to specify sort within a query, thus
 removing the need to send a Sort Request separately. It is both
 faster and does not require clients to deal with the Sort
 Facility.
</para>
<para>
 The possible values after attribute <literal>type 7</literal> are
 <literal>1</literal> ascending and
 <literal>2</literal> descending.
 The attributes+term (APT) node is separate from the
 rest and must be <literal>@or</literal>'ed.
 The term associated with the APT is the sorting level as an integer,
 where <literal>0</literal> means primary sort,
 <literal>1</literal> means secondary sort, and so forth.
 See also <xref linkend="administration-ranking"/>.
</para>
<para>
 For example, searching for water, sort by title (ascending):
 <screen>
  Z> find @or @attr 1=1016 water @attr 7=1 @attr 1=4 0
 </screen>
</para>
<para>
 Or, searching for water, sort by title ascending, then date descending:
 <screen>
  Z> find @or @or @attr 1=1016 water @attr 7=1 @attr 1=4 0 @attr 7=2 @attr 1=30 1
 </screen>
</para>
686 <sect3 id="querymodel-zebra-attr-estimation">
<title>Zebra Extension Term Set Attribute (type 8)</title>
<para>
 The Term Set feature is a facility that allows a search to store
 hitting terms in a "pseudo" result set; it thus combines a search (as
 usual) with a scan-like facility. It requires a client that can handle
 named result sets, since the search generates two result sets. The value for
 attribute 8 is the name of a result set (string). The terms in
 the named term set are returned as SUTRS records.
</para>
<para>
 For example, searching for u in title, right truncated, and
 storing the result in a term set named 'aset':
 <screen>
  Z> find @attr 5=1 @attr 1=4 @attr 8=aset u
 </screen>
</para>
<para>
 The model has one serious flaw: we don't know the size of the term
 set. Experimental. Do not use in production code.
</para>
709 <sect3 id="querymodel-zebra-attr-weight">
<title>Zebra Extension Rank Weight Attribute (type 9)</title>
<para>
 Rank weight is a way to pass a value to a ranking algorithm, so
 that one APT has one value while another has a different one.
 See also <xref linkend="administration-ranking"/>.
</para>
<para>
 For example, searching for utah in title with weight 30 as well
 as in any index with weight 20:
 <screen>
  Z> find @attr 2=102 @or @attr 9=30 @attr 1=4 utah @attr 9=20 utah
 </screen>
</para>
725 <sect3 id="querymodel-zebra-attr-limit">
<title>Zebra Extension Approximative Limit Attribute (type 9)</title>
<para>
 Newer Zebra versions normally estimate the hit count for every APT
 (leaf) in the query tree. These hit counts are returned as part of
 the searchResult-1 facility in the binary encoded Z39.50 search
 response.
</para>
<para>
 By setting a limit for the APT we can make Zebra switch to
 approximate hit counts when a certain hit count limit is
 reached. A value of zero means exact hit count.
</para>
<para>
 For example, we might be interested in the exact hit count for a, but
 for b we allow hit count estimates for 1000 and higher:
 <screen>
  Z> find @and a @attr 9=1000 b
 </screen>
</para>
<para>
 The estimated hit count facility makes searches faster, as one
 only needs to process large hit lists partially.
</para>
<para>
 This facility clashes with rank weight, because there all
 documents in the hit lists need to be examined for scoring and
 sorting.
 It is an experimental
 extension. Do not use in production code.
</para>
758 <sect3 id="querymodel-zebra-attr-termref">
<title>Zebra Extension Term Reference Attribute (type 10)</title>
<para>
 Zebra supports the searchResult-1 facility. If attribute 10 is
 given, it specifies a subqueryId value returned as part of the
 search result. It is a way for a client to name an APT part of a
 query.
</para>
<para>
 Experimental. Do not use in production code.
</para>
781 <sect2 id="querymodel-zebra-attr-scan">
<title>Zebra specific Scan Extensions to all Attribute Sets</title>
<para>
 Zebra extends the Bib1 attribute types, and these extensions are
 recognized regardless of the attribute
 set used in a <literal>scan</literal> operation query.
</para>
<table id="querymodel-zebra-attr-scan-table">
<caption>Zebra Scan Attribute Extensions</caption>
792 <td><emphasis>Name and Type</emphasis></td>
794 <td>Zebra version</td>
799 <td><emphasis>Result Set Narrow (type 8)</emphasis></td>
804 <td><emphasis>Approximative Limit (type 9)</emphasis></td>
811 <sect3 id="querymodel-zebra-attr-xyz">
<title>Zebra Extension Result Set Narrow (type 8)</title>
<para>
 If attribute 8 is given for scan, the value is the name of a
 result set. Each hit count in scan is @and'ed with the result set
 given.
</para>
<para>
 Experimental and buggy. Definitely not to be used in production code.
</para>
829 <sect3 id="querymodel-zebra-attr-xyz">
<title>Zebra Extension Approximative Limit (type 9)</title>
<para>
 The approximative limit (as for search) is a way to enable approximate
 hit counts for scan hit counts.
</para>
<para>
 Experimental. Do not use in production code.
</para>
850 <sect2 id="querymodel-bib1-mapping">
851 <title>Mapping from Bib1 Attributes to Zebra internal
852 register indexes</title>
<!-- see in util/zebramap.c

 if (completeness_value == 2 || completeness_value == 3)
 *sort_flag =(sort_relation_value > 0) ? 1 : 0;
 *search_type = "phrase";
 strcpy(rank_type, "void");
 if (relation_value == 102)
 if (weight_value == -1)
 sprintf(rank_type, "rank,w=%d,u=%d", weight_value, use_value);
 if (relation_value == 103)
 *search_type = "always";
 switch (structure_value)
 case 6: /* word list */
 *search_type = "and-list";
 case 105: /* free-form-text */
 *search_type = "or-list";
 case 106: /* document-text */
 *search_type = "or-list";
 case 108: /* string */
 *search_type = "phrase";
 case 107: /* local-number */
 *search_type = "local";
 case 109: /* numeric string */
 *search_type = "numeric";
 *search_type = "phrase";
 *search_type = "phrase";
 *search_type = "phrase";
 *search_type = "phrase";
-->
936 <emphasis>Use</emphasis> attributes are interpreted according to the
937 attribute sets which have been loaded in the
938 <literal>zebra.cfg</literal> file, and are matched against specific
939 fields as specified in the <literal>.abs</literal> file which
940 describes the profile of the records which have been loaded.
941 If no Use attribute is provided, a default of Bib-1 Any is assumed.
945 If a <emphasis>Structure</emphasis> attribute of
946 <emphasis>Phrase</emphasis> is used in conjunction with a
947 <emphasis>Completeness</emphasis> attribute of
948 <emphasis>Complete (Sub)field</emphasis>, the term is matched
949 against the contents of the phrase (long word) register, if one
950 exists for the given <emphasis>Use</emphasis> attribute.
 A phrase register is created for those fields in the
 <literal>.abs</literal> file that contain a
 <literal>p</literal>-specifier.
954 <!-- ### whatever the hell _that_ is -->
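<para>
 As a sketch of the case just described, the query below combines
 <emphasis>Structure</emphasis>=<emphasis>Phrase</emphasis>
 (<literal>@attr 4=1</literal>) with
 <emphasis>Completeness</emphasis>=<emphasis>Complete field</emphasis>
 (<literal>@attr 6=3</literal>), using the standard Bib-1 values for
 these attributes. It assumes that the title field (use attribute 4)
 has a phrase register, that is, that it is indexed with a
 <literal>p</literal>-specifier. The term is then expected to match
 only titles that consist of exactly this phrase:
 <screen>
  Z> find @attr 1=4 @attr 4=1 @attr 6=3 "information retrieval"
 </screen>
</para>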
<para>
 If <emphasis>Structure</emphasis>=<emphasis>Phrase</emphasis> is
 used in conjunction with <emphasis>Incomplete Field</emphasis> (the
 default value for <emphasis>Completeness</emphasis>), the
 search is directed against the normal word registers, but if the term
 contains multiple words, the term will only match if all of the words
 are found immediately adjacent, and in the given order.
 The word search is performed on those fields that are indexed as
 type <literal>w</literal> in the <literal>.abs</literal> file.
</para>
969 If the <emphasis>Structure</emphasis> attribute is
970 <emphasis>Word List</emphasis>,
971 <emphasis>Free-form Text</emphasis>, or
972 <emphasis>Document Text</emphasis>, the term is treated as a
973 natural-language, relevance-ranked query.
974 This search type uses the word register, i.e. those fields
975 that are indexed as type <literal>w</literal> in the
976 <literal>.abs</literal> file.
980 If the <emphasis>Structure</emphasis> attribute is
981 <emphasis>Numeric String</emphasis> the term is treated as an integer.
982 The search is performed on those fields that are indexed
983 as type <literal>n</literal> in the <literal>.abs</literal> file.
987 If the <emphasis>Structure</emphasis> attribute is
988 <emphasis>URx</emphasis> the term is treated as a URX (URL) entity.
989 The search is performed on those fields that are indexed as type
990 <literal>u</literal> in the <literal>.abs</literal> file.
<para>
 If the <emphasis>Structure</emphasis> attribute is
 <emphasis>Local Number</emphasis> the term is treated as a
 native Zebra Record Identifier.
</para>
<para>
 If the <emphasis>Relation</emphasis> attribute is
 <emphasis>Equals</emphasis> (default), the term is matched
 in a normal fashion (modulo truncation and processing of
 individual words, if required).
 If <emphasis>Relation</emphasis> is <emphasis>Less Than</emphasis>,
 <emphasis>Less Than or Equal</emphasis>,
 <emphasis>Greater than</emphasis>, or <emphasis>Greater than or
 Equal</emphasis>, the term is assumed to be numerical, and a
 standard regular expression is constructed to match the given
 expression.
 If <emphasis>Relation</emphasis> is <emphasis>Relevance</emphasis>,
 the standard natural-language query processor is invoked.
</para>
<para>
 For the <emphasis>Truncation</emphasis> attribute,
 <emphasis>No Truncation</emphasis> is the default.
 <emphasis>Left Truncation</emphasis> is not supported.
 <emphasis>Process # in search term</emphasis> is supported, as is
 <emphasis>Regxp-1</emphasis>.
 <emphasis>Regxp-2</emphasis> enables the fault-tolerant (fuzzy)
 search. By default, a single error (deletion, insertion,
 replacement) is accepted when terms are matched against the register.
</para>
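<para>
 The truncation attribute can be combined with any use attribute.
 As a minimal sketch, the query below applies right truncation
 (<literal>@attr 5=1</literal>, the value also used in the term set
 example of this chapter) to a search in the title index; the
 truncated term is made up for illustration:
 <screen>
  Z> find @attr 1=4 @attr 5=1 inform
 </screen>
</para>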
1027 <sect2 id="querymodel-regular">
1028 <title>Zebra Regular Expressions in Truncation Attribute (type = 5)</title>
1031 Each term in a query is interpreted as a regular expression if
1032 the truncation value is either <emphasis>Regxp-1 (@attr 5=102)</emphasis>
1033 or <emphasis>Regxp-2 (@attr 5=103)</emphasis>.
1034 Both query types follow the same syntax with the operands:
<table id="querymodel-regular-operands-table">
 <caption>Regular Expression Operands</caption>
 <thead>
  <tr><td>Operand</td><td>Description</td></tr>
 </thead>
 <tbody>
  <tr>
   <td><emphasis>x</emphasis></td>
   <td>Matches the character <emphasis>x</emphasis>.</td>
  </tr>
  <tr>
   <td><emphasis>.</emphasis></td>
   <td>Matches any character.</td>
  </tr>
  <tr>
   <td><emphasis>[ .. ]</emphasis></td>
   <td>Matches the set of characters specified;
    such as <literal>[abc]</literal> or <literal>[a-c]</literal>.</td>
  </tr>
 </tbody>
</table>
1062 The above operands can be combined with the following operators:
<table id="querymodel-regular-operators-table">
 <caption>Regular Expression Operators</caption>
 <thead>
  <tr><td>Operator</td><td>Description</td></tr>
 </thead>
 <tbody>
  <tr>
   <td><emphasis>x*</emphasis></td>
   <td>Matches <emphasis>x</emphasis> zero or more times.
    Priority: high.</td>
  </tr>
  <tr>
   <td><emphasis>x+</emphasis></td>
   <td>Matches <emphasis>x</emphasis> one or more times.
    Priority: high.</td>
  </tr>
  <tr>
   <td><emphasis>x?</emphasis></td>
   <td>Matches <emphasis>x</emphasis> zero times or once.
    Priority: high.</td>
  </tr>
  <tr>
   <td><emphasis>xy</emphasis></td>
   <td>Matches <emphasis>x</emphasis>, then <emphasis>y</emphasis>.
    Priority: medium.</td>
  </tr>
  <tr>
   <td><emphasis>x|y</emphasis></td>
   <td>Matches either <emphasis>x</emphasis> or <emphasis>y</emphasis>.
    Priority: low.</td>
  </tr>
  <tr>
   <td><emphasis>( )</emphasis></td>
   <td>The order of evaluation may be changed by using parentheses.</td>
  </tr>
 </tbody>
</table>
<para>
 If the first character of the <emphasis>Regxp-2</emphasis> query
 is a plus character (<literal>+</literal>) it marks the
 beginning of a section with non-standard specifiers.
 The next plus character marks the end of the section.
 Currently Zebra only supports one specifier, the error tolerance,
 which consists of one digit.
</para>
<para>
 Since the plus operator is normally a suffix operator, this addition to
 the query syntax doesn't violate the syntax for standard regular
 expressions.
</para>
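<para>
 Following the description above, a <emphasis>Regxp-2</emphasis> query
 that sets the error tolerance to two might be written as in the
 sketch below. The misspelled term is made up for illustration, and the
 query is derived from the syntax description only; it has not been
 verified against a running server:
 <screen>
  Z> find @attr 1=4 @attr 5=103 "+2+informaton"
 </screen>
</para>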
<para>
 For example, a phrase search with regular expressions in
 the title register is performed like this:
 <screen>
  Z> find @attr 1=4 @attr 5=102 "informat.* retrieval"
 </screen>
</para>
<para>
 Combinations with other attributes are possible. For example, a
 ranked search with a regular expression
 (see <xref linkend="administration-ranking"/> for the gory details):
 <screen>
  Z> find @attr 1=4 @attr 5=102 @attr 2=102 "informat.* retrieval"
 </screen>
</para>
1142 The RecordType parameter in the <literal>zebra.cfg</literal> file, or
1143 the <literal>-t</literal> option to the indexer tells Zebra how to
1144 process input records.
1145 Two basic types of processing are available - raw text and structured
1146 data. Raw text is just that, and it is selected by providing the
1147 argument <emphasis>text</emphasis> to Zebra. Structured records are
1148 all handled internally using the basic mechanisms described in the
1149 subsequent sections.
1150 Zebra can read structured records in many different formats.
1156 <sect1 id="querymodel-cql-to-pqf">
1157 <title>Server Side CQL to PQF Query Translation</title>
<para>
 Using the
 <literal>&lt;cql2rpn&gt;l2rpn.txt&lt;/cql2rpn&gt;</literal>
 YAZ Frontend Virtual
 Hosts option, one can configure
 the YAZ Frontend CQL-to-PQF
 converter, specifying the interpretation of various
 <ulink url="&url.cql;">CQL</ulink>
 indexes, relations, etc. in terms of Type-1 query attributes.
 <!-- The yaz-client config file -->
</para>
<para>
 For example, using server-side CQL-to-PQF conversion, one might
 query a Zebra server like this:
 <screen>
  yaz-client localhost:9999
  Z> querytype cql
  Z> find text=(plant and soil)
 </screen>
 If properly configured, even static relevance ranking can
 be performed using CQL query syntax:
 <screen>
  Z> find text = /relevant (plant and soil)
 </screen>
</para>
<para>
 The same configuration can be used to
 search using client-side CQL-to-PQF conversion.
 The only differences are that <literal>querytype cql2rpn</literal>
 is used instead of
 <literal>querytype cql</literal>, and that yaz-client is called with
 a local conversion file:
 <screen>
  yaz-client -q local/cql2pqf.txt localhost:9999
  Z> querytype cql2rpn
  Z> find text=(plant and soil)
 </screen>
</para>
<para>
 Exhaustive information can be found in the
 section "Specification of CQL to RPN mappings" in the YAZ manual,
 <ulink url="http://www.indexdata.dk/yaz/doc/tools.tkl#tools.cql.map">
 http://www.indexdata.dk/yaz/doc/tools.tkl#tools.cql.map</ulink>,
 and shall therefore not be repeated here.
</para>
<para>
 See
 <ulink url="http://www.loc.gov/z3950/agency/zing/cql/dc-indexes.html">
 http://www.loc.gov/z3950/agency/zing/cql/dc-indexes.html</ulink>
 for the Maintenance Agency's work-in-progress mapping of Dublin Core
 indexes to Attribute Architecture (util, XD and BIB-2).
</para>
1227 <!-- Keep this comment at the end of the file
1232 sgml-minimize-attributes:nil
1233 sgml-always-quote-attributes:t
1236 sgml-parent-document: "zebra.xml"
1237 sgml-local-catalogs: nil
1238 sgml-namecase-general:t