From: Marc Cromme Date: Fri, 23 Jun 2006 11:12:07 +0000 (+0000) Subject: added section on mapping of string/xpath/internal indexes, added discussion of comple... X-Git-Tag: before.bug.529~8 X-Git-Url: http://sru.miketaylor.org.uk/cgi-bin?a=commitdiff_plain;h=ba642ea21a324bc06ce2b892b9940a5d61e7e02d;p=idzebra-moved-to-github.git added section on mapping of string/xpath/internal indexes, added discussion of completeness attributes --- diff --git a/doc/querymodel.xml b/doc/querymodel.xml index d359d18..2ac38ca 100644 --- a/doc/querymodel.xml +++ b/doc/querymodel.xml @@ -1,5 +1,5 @@ - + Query Model @@ -149,7 +149,7 @@ - Prefix Query Format structure and syntax + Prefix Query Format syntax and semantics The PQF grammar is documented in the YAZ manual, and shall not be @@ -236,11 +236,9 @@ - The use attributes (type 1) of the predefined attribute sets can - be reconfigured by tweaking the files - tab/*.att. - New attribute sets can be defined by adding similar files in the - configuration path of the server. + The use attribute (type 1) mappings of the + predefined attribute sets are found in the + attribute set configuration files tab/*.att. @@ -387,21 +385,21 @@ default index using the default attribute set, the server choice of access point/index, and the default non-use attributes. - Z> find "information" + Z> find information Equivalent query fully specified including all default values: - Z> find @attrset bib-1 @attr 1=1017 @attr 2=3 @attr 3=3 @attr 4=1 @attr 5=100 @attr 6=1 "information" + Z> find @attrset bib-1 @attr 1=1017 @attr 2=3 @attr 3=3 @attr 4=1 @attr 5=100 @attr 6=1 information - Finding all documents which have empty titles. Notice that the - empty term must be quoted, but is otherwise legal. + Finding all documents which have the term + debussy in the title field. 
- Z> find @attr 1=4 "" + Z> find @attr 1=4 debussy @@ -453,7 +451,7 @@ - Zebra's special use attribute type 1 of form 'string' + Zebra's special access point of type 'string' The numeric use (type 1) attribute is usually referred to from a given @@ -494,7 +492,7 @@ - See also for details, and + See also for details, and for the SRU PQF query extension using string names as a fast debugging facility. @@ -502,7 +500,7 @@ - Zebra's special use attribute type 1 of form 'XPath' + <title>Zebra's special access point of type 'XPath' for GRS filters As we have seen above, it is possible (albeit seldom a great @@ -612,8 +610,8 @@ Explain Attribute Set The Z39.50 standard defines the - Explainattribute set - exp-1, which is used to discover information + Explain attribute set + Exp-1, which is used to discover information about a server's search semantics and functional capabilities Zebra exposes a "classic" Explain database by base name IR-Explain-1, which @@ -621,7 +619,7 @@ The attribute-set exp-1 consists of a single - Use (type 1) attribute. + use attribute (type 1). In addition, the non-Use @@ -867,33 +865,49 @@ AlwaysMatches 103 - unsupported + supported The relation attribute - relevance (102) is supported, see + Relevance (102) is supported, see for full information. - - - All ordering operations are based on a lexicographical ordering, - except when the - structure attribute numeric (109) is used. In - this case, ordering is numerical. See - . - + + Ranked search for information retrieval in + the title-register: + + Z> find @attr 1=4 @attr 2=102 "information retrieval" + + - Ranked search for information retrieval in - the title-register: - - Z> find @attr 1=4 @attr 2=102 "information retrieval" - - + The relation attribute + AlwaysMatches (103) is in the default + configuration + supported in conjunction with structure attribute + Phrase (1) (which may be omitted by + default). 
+ It can be configured to work with other structure attributes, + see the configuration file + tab/default.idx and + . + + + AlwaysMatches (103) is a + great way to discover how many documents have been indexed in a + given field. The search term is ignored, but needed for correct + PQF syntax. An empty search term may be supplied. + + Z> find @attr 1=Title @attr 2=103 "" + Z> find @attr 1=Title @attr 2=103 @attr 4=1 "" + + + + @@ -1118,7 +1132,7 @@ The exact mapping between PQF queries and Zebra internal indexes and index types is explained in - . + . @@ -1318,7 +1332,7 @@ The exact mapping between PQF queries and Zebra internal indexes and index types is explained in - . + . @@ -1340,6 +1354,39 @@ idxpath attribute set. + + Zebra specific retrieval of all records + + Zebra defines a hardwired string index name + called _ALLRECORDS. It matches any record + contained in the database, if used in conjunction with + the relation attribute + AlwaysMatches (103). + + + The _ALLRECORDS index name is used for total database + export. The search term is ignored and may be empty. + + Z> find @attr 1=_ALLRECORDS @attr 2=103 "" + + + + It can be combined with other index types. For example, to + find all records which are not indexed in + the Title register, issue one of the two + equivalent queries: + + Z> find @not @attr 1=_ALLRECORDS @attr 2=103 "" @attr 1=Title @attr 2=103 "" + Z> find @not @attr 1=_ALLRECORDS @attr 2=103 "" @attr 1=4 @attr 2=103 "" + + + + The special string index _ALLRECORDS is + experimental, and the provided functionality and syntax may very + well change in future releases of Zebra. + + + Zebra specific Search Extensions to all Attribute Sets @@ -1404,6 +1451,15 @@ faster and does not require clients to deal with the Sort Facility. + + + All ordering operations are based on a lexicographical ordering, + except when the + structure attribute numeric (109) is used. In + this case, ordering is numerical. See + . 
+ + The possible values after attribute type 7 are 1 ascending and @@ -1769,14 +1825,154 @@ - - Mapping from Bib1 Attributes to Zebra internal + <sect2 id="querymodel-pqf-apt-mapping"> + <title>Mapping from PQF atomic APT queries to Zebra internal register indexes - TO-DO + The rules for PQF APT mapping are rather tricky to grasp at + first. We deal first with the rules for deciding which + internal register or string index to use, according to the use + attribute or access point specified in the query. Thereafter we + deal with the rules for determining the correct structure type of + the named register. + + + + Mapping of PQF APT access points + + Zebra understands four fundamentally different types of access + points, of which only the + numeric use attribute type access points + are defined by the Z39.50 + standard. + All other access point types are Zebra specific, and non-portable. + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + 
Access point name
Access Point | Type | Grammar | Notes
Use attribute | numeric | [1-9][0-9]* | directly mapped to string index name
String index name | string | [a-zA-Z](\-?[a-zA-Z0-9])* | normalized name is used as internal string index name
Zebra internal index name | zebra | _[a-zA-Z](_?[a-zA-Z0-9])* | hardwired internal string index name
XPATH special index | XPath | /.* | special xpath search for GRS indexed records
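As a rough illustration of the table above, the following Python sketch classifies an access point name by matching it against the four grammars. The function name `classify_access_point` and the list `ACCESS_POINT_GRAMMARS` are hypothetical and not part of Zebra. The numeric grammar is written here as `[1-9][0-9]*`, which is an assumption, chosen because the use attributes in the example queries (1010, 1017) contain zeros.

```python
import re

# Grammars taken from the access point table. The numeric pattern allowing
# zeros after the first digit is an assumption (see the lead-in).
ACCESS_POINT_GRAMMARS = [
    ("numeric", re.compile(r"[1-9][0-9]*")),
    ("string", re.compile(r"[a-zA-Z](-?[a-zA-Z0-9])*")),
    ("zebra", re.compile(r"_[a-zA-Z](_?[a-zA-Z0-9])*")),
    ("XPath", re.compile(r"/.*")),
]


def classify_access_point(name):
    """Return the access point type whose grammar matches the whole name,
    or None if no grammar matches."""
    for kind, pattern in ACCESS_POINT_GRAMMARS:
        if pattern.fullmatch(name):
            return kind
    return None
```

The four grammars start with disjoint characters (digit, letter, underscore, slash), so the order of the checks does not matter in this sketch.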
+ + + Attribute set names and + string index names are normalized + according to the following rules: all single + hyphens '-' are stripped, and all upper case + letters are folded to lower case. + + + Numeric use attributes are mapped + to the Zebra internal + string index according to the attribute set definition in use. + The default attribute set is Bib-1, and may be + omitted in the PQF query. According to normalization and numeric + use attribute mapping, it follows that the following + PQF queries are considered equivalent (assuming the default + configuration has not been altered): + + Z> find @attr 1=Body-of-text serenade + Z> find @attr 1=bodyoftext serenade + Z> find @attr 1=BodyOfText serenade + Z> find @attr 1=bO-d-Y-of-tE-x-t serenade + Z> find @attr 1=1010 serenade + Z> find @attrset Bib-1 @attr 1=1010 serenade + Z> find @attrset bib1 @attr 1=1010 serenade + Z> find @attrset Bib1 @attr 1=1010 serenade + Z> find @attrset b-I-b-1 @attr 1=1010 serenade + + + + + The numerical + use attributes (type 1) + are interpreted according to the + attribute sets which have been loaded in the + zebra.cfg file, and are matched against specific + fields as specified in the .abs file which + describes the profile of the records which have been loaded. + If no use attribute is provided, a default of Bib-1 Any is + assumed. + The predefined use attribute sets + can be reconfigured by tweaking the configuration files + tab/*.att, and + new attribute sets can be defined by adding similar files in the + configuration path profilePath of the server. + + + + String indexes can be accessed directly, + independently of which attribute set is in use. Any attribute set + specified is simply ignored. The above mentioned name normalization applies. + String index names are defined in the + configuration files of the indexing filter in use, for example in the + GRS + *.abs configuration files, or in the + alvis filter XSLT indexing stylesheets. 
+ + + + Zebra internal indexes can be accessed directly, + according to the same rules as the user defined + string indexes. The only difference is that + Zebra internal index names are hardwired, + all uppercase and + must start with the character '_'. + + Finally, XPATH access points are only + available using the GRS filter for indexing. + These access point names must start with the character + '/', they are not + normalized, but passed unaltered to the Zebra internal + XPATH engine. See . + + + 
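The name normalization rule for attribute set names and string index names (strip hyphens, fold upper case to lower case) can be sketched in a few lines of Python. This is only an illustration of the rule as stated in the text, not Zebra's actual implementation; the function name `normalize_name` is hypothetical.

```python
def normalize_name(name):
    """Normalize an attribute set name or string index name:
    strip hyphens and fold upper case letters to lower case."""
    return name.replace("-", "").lower()


# The spellings used in the equivalent PQF queries collapse to one name each.
index_spellings = ["Body-of-text", "bodyoftext", "BodyOfText", "bO-d-Y-of-tE-x-t"]
attrset_spellings = ["Bib-1", "bib1", "Bib1", "b-I-b-1"]

normalized_indexes = {normalize_name(s) for s in index_spellings}
normalized_attrsets = {normalize_name(s) for s in attrset_spellings}
```

Under this sketch every index spelling normalizes to `bodyoftext` and every attribute set spelling to `bib1`, which is why the listed queries are treated as equivalent. XPATH access points are passed on unaltered, so this function would not be applied to them.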
+ + + + Mapping of PQF APT structure and type + + + - - Use attributes are interpreted according to the - attribute sets which have been loaded in the - zebra.cfg file, and are matched against specific - fields as specified in the .abs file which - describes the profile of the records which have been loaded. - If no Use attribute is provided, a default of Bib-1 Any is assumed. - If a Structure attribute of @@ -1944,6 +2132,8 @@ replacement) is accepted when terms are matched against the register contents. + +