From: Marc Cromme
Date: Fri, 23 Jun 2006 13:45:41 +0000 (+0000)
Subject: added examples fo phrase and word search
X-Git-Tag: before.bug.529~4
X-Git-Url: http://sru.miketaylor.org.uk/cgi-bin?a=commitdiff_plain;h=1ab2e1e0d6f2aa60baa5195b0a313f689d4c1027;p=idzebra-moved-to-github.git

added examples fo phrase and word search
---

diff --git a/doc/querymodel.xml b/doc/querymodel.xml
index 6d41f89..88c2fd7 100644
--- a/doc/querymodel.xml
+++ b/doc/querymodel.xml
@@ -1,5 +1,5 @@
-
+
 Query Model
@@ -1721,7 +1721,7 @@
 This feature is enabled when defining the xpath enable option in the GRS filter
- *.abs configuration files. If one wants to use
+ *.abs configuration files. If one wants to use
 the special idxpath numeric attribute set, the main Zebra configuration file zebra.cfg directive attset: idxpath.att must be enabled.
@@ -1865,7 +1865,7 @@
 first place. We deal first with the rules for deciding which internal register or string index to use, according to the use attribute or access point specified in the query. Thereafter we
- deal with the rules for tetermining the correct structure type of
+ deal with the rules for determining the correct structure type of
 the named register.
@@ -1883,7 +1883,7 @@
-
+
@@ -1925,18 +1925,23 @@
 string index names are normalized according to the following rules: all single hyphens '-' are stripped, and all upper case
- letters are folded to lower case.
+ letters are folded to lower case.
+
-
- Numeric use attributes are mapped
- to the Zebra internal
- string index according to the attribute set definition in use.
- The default attribute set is Bib-1, and may be
- omitted in the PQF query. According to normalization and numeric
- use attribute mapping, it follows that the following
- PQF queries are considered equivalent (assuming the default
- configuration has not been altered):
-
+
+ Numeric use attributes are mapped
+ to the Zebra internal
+ string index according to the attribute set definition in use.
+ The default attribute set is Bib-1, and may be
+ omitted in the PQF query.
+
+
+
+ According to normalization and numeric
+ use attribute mapping, it follows that the following
+ PQF queries are considered equivalent (assuming the default
+ configuration has not been altered):
+
 Z> find @attr 1=Body-of-text serenade
 Z> find @attr 1=bodyoftext serenade
 Z> find @attr 1=BodyOfText serenade
@@ -1957,7 +1962,8 @@
 zebra.cfg file, and are matched against specific fields as specified in the .abs file which describes the profile of the records which have been loaded.
- If no use attribute is provided, a default of Bib-1 Any is
+ If no use attribute is provided, a default of
+ Bib-1 Use Any (1016) is
 assumed. The predefined use attribute sets can be reconfigured by tweaking the configuration files
@@ -2001,88 +2007,99 @@
- Mapping of PQF APT structure and type
+ Mapping of PQF APT structure and completeness to
+ register type
-
-
-
+ Internally Zebra has in its default configuration several
+ different types of registers or indexes, whose tokenization and
+ character normalization rules differ. This reflects the fact that
+ searching fundamentally different tokens like dates, numbers,
+ bitfields and string based text needs different rulesets.
+
-
-
+
Access point name
Access point name mapping
Access Point
+ Structure and completeness mapping to register types
+ Structure | Completeness | Register type | Notes
+ phrase (@attr 4=1), word (@attr 4=2), word-list (@attr 4=6), free-form-text (@attr 4=105), or document-text (@attr 4=106) | Incomplete field (@attr 6=1) | Word ('w') | Traditional tokenized and character normalized word index
+ phrase (@attr 4=1), word (@attr 4=2), word-list (@attr 4=6), free-form-text (@attr 4=105), or document-text (@attr 4=106) | Complete field (@attr 6=3) | Phrase ('p') | Character normalized, but not tokenized index for phrase matches
+ urx (@attr 4=104) | ignored | URX/URL ('u') | Special index for URL web addresses
+ numeric (@attr 4=109) | ignored | Numeric ('n') | Special index for digital numbers
+ key (@attr 4=3) | ignored | Null bitmap ('0') | Used for non-tokenized and non-normalized bit sequences
+ year (@attr 4=4) | ignored | Year ('y') | Non-tokenized and non-normalized 4-digit numbers
+ date (@attr 4=5) | ignored | Date ('d') | Non-tokenized and non-normalized ISO date strings
+ ignored | ignored | Sort ('s') | Used with special sort attribute set (@attr 7=1, @attr 7=2)
+ overruled | overruled | special | Internal record ID register, used whenever Relation Always Matches (@attr 2=103) is specified
+
+
+
 If a Structure attribute of Phrase is used in conjunction with a
@@ -2091,9 +2108,23 @@
 against the contents of the phrase (long word) register, if one exists for the given Use attribute. A phrase register is created for those fields in the
- .abs file that contains a
+ GRS *.abs file that contains a
 p-specifier.
-
+
+ Z> scan @attr 1=Title @attr 4=1 @attr 6=3 beethoven
+ ...
+ bayreuther festspiele (1)
+ * beethoven bibliography database (1)
+ benny carter (1)
+ ...
+ Z> find @attr 1=Title @attr 4=1 @attr 6=3 "beethoven bibliography"
+ ...
+ Number of hits: 0, setno 5
+ ...
+ Z> find @attr 1=Title @attr 4=1 @attr 6=3 "beethoven bibliography database"
+ ...
+ Number of hits: 1, setno 6
+
@@ -2104,7 +2135,23 @@
 contains multiple words, the term will only match if all of the words are found immediately adjacent, and in the given order. The word search is performed on those fields that are indexed as
- type w in the .abs file.
+ type w in the GRS *.abs file.
+
+ Z> scan @attr 1=Title @attr 4=1 @attr 6=1 beethoven
+ ...
+ beefheart (1)
+ * beethoven (18)
+ beethovens (7)
+ ...
+ Z> find @attr 1=Title @attr 4=1 @attr 6=1 beethoven
+ ...
+ Number of hits: 18, setno 1
+ ...
+ Z> find @attr 1=Title @attr 4=1 @attr 6=1 "beethoven bibliography"
+ ...
+ Number of hits: 2, setno 2
+ ...
+
@@ -2115,21 +2162,22 @@
 natural-language, relevance-ranked query. This search type uses the word register, i.e. those fields that are indexed as type w in the
- .abs file.
+ GRS *.abs file.
 If the Structure attribute is Numeric String the term is treated as an integer. The search is performed on those fields that are indexed
- as type n in the .abs file.
+ as type n in the GRS
+ *.abs file.
 If the Structure attribute is URx the term is treated as a URX (URL) entity. The search is performed on those fields that are indexed as type
- u in the .abs file.
+ u in the *.abs file.
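
The two rules this patch documents (string index name normalization, and the mapping from structure and completeness attributes to a register type) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Zebra code: the names normalize_index_name, register_type, TEXT_STRUCTURES and FIXED_REGISTERS are hypothetical. It assumes "all single hyphens are stripped" means every '-' character is removed. It uses 'n' for the numeric register, following the statement that Numeric String searches use fields indexed as type n. Combinations the text does not describe return None rather than a guessed default, and the sort and record-ID registers are left out.

```python
from typing import Optional

# Structure attributes (@attr 4=...) that select a text register, where the
# completeness attribute (@attr 6=...) decides between word and phrase.
TEXT_STRUCTURES = {1, 2, 6, 105, 106}

# Structure attributes for which completeness is ignored.
# Assumption: the numeric register is 'n', as in the prose on Numeric String.
FIXED_REGISTERS = {104: "u", 109: "n", 3: "0", 4: "y", 5: "d"}


def normalize_index_name(name: str) -> str:
    """Strip hyphens and fold upper case to lower case."""
    return name.replace("-", "").lower()


def register_type(structure: int, completeness: Optional[int] = None) -> Optional[str]:
    """Return the register type letter, or None if the case is not covered."""
    if structure in FIXED_REGISTERS:
        return FIXED_REGISTERS[structure]
    if structure in TEXT_STRUCTURES:
        if completeness == 1:   # incomplete field
            return "w"
        if completeness == 3:   # complete field
            return "p"
    return None


if __name__ == "__main__":
    for name in ("Body-of-text", "bodyoftext", "BodyOfText"):
        print(name, "->", normalize_index_name(name))
    print("@attr 4=1 @attr 6=3 ->", register_type(1, 3))
    print("@attr 4=1 @attr 6=1 ->", register_type(1, 1))
```

Under these assumptions the three access point spellings from the equivalent PQF queries normalize to the same string, and the phrase and word examples (@attr 4=1 with @attr 6=3 or @attr 6=1) resolve to the 'p' and 'w' registers respectively.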