X-Git-Url: http://sru.miketaylor.org.uk/?a=blobdiff_plain;f=doc%2Fzebra.sgml;h=258af0d51aeb6f4c0459ca0909ab559fafec3dab;hb=543fcf12c500813da5f2da099eb773852fb2bdc0;hp=8384f1b680b537d264fdaaf7dd797086a03935b3;hpb=904e4c7a5d970ba391aa80ddd9584f23e921a72f;p=idzebra-moved-to-github.git

diff --git a/doc/zebra.sgml b/doc/zebra.sgml
index 8384f1b..258af0d 100644
--- a/doc/zebra.sgml
+++ b/doc/zebra.sgml
@@ -1,13 +1,13 @@
 Zebra Server - Administrators's Guide and Reference <author><htmlurl url="http://www.indexdata.dk/" name="Index Data">, <tt><htmlurl url="mailto:info@index.ping.dk" name="info@index.ping.dk"></>
-<date>$Revision: 1.26 $
+<date>$Revision: 1.31 $
 <abstract> The Zebra information server combines a versatile fielded/free-text search engine with a Z39.50-1995 frontend to provide a powerful and flexible
@@ -49,7 +49,7 @@ mailing-list by sending Email to <tt/zebra-request@index.ping.dk/. <sect1>Features <p>
-This is a listof some of the most important features of the
+This is a list of some of the most important features of the
 system. <itemize>
@@ -159,9 +159,6 @@ data elements in records. *Port the system to Windows NT. <item>
-Add index and data compression to save disk space.
-
-<item>
 Add more sophisticated relevance ranking mechanisms. Add support for soundex and stemming. Add relevance <it/feedback/ support.
@@ -225,6 +222,9 @@ profilePath: ../../yaz/tab ../tab # Files that describe the attribute sets supported. attset: bib1.att attset: gils.att
+
+# Name of character map file.
+charMap: scan.chr
 </verb></tscreen> Now, edit the file and set <tt>profilePath</tt> to the path of the
@@ -234,11 +234,11 @@ archive). The 48 test records are located in the sub directory <tt>records</tt>. To index these, type: <tscreen><verb>
-$ ../index/zebraidx -t grs update records
+$ ../index/zebraidx -t grs.sgml update records
 </verb></tscreen> In the command above the option <tt>-t</tt> specified the record
-type — in this case <tt>grs</tt>. The word <tt>update</tt> followed
+type — in this case <tt>grs.sgml</tt>. The word <tt>update</tt> followed
 by a directory root updates all files below that directory node. If your indexing command was successful, you are now ready to
@@ -361,13 +361,12 @@ by <tt>zebraidx</tt>. If no <tt/-g/ option is specified, the settings with no prefix are used.
 In the configuration file, the group name is placed before the option
-name
-itself, separated by a dot (.). For instance, to set the record type
-for group <tt/public/ to <tt/grs/ (the common format for structured
+name itself, separated by a dot (.). For instance, to set the record type
+for group <tt/public/ to <tt/grs.sgml/ (the SGML-like format for structured
 records) you would write: <tscreen><verb>
-public.recordType: grs
+public.recordType: grs.sgml
 </verb></tscreen> To set the default value of the record type to <tt/text/ write:
@@ -384,8 +383,12 @@ explained further in the following sections. Specifies how records with the file extension <it>name</it> should be handled by the indexer. This option may also be specified as a command line option (<tt>-t</tt>). Note that if you do not
- specify a <it/name/, the setting applies to all files.
-<tag><it>group</it>.recordId</tag>
+ specify a <it/name/, the setting applies to all files. In general,
+ the record type specifier consists of the elements (each
+ element separated by dot), <it>fundamental-type</it>,
+ <it>file-read-type</it> and arguments. Currently, two
+ fundamental types exist, <tt>text</tt> and <tt>grs</tt>.
+ <tag><it>group</it>.recordId</tag>
 Specifies how the records are to be identified when updated. See section <ref id="locating-records" name="Locating Records">. <tag><it>group</it>.database</tag>
@@ -409,9 +412,12 @@ section <ref id="locating-records" name="Locating Records">. Enables the <it/safe update/ facility of Zebra, and tells the system where to place the required, temporary files. See section <ref id="shadow-registers" name="Safe Updating - Using Shadow Registers">.
-<tag>lockPath</tag>
+<tag>lockDir</tag>
 Directory in which various lock files are stored.
-<tag>tempSetPath</tag>
+<tag>keyTmpDir</tag>
+ Directory in which temporary files used during zebraidx' update
+ phase are stored.
+<tag>setTmpDir</tag>
 Specifies the directory that the server uses for temporary result sets.
 If not specified <tt>/tmp</tt> will be used. <tag>profilePath</tag>
@@ -421,8 +427,13 @@ section <ref id="locating-records" name="Locating Records">. searching. At least the Bib-1 set should be loaded (<tt/bib1.att/). The <tt/profilePath/ setting is used to look for the specified files. See section <ref id="attset-files" name="The Attribute Set Files">
+<tag>charMap</tag>
+ Specifies the filename of a character mapping. Zebra uses the path,
+ <tt>profilePath</tt>, to locate this file.
+<tag>memMax</tag>
+ Specifies size of internal memory to use for the zebraidx program. The
+ amount is given in megabytes - default is 4 (4 MB).
 </descrip>
-
 <sect1>Locating Records<label id="locating-records"> <p> The default behaviour of the Zebra system is to reference the
@@ -971,11 +982,37 @@ expression is constructed to match the given expression. If processor is invoked. For the <bf/Truncation/ attribute, <bf/No Truncation/ is the default.
-<bf/Left Truncation/ is not supported. <bf/Process #/ is supported, as
+<bf/Left Truncation/ is not supported. <bf/Process #/ is supported, as
 is <bf/Regxp-1/. <bf/Regxp-2/ enables the fault-tolerant (fuzzy) search. As a default, a single error (deletion, insertion, replacement) is accepted when terms are matched against the register
-contents.
+contents. The <bf/Regxp-1/ and <bf/Regxp-2/ both follow the same syntax
+with the operands:
+<descrip>
+<tag/x/ Matches the character <it/x/.
+<tag/./ Matches any character.
+<tag><tt/[/..<tt/]/</tag> Matches the set of characters specified;
+ such as <tt/[abc]/ or <tt/[a-c]/.
+</descrip>
+and the operators:
+<descrip>
+<tag/x*/ Matches <it/x/ zero or more times. Priority: high.
+<tag/x+/ Matches <it/x/ one or more times. Priority: high.
+<tag/x?/ Matches <it/x/ once or twice. Priority: high.
+<tag/xy/ Matches <it/x/, then <it/y/. Priority: medium.
+<tag/x|y/ Matches either <it/x/ or <it/y/. Priority: low.
+</descrip>
+The order of evaluation may be changed by using parentheses.
+
+If the first character of the <bf/Regxp-2/ query is a plus character
+(<tt/+/) it marks the beginning of a section with non-standard
+specifiers. The next plus character marks the end of the section.
+Currently Zebra only supports one specifier, the error tolerance,
+which consists one digit.
+
+Since the plus operator is normally a suffix operator the addition to
+the query syntax doesn't violate the syntax for standard regular
+expressions.
 <sect2>Present
@@ -1018,6 +1055,10 @@ record. Any number of record schema can coexist in the system. Although it may be wise to use only a single schema within one database, the system poses no such restrictions.
+The record model described in this chapter applies to the fundamental
+record type <tt>grs</tt> as introduced in
+section <ref id="record-types" name="Record Types">.
+
 Records pass through three different states during processing in the system.
@@ -1061,6 +1102,9 @@ a single, canonical input format that gives access to the full spectrum of structure and flexibility in the system. In Zebra, this canonical format is an &dquot;SGML-like&dquot; syntax.
+To use the canonical format specify <tt>grs.sgml</tt> as the record
+type,
+
 Consider a record describing an information resource (such a record is sometimes known as a <it/locator record/). It might contain a field describing the distributor of the information resource, which might in
@@ -1195,7 +1239,10 @@ work with. Input filters are ASCII files, generally with the suffix <tt/.flt/. The system looks for the files in the directories given in the
-<bf/profilePath/ setting in the <tt/zebra.cfg/ file.
+<bf/profilePath/ setting in the <tt/zebra.cfg/ files. The record type
+for the filter is <tt>grs.regx.</tt><it>filter-filename</it>
+(fundamental type <tt>grs</tt>, file read type <tt>regx</tt>, argument
+<it>filter-filename</it>).
 Generally, an input filter consists of a sequence of rules, where each rule consists of a sequence of expressions, followed by an action.
 The
@@ -1905,7 +1952,7 @@ belonging to the Explain schema. <sect>License <p>
-Copyright © 1995, Index Data.
+Copyright © 1995,1996 Index Data.
 All rights reserved.