X-Git-Url: http://sru.miketaylor.org.uk/?a=blobdiff_plain;f=doc%2Fzebra.sgml;h=258af0d51aeb6f4c0459ca0909ab559fafec3dab;hb=543fcf12c500813da5f2da099eb773852fb2bdc0;hp=4d079e3fcfcbf14e591bfe1f8bbaa1c0aecf17de;hpb=82311dccccd49f72dc33e5756ea5a5c662b258e1;p=idzebra-moved-to-github.git

diff --git a/doc/zebra.sgml b/doc/zebra.sgml
index 4d079e3..258af0d 100644
--- a/doc/zebra.sgml
+++ b/doc/zebra.sgml
@@ -1,13 +1,13 @@
 <!doctype linuxdoc system>
 
 <!--
-  $Id: zebra.sgml,v 1.25 1996-05-09 09:59:57 quinn Exp $
+  $Id: zebra.sgml,v 1.31 1996-11-08 14:17:24 adam Exp $
 -->
 
 <article>
 <title>Zebra Server - Administrators's Guide and Reference
 <author><htmlurl url="http://www.indexdata.dk/" name="Index Data">, <tt><htmlurl url="mailto:info@index.ping.dk" name="info@index.ping.dk"></>
-<date>$Revision: 1.25 $
+<date>$Revision: 1.31 $
 <abstract>
 The Zebra information server combines a versatile fielded/free-text
 search engine with a Z39.50-1995 frontend to provide a powerful and flexible
@@ -49,7 +49,7 @@ mailing-list by sending Email to <tt/zebra-request@index.ping.dk/.
 <sect1>Features
 
 <p>
-This is a listof some of the most important features of the
+This is a list of some of the most important features of the
 system.
 
 <itemize>
@@ -159,9 +159,6 @@ data elements in records.
 *Port the system to Windows NT.
 
 <item>
-Add index and data compression to save disk space.
-
-<item>
 Add more sophisticated relevance ranking mechanisms. Add support for soundex
 and stemming. Add relevance <it/feedback/ support.
 
@@ -225,6 +222,9 @@ profilePath: ../../yaz/tab ../tab
 # Files that describe the attribute sets supported.
 attset: bib1.att
 attset: gils.att
+
+# Name of character map file.
+charMap: scan.chr
 </verb></tscreen>
 
 Now, edit the file and set <tt>profilePath</tt> to the path of the
@@ -234,11 +234,11 @@ archive).
 The 48 test records are located in the sub directory <tt>records</tt>.
 To index these, type:
 <tscreen><verb>
-$ ../index/zebraidx -t grs update records
+$ ../index/zebraidx -t grs.sgml update records
 </verb></tscreen>
 
 In the command above the option <tt>-t</tt> specified the record
-type &mdash; in this case <tt>grs</tt>. The word <tt>update</tt> followed
+type &mdash; in this case <tt>grs.sgml</tt>. The word <tt>update</tt> followed
 by a directory root updates all files below that directory node.
 
 If your indexing command was successful, you are now ready to
@@ -361,13 +361,12 @@ by <tt>zebraidx</tt>. If no <tt/-g/ option is specified, the settings
 with no prefix are used.
 
 In the configuration file, the group name is placed before the option
-name
-itself, separated by a dot (.). For instance, to set the record type
-for group <tt/public/ to <tt/grs/ (the common format for structured
+name itself, separated by a dot (.). For instance, to set the record type
+for group <tt/public/ to <tt/grs.sgml/ (the SGML-like format for structured
 records) you would write:
 
 <tscreen><verb>
-public.recordType: grs
+public.recordType: grs.sgml
 </verb></tscreen>
 
 To set the default value of the record type to <tt/text/ write:
@@ -384,8 +383,12 @@ explained further in the following sections.
  Specifies how records with the file extension <it>name</it> should
  be handled by the indexer. This option may also be specified
  as a command line option (<tt>-t</tt>). Note that if you do not
- specify a <it/name/, the setting applies to all files.
-<tag><it>group</it>.recordId</tag>
+ specify a <it/name/, the setting applies to all files. In general,
+ the record type specifier consists of the elements (each
+ element separated by dot), <it>fundamental-type</it>,
+ <it>file-read-type</it> and arguments. Currently, two
+ fundamental types exist, <tt>text</tt> and <tt>grs</tt>.
+ <tag><it>group</it>.recordId</tag>
  Specifies how the records are to be identified when updated. See
 section <ref id="locating-records" name="Locating Records">.
 <tag><it>group</it>.database</tag>
@@ -409,9 +412,12 @@ section <ref id="locating-records" name="Locating Records">.
  Enables the <it/safe update/ facility of Zebra, and tells the system
  where to place the required, temporary files. See section
 <ref id="shadow-registers" name="Safe Updating - Using Shadow Registers">.
-<tag>lockPath</tag>
+<tag>lockDir</tag>
  Directory in which various lock files are stored.
-<tag>tempSetPath</tag>
+<tag>keyTmpDir</tag>
+ Directory in which temporary files used during zebraidx' update
+ phase are stored. 
+<tag>setTmpDir</tag>
  Specifies the directory that the server uses for temporary result sets.
  If not specified <tt>/tmp</tt> will be used.
 <tag>profilePath</tag>
@@ -421,8 +427,13 @@ section <ref id="locating-records" name="Locating Records">.
  searching. At least the Bib-1 set should be loaded (<tt/bib1.att/).
  The <tt/profilePath/ setting is used to look for the specified files.
  See section <ref id="attset-files" name="The Attribute Set Files">
+<tag>charMap</tag>
+ Specifies the filename of a character mapping. Zebra uses the path,
+ <tt>profilePath</tt>, to locate this file.
+<tag>memMax</tag>
+ Specifies size of internal memory to use for the zebraidx program. The
+ amount is given in megabytes - default is 4 (4 MB).
 </descrip>
-
 <sect1>Locating Records<label id="locating-records">
 <p>
 The default behaviour of the Zebra system is to reference the
@@ -865,7 +876,12 @@ privileged port.
 
 <tag>-w <it/working-directory/</tag>Change working directory.
 
-<tag/-i/Run under the Internet superserver, <tt/inetd/.
+<tag>-i <it/minutes/</tag>Run under the Internet superserver, <tt/inetd/.
+
+<tag>-t <it/timeout/</tag>Set the idle session timeout (default 60 minutes).
+
+<tag>-k <it/kilobytes/</tag>Set the (approximate) maximum size of
+present response messages. Default is 1024 Kb (1 Mb).
 </descrip>
 
 A <it/listener-address/ consists of a transport mode followed by a
@@ -924,7 +940,8 @@ listener, for the Z39.50 protocol, on port 9999.
 During initialization, the server will negotiate to version 3 of the
 Z39.50 protocol, and the option bits for Search, Present, Scan,
 NamedResultSets, and concurrentOperations will be set, if requested by
-the client. The maximum PDU size is negotiated down to a maximum of 1Mb.
+the client. The maximum PDU size is negotiated down to a maximum of
+1Mb by default.
 
 <sect2>Search
 
@@ -965,11 +982,37 @@ expression is constructed to match the given expression. If
 processor is invoked.
 
 For the <bf/Truncation/ attribute, <bf/No Truncation/ is the default.
-<bf/Left Truncation/ is not supported. <bf/Process #/ is supported, as
+<bf/Left Truncation/ is not supported. <bf/Process &num;/ is supported, as
 is <bf/Regxp-1/. <bf/Regxp-2/ enables the fault-tolerant (fuzzy)
 search. As a default, a single error (deletion, insertion,
 replacement) is accepted when terms are matched against the register
-contents.
+contents. The <bf/Regxp-1/ and <bf/Regxp-2/ both follow the same syntax
+with the operands:
+<descrip>
+<tag/x/ Matches the character <it/x/.
+<tag/./ Matches any character.
+<tag><tt/[/..<tt/]/</tag> Matches the set of characters specified;
+ such as <tt/[abc]/ or <tt/[a-c]/.
+</descrip>
+and the operators:
+<descrip>
+<tag/x*/ Matches <it/x/ zero or more times. Priority: high.
+<tag/x+/ Matches <it/x/ one or more times. Priority: high.
+<tag/x?/ Matches <it/x/ once or twice. Priority: high.
+<tag/xy/ Matches <it/x/, then <it/y/. Priority: medium.
+<tag/x|y/ Matches either <it/x/ or <it/y/. Priority: low.
+</descrip>
+The order of evaluation may be changed by using parentheses.
+
+If the first character of the <bf/Regxp-2/ query is a plus character
+(<tt/+/) it marks the beginning of a section with non-standard
+specifiers. The next plus character marks the end of the section.
+Currently Zebra only supports one specifier, the error tolerance,
+which consists one digit. 
+
+Since the plus operator is normally a suffix operator the addition to
+the query syntax doesn't violate the syntax for standard regular
+expressions.
 
 <sect2>Present
 
@@ -995,7 +1038,7 @@ If a Close PDU is received, the server will respond with a Close PDU
 with reason=FINISHED, no matter which protocol version was negotiated
 during initialization. If the protocol version is 3 or more, the
 server will generate a Close PDU under certain circumstances,
-including a session timeout (ca. 60 minutes), and certain kinds of
+including a session timeout (60 minutes by default), and certain kinds of
 protocol errors. Once a Close PDU has been sent, the protocol
 association is considered broken, and the transport connection will be
 closed immediately upon receipt of further data, or following a short
@@ -1012,6 +1055,10 @@ record. Any number of record schema can coexist in the system.
 Although it may be wise to use only a single schema within
 one database, the system poses no such restrictions.
 
+The record model described in this chapter applies to the fundamental
+record type <tt>grs</tt> as introduced in
+section <ref id="record-types" name="Record Types">.
+
 Records pass through three different states during processing in the
 system.
 
@@ -1055,6 +1102,9 @@ a single, canonical input format that gives access to the full
 spectrum of structure and flexibility in the system. In Zebra, this
 canonical format is an &dquot;SGML-like&dquot; syntax.
 
+To use the canonical format specify <tt>grs.sgml</tt> as the record
+type,
+
 Consider a record describing an information resource (such a record is
 sometimes known as a <it/locator record/). It might contain a field
 describing the distributor of the information resource, which might in
@@ -1189,7 +1239,10 @@ work with.
 
 Input filters are ASCII files, generally with the suffix <tt/.flt/.
 The system looks for the files in the directories given in the
-<bf/profilePath/ setting in the <tt/zebra.cfg/ file.
+<bf/profilePath/ setting in the <tt/zebra.cfg/ files. The record type
+for the filter is <tt>grs.regx.</tt><it>filter-filename</it>
+(fundamental type <tt>grs</tt>, file read type <tt>regx</tt>, argument
+<it>filter-filename</it>).
 
 Generally, an input filter consists of a sequence of rules, where each
 rule consists of a sequence of expressions, followed by an action. The
@@ -1899,7 +1952,7 @@ belonging to the Explain schema.
 <sect>License
 
 <p>
-Copyright &copy; 1995, Index Data.
+Copyright &copy; 1995,1996 Index Data.
 
 All rights reserved.