-Since the existing records in an index cannot be addressed by their
-IDs, it is impossible to delete or modify records when using this method.
-
-<sect1>Indexing with File Record IDs<label id="file-ids">
-
-<p>
-If you have a set of files that change regularly over time (old files
-are deleted, new ones are added, or existing files are modified), you
-can benefit from using the <it/file ID/ indexing method. Examples
-of this type of database might include an index of WWW resources, or a
-USENET news spool area. In brief, the file ID method uses the path
-name of each record's file as the unique identifier for that record.
-To index a directory with file IDs, again, you specify the top-level
-directory after the <tt>update</tt> command. The command recursively
-traverses the directories and compares each one with what was indexed
-before in that same directory. If a file is new (not in the previous
-version of the directory), it is inserted into the register; if a file
-was already indexed and has been modified since the last update, its
-entry in the index is updated; if a file has been removed since the
-last visit, it is deleted from the index.
-
-The resulting system is easy to administer. To delete a record, you
-simply delete the corresponding file (say, with the <tt/rm/
-command). To add records, you create new files (or directories
-containing files). For your changes to take effect in the register, you
-must run <tt>zebraidx update</tt> with the same directory root again. This mode
-of operation requires more disk space than simpler indexing methods,
-but it makes it easier for you to keep the index in sync with a
-frequently changing set of data. If you combine this system with the
-<it/safe update/ facility (see below), you never have to take your
-server offline for maintenance or register updating purposes.
-
-To enable indexing with file record IDs, you must specify <tt>file</tt> as
-the value of <tt>recordId</tt> in the configuration file. In addition,
-you should set <tt>storeKeys</tt> to <tt>1</tt>, since the Zebra
-indexer must save additional information about the contents of each record
-in order to modify the indices correctly at a later time.
-
-For example, to update records of group <tt>esdd</tt> located below
-<tt>/data1/records/</tt> you should type:
-<tscreen><verb>
-$ zebraidx -g esdd update /data1/records
-</verb></tscreen>
-
-The corresponding configuration file includes:
-<tscreen><verb>
-esdd.recordId: file
-esdd.recordType: grs.sgml
-esdd.storeKeys: 1
-</verb></tscreen>
-
-<em>Important note: You cannot start out with a group of records with simple
-indexing (no record IDs, as in the previous section) and later
-enable file record IDs. Zebra must know from the first time you
-index the group that the files should be indexed with file record IDs.
-</em>
-
-You cannot explicitly delete records when using this method (that is,
-with the <bf/delete/ command of <tt/zebraidx/). Instead,
-you have to delete the files from the file system (or move them to a
-different location)
-and then run <tt>zebraidx</tt> with the <bf/update/ command.
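-
-For instance, with the <tt>esdd</tt> group configured as shown earlier,
-replacing one record with another might look like the following
-session (the file names are hypothetical):
-<tscreen><verb>
-$ rm /data1/records/old-report.sgml
-$ cp new-report.sgml /data1/records/
-$ zebraidx -g esdd update /data1/records
-</verb></tscreen>
-The final command removes the record for <tt>old-report.sgml</tt> from
-the index and inserts a record for <tt>new-report.sgml</tt>.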
-
-<sect1>Indexing with General Record IDs
-<p>
-When using this method, you construct an (almost) arbitrary internal
-record key based on the contents of the record itself and other system
-information. If you have a group of records that explicitly associates
-an ID with each record, this method is convenient. For example, the
-record format may contain a title or an ID number that is unique within the group.
-In either case you specify the Z39.50 attribute set and use-attribute
-location in which this information is stored, and the system looks at
-that field to determine the identity of the record.
-
-As before, the record ID is defined by the <tt>recordId</tt> setting
-in the configuration file. The value of the record ID specification
-consists of one or more tokens separated by whitespace. The resulting
-ID is represented in the index by concatenating the tokens, separated
-by the character with ASCII value 1.
-
-There are three kinds of tokens:
-<descrip>
-<tag>Internal record info</tag> The token refers to a key that is
-extracted from the record. The syntax of this token is
- <tt/(/ <em/set/ <tt/,/ <em/use/ <tt/)/, where <em/set/ is the
-attribute set name and <em/use/ is the name or value of the attribute.
-<tag>System variable</tag> The system variables are preceded by
-<verb>$</verb> and immediately followed by the system variable name, which
-may be one of
- <descrip>
- <tag>group</tag> Group name.
- <tag>database</tag> Current database specified.
- <tag>type</tag> Record type.
- </descrip>
-<tag>Constant string</tag> A string used as part of the ID, surrounded
- by single or double quotes.
-</descrip>
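-
-The following hypothetical setting combines all three kinds of tokens:
-the group name, a constant string, and the contents of the field mapped
-to Bib-1 use attribute 1007, given here by its numeric value rather
-than its name. Assuming that a setting without a group prefix applies
-to every group, the group name keeps records from different groups
-apart:
-<tscreen><verb>
-recordId: $group "ctrl" (bib1,1007)
-</verb></tscreen>
-A record indexed in group <tt>gils</tt> would thus be identified by
-<tt>gils</tt>, <tt>ctrl</tt> and the value of its own identifier field,
-joined by the character with ASCII value 1.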
-
-For instance, the sample GILS records that come with the Zebra
-distribution contain a unique ID in the data tagged Control-Identifier.
-The data is mapped to the Bib-1 use attribute Identifier-standard
-(code 1007). To use this field as a record ID, specify
-<tt>(bib1,Identifier-standard)</tt> as the value of the
-<tt>recordId</tt> in the configuration file.
-If you have other record types that use the same field for a
-different purpose, you might add the record type
-(or group or database name) to the record ID of the GILS
-records as well, to prevent matches with other types of records.
-In this case the recordId might be set like this:
-<tscreen><verb>
-gils.recordId: $type (bib1,Identifier-standard)
-</verb></tscreen>
-
-(see section <ref id="data-model" name="Configuring Your Data Model">
-for details of how the mapping between elements of your records and
-searchable attributes is established).
-
-As for the file record ID case described in the previous section,
-updating your system is simply a matter of running <tt>zebraidx</tt>
-with the <tt>update</tt> command. However, the update with general
-keys is considerably slower than with file record IDs, since all files
-visited must be (re)read to discover their IDs.
-
-As you might expect, when using the general record ID method, the
-<tt>update</tt> command can only add new records or modify existing ones.
-If you wish to delete records, you must use the <tt>delete</tt>
-command with a directory as a parameter. This removes all records
-that match the files below that root directory.
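-
-For example, assuming the GILS records were indexed in group
-<tt>gils</tt>, a command like the following (the directory name is
-hypothetical) would remove every record matching a file below
-<tt>/data1/gils/obsolete</tt>. Because the IDs are derived from the
-record contents, this sketch assumes the files are still in place when
-the command runs:
-<tscreen><verb>
-$ zebraidx -g gils delete /data1/gils/obsolete
-</verb></tscreen>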
-
-<sect1>Register Location<label id="register-location">
-
-<p>
-Normally, the index files that form dictionaries, inverted
-files, record info, etc., are stored in the directory where you run
-<tt>zebraidx</tt>. If you wish to store these, possibly large, files
-somewhere else, you must add the <tt>register</tt> entry to the
-<tt/zebra.cfg/ file. Furthermore, the Zebra system allows its file
-structures to
-span multiple file systems, which is useful for managing very large
-databases.
-
-The value of the <tt>register</tt> setting is a sequence of tokens.
-Each token takes the form:
-<tscreen>
-<em>dir</em><tt>:</tt><em>size</em>
-</tscreen>
-The <em>dir</em> specifies a directory in which index files will be
-stored and the <em>size</em> specifies the maximum size of all
-files in that directory. The Zebra indexer fills the directories
-in the order specified, moving on to the next directory as each one
-fills up. The <em>size</em> is an integer followed by a qualifier
-code: <tt>M</tt> for megabytes or <tt>k</tt> for kilobytes.
-
-For instance, if you have allocated two disks for your register, and
-the first disk is mounted
-on <tt>/d1</tt> and has 200 MB of free space and the
-second, mounted on <tt>/d2</tt>, has 300 MB, you could
-put this entry in your configuration file:
-<tscreen><verb>
-register: /d1:200M /d2:300M
-</verb></tscreen>
-
-Note that Zebra does not verify that the amount of space specified is
-actually available in the directory (file system) specified; it is
-your responsibility to ensure that enough space is available, and that
-other applications do not attempt to use the free space. In a large production system,
-it is recommended that you allocate one or more file systems exclusively
-to the Zebra register files.
-
-<sect1>Safe Updating - Using Shadow Registers<label id="shadow-registers">
-
-<sect2>Description
-
-<p>
-The Zebra server supports <it/updating/ of the index structures. That is,
-you can add, modify, or remove records from databases managed by Zebra
-without rebuilding the entire index. Since this process involves
-modifying structured files with various references between blocks of
-data in the files, the update process is inherently sensitive to
-system crashes, or to process interruptions: Anything but a
-successfully completed update process will leave the register files in
-an unknown state, and you will essentially have no recourse but to
-re-index everything, or to restore the register files from a backup
-medium. Further, while the update process is active, users cannot be
-allowed to access the system, as the contents of the register files
-may change unpredictably.
-
-You can solve these problems by enabling the shadow register system in
-Zebra. During the updating procedure, <tt/zebraidx/ will temporarily
-write changes to the involved files in a set of &dquot;shadow
-files&dquot;, without modifying the files that are accessed by the
-active server processes. If the update procedure is interrupted by a
-system crash or a signal, you simply repeat the procedure - the
-register files have not been changed or damaged, and the partially
-written shadow files are automatically deleted before the new updating
-procedure commences.
-
-At the end of the updating procedure (or in a separate operation, if
-you so desire), the system enters a &dquot;commit mode&dquot;. First,
-any active server processes are forced to access those blocks that
-have been changed from the shadow files rather than from the main
-register files; the unmodified blocks are still accessed at their
-normal location (the shadow files are not a complete copy of the
-register files - they only contain those parts that have actually been
-modified). If the commit process is interrupted at any point, the
-server processes will continue to access the
-shadow files until you can repeat the commit procedure and complete
-the writing of data to the main register files. You can perform
-multiple update operations to the registers before you commit the
-changes to the system files, or you can execute the commit operation
-at the end of each update operation. When the commit phase has
-completed successfully, any running server processes are instructed to
-switch their operations to the new, operational register, and the
-temporary shadow files are deleted.
-
-<sect2>How to Use Shadow Register Files
-
-<p>
-The first step is to allocate space on your system for the shadow
-files. You do this by adding a <tt/shadow/ entry to the <tt/zebra.cfg/
-file. The syntax of the <tt/shadow/ entry is exactly the same as for
-the <tt/register/ entry (see section <ref name="Register Location"
-id="register-location">). The location of the shadow area should be
-<it/different/ from the location of the main register area (if you
-have specified one - remember that if you provide no <tt/register/
-setting, the default register area is the
-working directory of the server and indexing processes).
-
-The following excerpt from a <tt/zebra.cfg/ file shows one example of
-a setup that configures both the main register location and the shadow
-file area. Note that two directories or partitions have been set aside
-for the shadow file area. You can specify any number of directories
-for each of the file areas, but remember that there should be no
-overlaps between the directories used for the main registers and the
-shadow files, respectively.
-
-<tscreen><verb>
-register: /d1:500M
-
-shadow: /scratch1:100M /scratch2:200M
-</verb></tscreen>
-
-When shadow files are enabled, an extra command is available at the
-<tt/zebraidx/ command line. In order to make changes to the system
-take effect for the users, you'll have to submit a
-&dquot;commit&dquot; command after a (sequence of) update
-operation(s). You can ask the indexer to commit the changes
-immediately after the update operation:
-
-<tscreen><verb>
-$ zebraidx update /d1/records update /d2/more-records commit
-</verb></tscreen>
-
-Or you can execute multiple updates before committing the changes:
-
-<tscreen><verb>
-$ zebraidx -g books update /d1/records update /d2/more-records
-$ zebraidx -g fun update /d3/fun-records
-$ zebraidx commit
-</verb></tscreen>
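-
-Putting the pieces together, a hypothetical <tt/zebra.cfg/ for the
-<tt>esdd</tt> group used in the file record ID section, with both a
-main register area and a shadow area, might read:
-<tscreen><verb>
-register: /d1:500M
-shadow: /scratch1:100M /scratch2:200M
-esdd.recordId: file
-esdd.recordType: grs.sgml
-esdd.storeKeys: 1
-</verb></tscreen>
-The group could then be brought up to date and committed in a single
-run, while running server processes keep answering queries from the
-unchanged main register until the commit phase begins:
-<tscreen><verb>
-$ zebraidx -g esdd update /data1/records commit
-</verb></tscreen>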
-
-If one of the update operations above had been interrupted, the commit
-operation on the last line would fail: <tt/zebraidx/ will not let you
-commit changes that would destroy the running register. You'll have to
-rerun all of the update operations since your last commit operation,
-before you can commit the new changes.
-
-Similarly, if the commit operation fails, <tt/zebraidx/ will not let
-you start a new update operation before you have successfully repeated
-the commit operation. The server processes will keep accessing the
-shadow files rather than the (possibly damaged) blocks of the main
-register files until the commit operation has successfully completed.
-
-You should be aware that update operations may take slightly longer
-when the shadow register system is enabled, since more file access
-operations are involved. Further, while the disk space required for
-the shadow register data is modest for a small update operation, you
-may prefer to disable the system if you are adding a very large number
-of records to an already very large database (we use the terms
-<it/large/ and <it/modest/ very loosely here, since every
-application will have a different perception of size). To update the system
-without the shadow files, simply run <tt/zebraidx/ with
-the <tt/-n/ option (note that you do not have to execute the
-<bf/commit/ command of <tt/zebraidx/ when you temporarily disable the
-use of the shadow registers in this fashion). Note also that, just as
-when the shadow registers are not enabled, server processes will be
-barred from accessing the main register while the update procedure
-takes place.
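-
-For example, a large one-off load into the <tt>books</tt> group could
-bypass the shadow files as follows (the directory name is
-hypothetical); no <bf/commit/ is needed afterwards:
-<tscreen><verb>
-$ zebraidx -n -g books update /d4/bulk-records
-</verb></tscreen>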
-