Towards GPL

diff --git a/doc/zebra.sgml b/doc/zebra.sgml

index cd98dd5..0317e2c 100644
--- a/doc/zebra.sgml
+++ b/doc/zebra.sgml
@@ -1,20 +1,25 @@
  <!doctype linuxdoc system>
  
  <!--
-  $Id: zebra.sgml,v 1.18 1996-02-10 12:20:46 quinn Exp $
+  $Id: zebra.sgml,v 1.47 2000-02-25 11:35:41 adam Exp $
  -->
  
  <article>
  <title>Zebra Server - Administrators's Guide and Reference
-<author><htmlurl url="http://www.index.dk/" name="Index Data">, <tt><htmlurl url="mailto:info@index.ping.dk" name="info@index.ping.dk"></>
-<date>$Revision: 1.18 $
+<author><htmlurl url="http://www.indexdata.dk/" name="Index Data">,
+<tt><htmlurl url="mailto:info@indexdata.dk" name="info@indexdata.dk"></>
+<date>$Revision: 1.47 $
  <abstract>
-The Zebra information server combines a versatile fielded/free-text
-search engine with a Z39.50-1995 frontend to provide a powerful and flexible
-information management system. This document explains the procedure for
-installing and configuring the system, and outlines the possibilities
+
+
+The Zebra server combines a versatile fielded/free-text
+indexing/search engine with a Z39.50-1995 frontend to provide a powerful and flexible
+information mining tool. This document explains the procedure for
+installing and configuring Zebra, and outlines the possibilities
  for managing data and providing Z39.50
-services with the software.
+services with the software. Zebra is a free version of the Index Data Z'mbol
+information system, and it excludes some functionality such as incremental
+database updating and support for large databases.
  </abstract>
  
  <toc>
@@ -24,53 +29,51 @@ services with the software.
  <sect1>Overview
  
  <p>
-The Zebra system is a fielded free-text indexing and retrieval engine with a
+Zebra is a fielded free-text indexing and retrieval engine with a
  Z39.50 frontend. You can use any commercial or freeware Z39.50 client
  to access data stored in Zebra.
  
-The Zebra server is our first step towards the development of a fully
-configurable, open information system. Eventually, it will be paired
-off with a powerful Z39.50 client to support complex information
-management tasks within almost any application domain. We're making
-the server available now because it's no fun to be in the open
-information retrieval business all by yourself. We want to allow
-people with interesting data to make their things
-available in interesting ways, without having to start out
-by implementing yet another protocol stack from scratch.
-
-This document is an introduction to the Zebra system. It will tell you
+The Zebra server can be used as the core of a Z39.50-based information retrieval
+framework. We're making
+the server available now to allow researchers and small organisations to
+share their information in the best possible way. We believe that Z39.50
+currently represents one of the best ways of sharing information with others, and
+we would like to encourage as many people as possible to do so.
+This document is a guide to using Zebra. It will tell you
  how to compile the software, and how to prepare your first database.
  It also explains how the server can be configured to give you the
  functionality that you need.
  
  If you find the software interesting, you should join the support
-mailing-list by sending Email to <tt/zebra-request@index.ping.dk/.
+mailing-list by sending email to <tt/zebra-request@indexdata.dk/.
+
+If you are interested in running a commercial service, if you wish to run large
+databases, or if you wish to make incremental updates to your databases even
+while users are accessing your system, then you might be interested in the Z'mbol
+Information Server which is available from <htmlurl
+url="http://www.indexdata.dk/zmbol/" name="Index Data"> or Fretwell-Downing
+Informatics. Z'mbol is a complete and supported package which offers many
+exciting possibilities that we have not been able to fit into this package.
  
  <sect1>Features
  
  <p>
-This is a listof some of the most important features of the
+This is a list of some of the most important features of the
  system.
  
  <itemize>
  
  <item>
-Supports updating - records can be added and deleted without
-rebuilding the index from scratch.
-The update procedure is tolerant to crashes or hard interrupts
-during register updating - registers can be reconstructed following a crash.
-Registers can be safely updated even while users are accessing the server.
-
-<item>
-Supports large databases - files for indices, etc. can be
-automatically partitioned over multiple disks.
-
-<item>
  Supports arbitrarily complex records - base input format is an
-SGML-like syntax which allows nested (structured) data elements, as
+XML-like syntax which allows nested (structured) data elements, as
  well as variant forms of data.
  
  <item>
+Supports arbitrary storage formats. A system of input filters driven by
+regular expressions allows you to easily process most ASCII-based
+data formats. SGML/XML, ISO2709 (MARC), and raw text are also supported.
+
+<item>
  Supports boolean queries as well as relevance-ranking (free-text)
  searching. Right truncation and masking in terms are supported, as
  well as full regular expressions.
@@ -78,16 +81,25 @@ well as full regular expressions.
  <item>
  Supports multiple concrete syntaxes
  for record exchange (depending on the configuration): GRS-1, SUTRS,
-ISO2709 (*MARC). Records can be mapped between record syntaxes and
+ISO2709 (*MARC), XML. Records can be mapped between record syntaxes and
  schema on the fly.
  
  <item>
+Supports approximate matching in registers (i.e. spelling mistakes,
+etc.).
+
+<item> Supports a subset of the Z39.50 Explain Facility. Zebra's Explain database
+is automatically updated when a set of records is loaded into Zebra.
+
+</itemize>
+
+<p>
  Protocol support:
  
  <itemize>
  
  <item>
-Protocol facilities: Init, Search, Retrieve, Browse.
+Protocol facilities: Init, Search, Retrieve, Browse, Sort, Close, and Explain.
  
  <item>
  Piggy-backed presents are honored in the search-request.
@@ -111,24 +123,15 @@ system, and are given in configuration files as simple element
  requests (and possibly variant requests).
  
  <item>
-Some variant support (not fully implemented yet).
-
-<item>
-Using the YAZ toolkit for the protocol implementation, the
-server can utilise a plug-in XTI/mOSI implementation (not included) to
-provide SR services over an OSI stack, as well as Z39.50 over TCP/IP.
-
-</itemize>
+Zebra runs on most Unix-like systems as well as Windows NT. A binary
+distribution for Windows NT is forthcoming; so far, the installation
+requires Microsoft Visual C++ to compile the system (we use version 6.0).
  
  </itemize>
  
  <sect1>Future Work
  
  <p>
-This is an alfa-release of the software, to allow you to look at
-it - try it out, and assess whether it can be of use to you. We expect
-this version to be followed by a succession of beta-releases until we
-arrive at a stable first version.
  
  These are some of the plans that we have for the software in the near
  and far future, approximately ordered after their relative importance.
@@ -139,41 +142,18 @@ last beta release.
  <itemize>
  
  <item>
-*Allow the system to handle other input formats. Specifically
-MARC records and general, structured ASCII records (such as mail/news
-files) parameterized by regular expressions.
-
-<item>
-*Complete the support for variants. Finalize support for the WAIS
-retrieval methodology.
+*Complete the support for variants.
  
  <item>
  *Finalize the data element <it/include/ facility to support multimedia
  data elements in records.
  
  <item>
-*Port the system to Windows NT.
-
-<item>
-Add index and data compression to save disk space.
-
-<item>
  Add more sophisticated relevance ranking mechanisms. Add support for soundex
-and stemming. Add relevance feedback support.
+and stemming. Add relevance <it/feedback/ support.
  
  <item>
-Add Explain support.
-
-<item>
-Add support for very large records by implementing segmentation and/or
-variant pieces.
-
-<item>
-Support the Item Update extended service of the protocol.
-
-<item>
-The Zebra search engine supports approximate string matching in the
-index. We'd like to find a way to support and control this from RPN.
+Complete Explain support.
  
  <item>
  We want to add a management system that allows you to
@@ -189,85 +169,169 @@ neat, you're welcome to drop us a line saying that, too. You'll find
  contact info at the end of this file.
  
  <sect>Compiling the software
-
  <p>
-Zebra uses the YAZ package to implement Z39.50, so you
-have to compile YAZ before going further. Specifically, Zebra uses
-the YAZ header files in <tt>yaz/include/..</tt> and its public library
-<tt>yaz/lib/libyaz.a</tt>.
-
-As with YAZ, an ANSI C compiler is required in order to compile the Zebra
-server system &mdash; <tt/gcc/ works fine if your own system doesn't
+You need the
+<bf><htmlurl url="http://www.indexdata.dk/yaz/" name="YAZ"></>
+package in order to compile this software. We suggest you
+unpack <bf/YAZ/ in the same parent directory as Zebra. Running
+<tt>./configure</tt> (UNIX only) followed by <tt>make</tt> (<tt>nmake</tt> on WIN32) is
+usually all it takes to compile YAZ.
+
+<sect1>UNIX
+<p>
+An ANSI C compiler is required to compile the Zebra
+server system &mdash; <tt/gcc/ works very well if your own system doesn't
  provide an adequate compiler.
  
-Unpack the Zebra software. You might put Zebra in the same directory level
-as YAZ, for example if YAZ is placed in ..<tt>/src/yaz-xxx</tt>, then
-Zebra is placed in ..<tt>/src/zebra-yyy</tt>.
+Unpack the distribution archive. The <tt>configure</tt> shell script
+attempts to guess correct values for various system-dependent variables
+used during compilation. It uses those values to create a 'Makefile' in
+each directory of Zebra.
  
  
-Edit the top-level <tt>Makefile</tt> in the Zebra directory in which
-you specify the location of YAZ by setting make variables.
-The <tt>OSILIB</tt> should be empty if YAZ wasn't compiled with
-MOSI support. Some systems, such as Solaris, have separate socket
-libraries and for those systems you need to specify the
-<tt>NETLIB</tt> variable.
+To run the configure script type:
+<tscreen><verb>
+  ./configure
+</verb></tscreen>
  
  
-When you are done editing the <tt>Makefile</tt> type:
+The configure script attempts to use the C compiler specified by
+the <tt>CC</tt> environment variable. If not set, GNU C
+will be used if it is available. The <tt>CFLAGS</tt> environment variable
+holds options to be passed to the C compiler. If you're using a
+Bourne-compatible shell you may pass something like this:
  <tscreen><verb>
-$ make
+  CC=/opt/ccs/bin/cc CFLAGS=-O ./configure
  </verb></tscreen>
  
+To customize Zebra, the configure script accepts a set of options. The
+most important are:
+<descrip>
+<tag><tt>-</tt><tt>-prefix </tt>path</tag> Specifies the installation prefix. This is
+only needed if you run <tt>make install</tt> later to perform a
+"system" installation. The prefix is <tt>/usr/local</tt> if not
+specified.
+<tag><tt>-</tt><tt>-with-tclconfig=</tt>DIR</tag> If Tcl is installed on
+the system you can tell configure in which directory Tcl's
+<tt>tclConfig.sh</tt> is stored. The <tt>tclConfig.sh</tt> file includes
+information about settings required to link with Tcl's libraries.
+If you don't specify this option, configure will see if Tcl's shell
+<tt>tclsh</tt> is in your path and if it is, it will guess where
+the corresponding <tt>tclConfig.sh</tt> is located. If <tt>tclsh</tt> is not found in
+your path and this option is not given Zebra will not include Tcl support.
+<tag><tt>-</tt><tt>-with-yazconfig=</tt>DIR</tag> This option allows you to
+specify the directory that contains YAZ's <tt>yaz-config</tt>.
+This option is useful if you wish to compile Zebra with a specific
+version of YAZ. YAZ versions 1.5 and later create a script
+<tt>yaz-config</tt> that includes information on compiler settings
+needed to link with it.
+</descrip>
+
+When configured, build the software by typing:
+<tscreen><verb>
+  make
+</verb></tscreen> 
+
+As an option you may type <tt>make depend</tt> to create
+source file dependencies for the package. This is only needed,
+however, if you modify the source code later.
+
  If successful, two executables have been created in the sub-directory
-<tt/index/.
+<tt>bin</tt>.
  <descrip>
  <tag><tt>zebrasrv</tt></tag> The Z39.50 server and search engine.
  <tag><tt>zebraidx</tt></tag> The administrative tool for the search index.
  </descrip>
  
-<sect>Quick Start
+<p>
+The next step is optional and is only needed if you wish to install
+Zebra in system directories such as <tt>/usr/bin</tt>, <tt>/usr/lib</tt>, etc.
+
+To perform this step, type
+<tscreen><verb>
+  make install
+</verb></tscreen>
+
+The executables will be installed in <it>prefix</it>/bin, and profile
+tables will be installed in <it>prefix</it>/lib/zebra/tab. Here <it>prefix</it>
+is the installation prefix given to <tt>configure</tt> (default <tt>/usr/local</tt>).
+
+<sect1>WIN32
  
  <p>
-This section will get you started quickly! We will try to index a few sample
-GILS records that are included with the Zebra distribution. Go to the
-<tt>test</tt> subdirectory. There you will find a configuration
+Zebra is shipped with "makefiles" for the NMAKE tool that comes
+with Visual C++.
+
+Start an MS-DOS prompt and switch to the sub directory <tt>WIN</tt> where
+the file <tt>makefile</tt> is located. Customize the installation
+by editing the <tt>makefile</tt> file (for example by using WordPad).
+
+The following summarises the most important settings in that file.
+
+<descrip>
+<tag><tt>YAZDIR</tt></tag> Specifies where YAZ is located.
+<tag><tt>DEBUG</tt></tag> If set to 1, the software is
+compiled with debugging libraries. If set to 0, the software
+is compiled with release (non-debugging) libraries.
+<tag>BZIP2</tag> A group of settings (<tt>BZIP2LIB</tt>,..)
+that must be defined if BZIP2 compression support is desired.
+</descrip>
+
+When you are satisfied with the settings in the makefile, type
+<tscreen><verb>
+nmake
+</verb></tscreen>
+
+If compilation was successful the executables <tt>zebraidx.exe</tt>
+and <tt>zebrasrv.exe</tt> are put in the sub directory <tt>BIN</tt>.
+
+<sect>Quick Start 
+<p>
+In this section, we will test the system by indexing a small set of sample
+GILS records that are included with the software distribution. Go to the
+<tt>test/gils</tt> subdirectory of the distribution archive. There you will
+find a configuration
  file named <tt>zebra.cfg</tt> with the following contents:
  <tscreen><verb>
-# Where are the YAZ tables located.
-profilePath: /usr/local/yaz
+# Where the schema files, attribute files, etc. are located.
+profilePath: .:../../tab:../../../yaz/tab 
  
  # Files that describe the attribute sets supported.
+attset: explain.att
  attset: bib1.att
  attset: gils.att
  </verb></tscreen>
  
  Now, edit the file and set <tt>profilePath</tt> to the path of the
-YAZ profile tables (sub directory <tt>tab</tt> of YAZ).
+YAZ profile tables (sub directory <tt>tab</tt> of the YAZ distribution
+archive).
  
  The 48 test records are located in the sub directory <tt>records</tt>.
  To index these, type:
  <tscreen><verb>
-$ ../index/zebraidx -t grs update records
+$ ../../bin/zebraidx -t grs.sgml update records
  </verb></tscreen>
  
  In the command above the option <tt>-t</tt> specified the record
-type &mdash; in this case <tt>grs</tt>. The word <tt>update</tt> followed
+type &mdash; in this case <tt>grs.sgml</tt>. The word <tt>update</tt> followed
  by a directory root updates all files below that directory node.
  
  If your indexing command was successful, you are now ready to
  fire up a server. To start a server on port 2100, type:
  <tscreen><verb>
-$ ../index/zebrasrv tcp:@:2100
+$ ../../bin/zebrasrv tcp:@:2100
  </verb></tscreen>
  
-The Zebra index that you've just made has one database called Default. It will
-return either USMARC, GRS-1, or SUTRS depending on what your client asks
-for.
+The Zebra index that you have just created has a single database
+named <tt/Default/. The database contains records structured according to
+the GILS profile, and the server will
+return records in XML, USMARC, GRS-1, or SUTRS format, depending
+on what your client asks for.
  
  To test the server, you can use any Z39.50 client (1992 or later). For
  instance, you can use the demo client that comes with YAZ: Just cd to
  the <tt/client/ subdirectory of the YAZ distribution and type:
  
  <tscreen><verb>
-$ client tcp:localhost:2100
+$ ./yaz-client tcp:localhost:2100
  </verb></tscreen>
  
  When the client has connected, you can type:
@@ -285,37 +349,25 @@ Z>format sutrs
  Z>show 1
  Z>format grs-1
  Z>show 1
+Z>format xml
+Z>show 1
+Z>elements B
+Z>show 1
  </verb></tscreen>
  
-If you've made it this far, there's a reasonably good chance that
+<it>NOTE: You may notice that more fields are returned when your
+client requests SUTRS or GRS-1 records than when it requests USMARC.
+When retrieving GILS records, this is normal: not all of the GILS data
+elements have mappings in the USMARC record format.</it>
+
+If you've made it this far, there's a good chance that
  you've got through the compilation OK.
  
  <sect>Administrating Zebra<label id="administrating">
  
  <p>
-Unlike many simpler retrieval systems, Zebra supports safe, incremental
-updates to an existing index.
-
-Normally, when Zebra modifies the index it reads a number of records
-that you specify.
-Depending on your specifications and on the contents of each record
-one the following events take place for each record:
-<descrip>
-<tag>Insert</tag> The record is indexed as if it never occurred
-before. Either the Zebra system doesn't know how to identify the record or
-Zebra can identify the record but didn't find it to be already indexed.
-<tag>Modify</tag> The record has already been indexed. In this case
-either the contents of the record or the location (file) of the record
-indicates that it has been indexed before.
-<tag>Delete</tag> The record is deleted from the index. As in the
-update-case it must be able to identify the record.
-</descrip>
-
-Please note that in both the modify- and delete- case the Zebra
-indexer must be able to generate a unique key that identifies the record in
-question (more on this below).
  
  
-To administrate the Zebra retrieval system, you run the
+To administrate Zebra, you run the
  <tt>zebraidx</tt> program. This program supports a number of options
  which are preceded by a minus, and a few commands (not preceded by
  minus).
@@ -326,18 +378,17 @@ name of the configuration file defaults to <tt>zebra.cfg</tt>.
  The configuration file includes specifications on how to index
  various kinds of records and where the other configuration files
  are located. <tt>zebrasrv</tt> and <tt>zebraidx</tt> <em>must</em>
-be run in the same directory where the configuration file if you do
-not indicate the location of the configuration file by option
+be run in the directory where the configuration file lives unless you
+indicate the location of the configuration file by option
  <tt>-c</tt>.
  
  <sect1>Record Types<label id="record-types">
  <p>
-Indexing is a per-record process, in which
-either insert/modify/delete will occur. Before a record is indexed
-search keys are extracted from whatever might be the layout the
-original record (sgml,html,text, etc..). The Zebra system 
-currently only supports SGML-like, structured records and unstructured text
-records.
+Indexing is a per-record process. Before a record is indexed, search
+keys are extracted from the original record, whatever its layout
+(SGML, HTML, text, etc.).
+The Zebra system currently supports two fundamental types of records:
+structured and simple text.
  To specify a particular extraction process, use either the
  command line option <tt>-t</tt> or specify a
  <tt>recordType</tt> setting in the configuration file.
@@ -352,21 +403,22 @@ You can edit the configuration file with a normal text editor.
  Parameter names and values are seperated by colons in the file. Lines
  starting with a hash sign (<tt/&num;/) are treated as comments.
  
-If you manage different sets of records that each share common
-caracteristics, you can organize the configuration settings for each
+If you manage different sets of records that share common
+characteristics, you can organize the configuration settings for each
  type into &dquot;groups&dquot;.
  When <tt>zebraidx</tt> is run and you wish to address a given group
-you specify that group with the <tt>-g</tt> option. In this case
+you specify the group name with the <tt>-g</tt> option. In this case
  settings that have the group name as their prefix will be used
-by <tt>zebraidx</tt> and not default values. The default values have no prefix.
+by <tt>zebraidx</tt>. If no <tt/-g/ option is specified, the settings
+with no prefix are used.
  
  
-The group is written before the option itself, separated by a dot (.).
-For instance, to set the record type for group <tt/public/ to <tt/grs/
-(the common format for structured records)
-you would write:
+In the configuration file, the group name is placed before the option
+name itself, separated by a dot (.). For instance, to set the record type
+for group <tt/public/ to <tt/grs.sgml/ (the SGML-like format for structured
+records) you would write:
  
  <tscreen><verb>
-public.recordType: grs
+public.recordType: grs.sgml
  </verb></tscreen>
  
  To set the default value of the record type to <tt/text/ write:
@@ -375,359 +427,86 @@ To set the default value of the record type to <tt/text/ write:
  recordType: text
  </verb></tscreen>
  
-The configuration settings are summarized below. They will be
+The available configuration settings are summarized below. They will be
  explained further in the following sections.
  
  <descrip>
-<tag><it>group</it>recordType<it>name</it></tag>
+<tag><it>group</it>.recordType&lsqb;<it>.name</it>&rsqb;</tag>
   Specifies how records with the file extension <it>name</it> should
   be handled by the indexer. This option may also be specified
   as a command line option (<tt>-t</tt>). Note that if you do not
- specify a <it/name/, the setting applies to all files.
-<tag><it>group</it>recordId</tag>
- Specifies how the record is to be identified when updated.
-<tag><it>group</it>database</tag>
+ specify a <it/name/, the setting applies to all files. In general,
+ the record type specifier consists of the elements
+ <it>fundamental-type</it>, <it>file-read-type</it> and arguments,
+ separated by dots. Currently, two fundamental types exist:
+ <tt>text</tt> and <tt>grs</tt>.
+ <tag><it>group</it>.recordId</tag>
+ Specifies how the records are to be identified when updated. See
+section <ref id="locating-records" name="Locating Records">.
+<tag><it>group</it>.database</tag>
   Specifies the Z39.50 database name.
-<tag><it>group</it>storeKeys</tag>
+<tag><it>group</it>.storeKeys</tag>
   Specifies whether key information should be saved for a given
   group of records. If you plan to update/delete this type of
   records later this should be specified as 1; otherwise it
- should be 0 (default).
-<tag><it>group</it>storeData</tag>
+ should be 0 (default), to save register space.
+<tag><it>group</it>.storeData</tag>
   Specifies whether the records should be stored internally
   in the Zebra system files. If you want to maintain the raw records yourself,
   this option should be false (0). If you want Zebra to take care of the records
   for you, it should be true(1).
-<tag>register</tag> 
- Specifies the location of the various files that Zebra uses to represent
- your system.
-<tag>tempSetPath</tag>
+<tag>lockDir</tag>
+ Directory in which various lock files are stored.
+<tag>keyTmpDir</tag>
+ Directory in which temporary files used during the <tt>zebraidx</tt> update
+ phase are stored.
+<tag>setTmpDir</tag>
   Specifies the directory that the server uses for temporary result sets.
   If not specified <tt>/tmp</tt> will be used.
  <tag>profilePath</tag>
- Specifies the location of profile specification paths.
+ Specifies the location of profile specification files.
  <tag>attset</tag> 
   Specifies the filename(s) of attribute set files for use in
   searching. At least the Bib-1 set should be loaded (<tt/bib1.att/).
- The <tt/profilePath/ setting is used to search for attribute set
- files.
+ The <tt/profilePath/ setting is used to look for the specified files.
+ See section <ref id="attset-files" name="The Attribute Set Files">.
+<tag>memMax</tag>
+ Specifies the size of the internal memory used by the <tt>zebraidx</tt> program.
+ The amount is given in megabytes; the default is 4 (4 MB).
  </descrip>
-
-<sect1>Locating Records
+<sect1>Locating Records<label id="locating-records">
  <p>
  The default behaviour of the Zebra system is to reference the
  records from their original location, i.e. where they were found when you
-ran <tt/zebraidx/.
-
-If your input files are temporary - for example if you retrieve
-your records from an outside source, or if they where temporarily mounted on a CD-ROM,
+ran <tt/zebraidx/. That is, when a client wishes to retrieve a record
+following a search operation, the files are accessed from the place
+where you originally put them. If you remove the files without
+running <tt/zebraidx/ again, the client will receive a diagnostic
+message.
+
+If your input files are not permanent - for example if you retrieve
+your records from an outside source, or if they were temporarily
+mounted on a CD-ROM drive,
  you may want Zebra to make an internal copy of them. To do this,
-you specify 1 (true) in the <tt>storedata</tt> setting. When
+you specify 1 (true) in the <tt>storeData</tt> setting. When
  the Z39.50 server retrieves the records they will be read from the
  internal file structures of the system.
  
-<sect1>Indexing with no Record IDs (Simple Indexing)
+<sect1>Indexing example
  
  <p>
-If you have a set of records that you <em/never/ wish to delete
-or modify you may find &dquot;indexing without records IDs&dquot; convenient.
-This indexing method uses less space than the other methods and
-is simple to use. 
-
-To use this method, you simply don't provide the <tt>recordId</tt> entry
-for the group of files that you index. To add a set of records you use
-<tt>zebraidx</tt> with the <tt>update</tt> command. The
-<tt>update</tt> command will always add all of the records to the index
-because Zebra doesn't know how to match the new set of records with
-existing records.
-
  Consider a system in which you have a group of text files called
  <tt>simple</tt>. That group of records should belong to a Z39.50 database
  called <tt>textbase</tt>. The following <tt/zebra.cfg/ file will suffice:
  
  <tscreen><verb>
-profilePath: /usr/local/yaz
+profilePath: /usr/lib/yaz/tab:/usr/lib/zebra/tab
+attset: explain.att
  attset: bib1.att
  simple.recordType: text
  simple.database: textbase
  </verb></tscreen>
  
-Since the existing records in an index can not be addressed by their
-IDs, it is impossible to delete or modify records when using this method.
-
-<sect1>Indexing with File Record IDs
-
-<p>
-If you have a set of external records that you wish to index you may
-use the file key feature of the Zebra system. In short, the file key
-methodology uses the paths of the files containing records as their
-unique identifiers. To perform indexing of a directory with file keys,
-again, you specify the top-level directory after the <tt>update</tt>
-command. The command will recursively traverse the directories and
-compare each with whatever have been indexed before in the same
-directory. If a file is new (not in the previous version of the
-directory) it is inserted into the registers; if a file was already
-indexed and it has been modified since the last insertionm, the index
-is also modified; if a file has been removed since the last visit, it
-is deleted from the index.
-
-The resulting system is easy to administer. To delete a record
-you simply have to delete the corresponding file (say, with the
-<tt/rm/ command). 
-To force update of a given file, you may use the <tt>touch</tt>
-command. And to add files create new files (or directories with files).
-For your changes to take effect in the register you must run <tt>zebraidx</tt> with
-the same directory root again.
-
-To use this method, you must specify <tt>file</tt> as the value
-of <tt>recordId</tt> in the configuration file. In addition, you
-should set <tt>storeKeys</tt> to <tt>1</tt>, since the Zebra
-indexer must save additional information about the keys to each record in order to
-modify the indices correctly at a later time.
-
-For example, to update group <tt>esdd</tt> records below
-<tt>/home/grs</tt> you could type:
-<tscreen><verb>
-$ zebraidx -g esdd update /home/grs
-</verb></tscreen>
-
-The corresponding configuration file includes:
-<tscreen><verb>
-esdd.recordId: file
-esdd.recordType: grs
-esdd.storeKeys: 1
-</verb></tscreen>
-
-<em>Important note: You cannot start out with a group of records with simple
-indexing (no record IDs as in the previous section) and then later
-enable file record Ids. Zebra must know from the first time that you
-index the group that
-the files should be indexed with file record IDs.
-</em>
-
-You cannot explicitly delete records when using this method (using the
-<bf/delete/ command to <tt/zebraidx/. Instead
-you have to delete the files from the file system (or remove them)
-and then run <tt>zebraidx</tt> with the <bf/update/ command again.
-
-<sect1>Indexing with General Record IDs
-<p>
-When using this method you construct an (almost) arbritrary, internal
-record key based on the contents of the record itself and other system
-information. If you have a group of records that associates an ID with
-each record, this method is convenient. For example, the record may
-contain a title or a ID-number - unique within the group. In either
-case you specify the Z39.50 attribute set and use-attribute location
-in which this information is stored, and the system looks at this
-field to determine the identity of the record.
-
-As before, the record ID is defined by the <tt>recordId</tt> setting
-in the configuration file. The value of the record ID specification
-consists of one or more tokens separated by whitespace. The resulting
-ID is
-represented in the index by concatenating the tokens and separating them by
-ASCII value (1).
-
-There are three kinds of tokens:
-<descrip>
-<tag>Internal record info</tag> The token refers to a key that is
-extracted from the record. The syntax of this token is
- <tt/(/ <em/set/ <tt/,/ <em/use/ <tt/)/, where <em/set/ is the
-attribute set ordinal number and <em/use/ is the use value of the attribute.
-<tag>System variable</tag> The system variables are preceded by
-<verb>$</verb> and immediately followed by the system variable name, which
-may one of
- <descrip>
- <tag>group</tag> Group name.
- <tag>database</tag> Current database specified.
- <tag>type</tag> Record type.
- </descrip>
-<tag>Constant string</tag> A string used as part of the ID &mdash; surrounded
- by single- or double quotes.
-</descrip>
-
-The sample GILS records that come with the Zebra distribution contain a
-unique ID
-in the Control-Identifier field. This field is mapped to the Bib-1
-use attribute 1007. To use this field as a record id, specify
-<tt>(1,1007)</tt> as the value of the <tt>recordId</tt> in the
-configuration file. If you have other record types that uses
-the same field for a different purpose, you might add the record type (or group or database name)
-to the record id of the gils records as well, to prevent matches
-with other types of records. In this case the recordId might be
-set like this:
-<tscreen><verb>
-gils.recordId: $type (1,1007)
-</verb></tscreen>
-
-As for the file record id case described in the previous section
-updating your system is simply a matter of running <tt>zebraidx</tt>
-with the <tt>update</tt> command. However, the update with general
-keys is considerably slower than with file record IDs, since all files
-visited must be (re)read to find their IDs. 
-
-You may have noticed that when using the general record IDs
-method, you can only add or modify existing records with the <tt>update</tt>
-command. If you wish to delete records, you must use the,
-<tt>delete</tt> command, with a directory as a parameter.
-This will remove all records that match the files below that root
-directory.
-
-<sect1>Register Location<label id="register-location">
-
-<p>
-Normally, the index files that form dictionaries, inverted
-files, record info, etc., are stored in the directory where you run
-<tt>zebraidx</tt>. If you wish to store these, possibly large, files
-somewhere else, you must add the <tt>register</tt> entry to the
-configuration file. Furthermore, the Zebra system allows its file
-structures to
-span multiple file systems, which is useful if a very large number of
-records are stored.
-
-The value <tt>register</tt> of register is a sequence of tokens.
-Each token takes the form:
-<tscreen>
-<em>dir</em><tt>:</tt><em>size</em>. 
-</tscreen>
-The <em>dir</em> specifies a directory in which index files will be
-stored and the <em>size</em> specifies the maximum size of all
-files in that directory. The Zebra indexer system fills each directory
-in the order specified and use the next specified directories as needed.
-The <em>size</em> is an integer followed by a qualifier
-code, <tt>M</tt> for megabytes, <tt>k</tt> for kilobytes.
-
-For instance, if you have two spare disks :) and the first disk is mounted
-on <tt>/d1</tt> and has 200 Mb of free space and the
-second, mounted on <tt>/d2</tt> has 300 Mb, you could
-put this entry in your configuration file:
-<tscreen><verb>
-register: /d1:200M /d2:300M
-</verb></tscreen>
-
-Note that Zebra does not verify that the amount of space specified is
-actually available on the directory (file system) specified - it is
-your responsibility to ensure that enough space is available, and that
-other applications do not use the free space. In a large production system,
-it is recommended that you allocate one or more filesystem exclusively
-to the Zebra register files.
-
-<sect1>Safe Updating - Using Shadow Registers<label id="shadow-registers">
-
-<sect2>Description
-
-<p>
-The Zebra server supports updating of the index structures. That is,
-you can add records to databases managed by Zebra without rebuilding
-the entire index. Since this process involves modifying structured
-files with various references between blocks of data in the files, the
-update process is inherently sensitive to system crashes, or to
-process interruptions: Anything but a successfully completed update
-process will leave the register files in an unknown state, and you
-will essentially have no recourse but to re-index everything, or to
-restore the register files from a backup medium. Further, while the
-update process is active, users cannot be allowed to access the
-system, as the contents of the register files may change unpredictably.
-
-You can solve these problems by enabling the shadow register system in
-Zebra. During the updating procedure, <tt/zebraidx/ will temporarily
-write changes to the involved files in a set of &dquot;shadow
-files&dquot;, without modifying the files that are accessed by the
-active server processes. If the update procedure is interrupted by a
-system crash or a signal, you simply repeat the procedure - the
-register files have not been changed or damaged, and the partially
-written shadow files are automatically deleted before the new updating
-procedure commences.
-
-At the end of the updating procedure (or in a separate operation, if
-you so desire), the system enters a &dquot;commit mode&dquot;. First,
-any active server processes are forced to access those blocks that
-have been changed from the shadow files rather than from the main
-register files; the unmodified blocks are still accessed at their
-normal location (the shadow files are not a complete copy of the
-register files - they only contain those parts that have actually been
-modified). If the process is interrupted at any point during the
-commit process, the server processes will continue to access the
-shadow files until you can repeat the commit procedure and complete
-the writing of data to the main register files. You can perform
-multiple update operations to the registers before you commit the
-changes to the system files, or you can execute the commit operation
-at the end of each update operation. When the commit phase has
-completed successfully, any running server processes are instructed to
-switch their operations to the new, operational register, and the
-temporary shadow files are deleted.
-
-<sect2>How to Use Shadow Register Files
-
-<p>
-The first step is to allocate space on your system for the shadow
-files. You do this by adding a <tt/shadow/ entry to the <tt/zebra.cfg/
-file. The syntax of the <tt/shadow/ entry is exactly the same as for
-the <tt/register/ entry (see section <ref name="Register Location"
-id="register-location">). The location of the shadow area should be
-<it/different/ from the location of the main register area (if you
-have specified one - remember that the default register area is the
-working directory of the server and indexing processes).
-
-The following excerpt from a <tt/zebra.cfg/ file shows one example of
-a setup that configures both the main register location and the shadow
-file area. Note that two directories or partitions have been set aside
-for the shadow file area. You can specify any number of directories
-for each of the file areas.
-
-<tscreen><verb>
-register: /d1:500M
-
-shadow: /scratch1:100M /scratch2:200M
-</verb></tscreen>
-
-When shadow files are enabled, an extra command is available at the
-<tt/zebraidx/ command line. In order to make changes to the system
-take effect for the users, you'll have to submit a
-&dquot;commit&dquot; command after a (sequence of) update
-operation(s). You can ask the indexer to commit the changes
-immediately after the update operation:
-
-<tscreen><verb>
-$ zebraidx update /d1/records update /d2/more-records commit
-</verb></tscreen>
-
-Or you can execute multiple updates before committing the changes:
-
-<tscreen><verb>
-$ zebraidx -g books update /d1/records update /d2/more-records
-$ zebraidx -g fun update /d3/fun-records
-$ zebraidx commit
-</verb></tscreen>
-
-If one of the update operations above had been interrupted, the commit
-operation on the last line would fail: <tt/zebraidx/ will not let you
-commit changes that would destroy the running register. You'll have to
-rerun all of the update operations since your last commit operation,
-before you can commit the new changes.
-
-Similarly, if the commit operation fails, <tt/zebraidx/ will not let
-you start a new update operation before you have successfully repeated
-the commit operation. The server processes will keep accessing the
-shadow files rather than the (possibly damaged) blocks of the main
-register files until the commit operation has successfully completed.
-
-You should be aware that update operations may take slightly longer
-when the shadow register system is enabled, since more file access
-operations are involved. Further, while the disk space required for
-the shadow register data is modest for a small update operation, you
-may prefer to disable the system if you are adding a very large number
-of records to an already very large database (we use the terms
-<it/large/ and <it/modest/ very loosely here, since every
-application's perception of size is different). To update the system
-without the use of the the shadow files, simply run <tt/zebraidx/ with
-the <tt/-n/ option (note that you do not have to execute the
-<bf/commit/ command of <tt/zebraidx/ when you temporarily disable the
-use of the shadow registers in this fashion. Note also that, just as
-when the shadow registers are not enabled, server processes will be
-barred from accessing the main register while the update procedure
-takes place.
-
  <sect>Running the Maintenance Interface (zebraidx)
  
  <p>
@@ -741,12 +520,13 @@ $ zebraidx &lsqb;options&rsqb; command &lsqb;directory&rsqb; ...
  <bf/Options/
  <descrip>
  <tag>-t <it/type/</tag>Update all files as <it/type/. Currently, the
-types supported are <tt/text/ and <tt/grs/<it/.filter/. If no
-<it/filter/ is provided for the GRS (General Record Structure) type,
+types supported are <tt/text/ and <tt/grs/<it/.subtype/. If no
+<it/subtype/ is provided for the GRS (General Record Structure) type,
  the canonical input format is assumed (see section <ref
  id="local-representation" name="Local Representation">). Generally, it
  is probably advisable to specify the record types in the
-<tt/zebra.cfg/ file (see section <ref id="record-types" name="Record Types">).
+<tt/zebra.cfg/ file (see section <ref id="record-types" name="Record
+Types">), to avoid confusion at subsequent updates.
  
  <tag>-c <it/config-file/</tag>Read the configuration file
  <it/config-file/ instead of <tt/zebra.cfg/.
  
@@ -759,13 +539,16 @@ name="The Zebra Configuration File">).
  with the database name <it/database/ for access through the Z39.50
  server.
  
-<tag>-d <it/mbytes/</tag>Use <it/mbytes/ of megabytes before flushing
+<tag>-m <it/mbytes/</tag>Use <it/mbytes/ of megabytes before flushing
  keys to background storage. This setting affects performance when
  updating large databases.
  
-<tag>-n</tag>Disable the use of shadow registers for this operation
-(see section <ref id="shadow-registers" name="Robust Updating - Using
-Shadow Registers">).
+<tag>-s</tag>Show analysis of the indexing process. The maintenance
+program works in a read-only mode and doesn't change the state
+of the index. This option is very useful when you wish to test a
+new profile.
+
+<tag>-V</tag>Show Zebra version.
  
  <tag>-v <it/level/</tag>Set the log level to <it/level/. <it/level/
  should be one of <tt/none/, <tt/debug/, and <tt/all/.
  
@@ -779,18 +562,11 @@ contained in <it/directory/. If no directory is provided, a list of
  files is read from <tt/stdin/. See section <ref
  id="administrating" name="Administrating Zebra">.
  
-<tag>Delete <it/directory/</tag>Remove the records corresponding to
-the files found under <it/directory/ from the register.
-
-<tag/Commit/Write the changes resulting from the last <bf/update/
-commands to the register. This command is only available if the use of
-shadow register files is enabled (see section <ref
-id="shadow-registers" name="Robust Updating - Using Shadow
-Registers">).
-
  </descrip>
  
-<sect>Running the Z39.50 Server (zebrasrv)
+<sect>The Z39.50 Server
+
+<sect1>Running the Z39.50 Server (zebrasrv)
  
  <p>
  <bf/Syntax/
@@ -809,14 +585,6 @@ The special name &dquot;-&dquot; sends output to <tt/stderr/.
  symbolic-level debugging. The server can only accept a single
  connection in this mode.
  
-<tag/-s/Use the SR protocol.
-
-<tag/-z/Use the Z39.50 protocol (default). These two options complement
-eachother. You can use both multiple times on the same command
-line, between listener-specifications (see below). This way, you
-can set up the server to listen for connections in both protocols
-concurrently, on different local ports.
-
  <tag>-l <it/logfile/</tag>Specify an output file for the diagnostic
  messages. The default is to write this information to <tt/stderr/.
  
@@ -830,7 +598,14 @@ privileged port.
  
  <tag>-w <it/working-directory/</tag>Change working directory.
  
-<tag/-i/Run under the Internet superserver, <tt/inetd/.
+<tag>-i</tag>Run under the Internet superserver, <tt/inetd/. Make
+sure you use the logfile option <tt/-l/ in conjunction with this
+mode and specify the <tt/-l/ option before any other options.
+
+<tag>-t <it/timeout/</tag>Set the idle session timeout (default 60 minutes).
+
+<tag>-k <it/kilobytes/</tag>Set the (approximate) maximum size of
+present response messages. Default is 1024 Kb (1 Mb).
  </descrip>
  
  A <it/listener-address/ consists of a transport mode followed by a
@@ -845,46 +620,214 @@ hostname | IP-number &lsqb;: portnumber&rsqb;
  
  The port number defaults to 210 (standard Z39.50 port).
  
-For OSI (only available if the server is compiled with XTI/mOSI
-support enabled), the address form is
+The special hostname &dquot;@&dquot; is mapped to
+the address INADDR_ANY, which causes the server to listen on any local
+interface. To start the server listening on the registered port for
+Z39.50, and to drop root privileges once the
+port is bound, execute the server like this (from a root shell):
  
  <tscreen><verb>
-&lsqb;t-selector /&rsqb; hostname | IP-number &lsqb;: portnumber&rsqb;
+zebrasrv -u daemon tcp:@
  </verb></tscreen>
  
-The transport selector is given as a string of hex digits (with an even
-number of digits). The default port number is 102 (RFC1006 port).
+You can replace <tt/daemon/ with another user, e.g. your own account, or
+a dedicated IR server account.
  
  
-Examples
+The default behavior for <tt/zebrasrv/ is to establish a single TCP/IP
+listener, for the Z39.50 protocol, on port 9999.
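The listener-address rules above (a transport mode prefix, an optional port that defaults to 210, and the special hostname "@" standing for INADDR_ANY) can be illustrated with a small parser. This is a minimal Python sketch, not zebrasrv's own code: the function name `parse_listener` is hypothetical, the mode is returned as given without validation, and IPv6 literals are not handled.

```python
INADDR_ANY = "0.0.0.0"     # what the special hostname "@" stands for
DEFAULT_Z3950_PORT = 210   # registered Z39.50 port, used when none is given


def parse_listener(address: str) -> tuple[str, str, int]:
    """Split 'mode:host[:port]' into (mode, host, port)."""
    mode, sep, rest = address.partition(":")
    if not sep or not rest:
        raise ValueError(f"expected mode:host[:port], got {address!r}")
    host, sep, port_text = rest.partition(":")
    port = int(port_text) if sep else DEFAULT_Z3950_PORT
    if host == "@":
        host = INADDR_ANY  # listen on every local interface
    return mode, host, port


print(parse_listener("tcp:@"))               # ('tcp', '0.0.0.0', 210)
print(parse_listener("tcp:localhost:9999"))  # ('tcp', 'localhost', 9999)
```

Note that the port 9999 mentioned above applies only when no listener address is given at all; an address without a port falls back to 210.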
  
  
-<tscreen>
+<sect1>Z39.50 Protocol Support and Behavior
+
+<sect2>Initialization
+
+<p>
+During initialization, the server will negotiate to version 3 of the
+Z39.50 protocol (unless the client specifies a lower version), and the option bits for Search, Present, Scan,
+NamedResultSets, and concurrentOperations will be set, if requested by
+the client. The maximum PDU size is negotiated down to a maximum of
+1Mb by default.
+
+<sect2>Search<label id="search">
+
+<p>
+The supported query types are 1 and 101. All operators are currently
+supported with the restriction that only proximity units of type "word" are
+supported for the proximity operator.
+Queries can be arbitrarily complex.
+Named result sets are supported, and result sets can be used as operands
+without limitations.
+Searches may span multiple databases.
+
+The server has full support for piggy-backed present requests (see
+also the following section).
+
+<bf/Use/ attributes are interpreted according to the attribute sets which
+have been loaded in the <tt/zebra.cfg/ file, and are matched against
+specific fields as specified in the <tt/.abs/ file which describes the
+profile of the records which have been loaded. If no <bf/Use/
+attribute is provided, a default of Bib-1 <bf/Any/ is assumed.
+
+If a <bf/Structure/ attribute of <bf/Phrase/ is used in conjunction with a
+<bf/Completeness/ attribute of <bf/Complete (Sub)field/, the term is
+matched against the contents of the phrase (long word) register, if one
+exists for the given <bf/Use/ attribute.
+A phrase register is created for those fields in the <tt/.abs/
+file that contain a <tt/p/-specifier.
+
+If <bf/Structure/=<bf/Phrase/ is used in conjunction with
+<bf/Incomplete Field/ (the default value for <bf/Completeness/), the
+search is directed against the normal word registers, but if the term
+contains multiple words, the term will only match if all of the words
+are found immediately adjacent, and in the given order.
+The word search is performed on those fields that are indexed as
+type <tt/w/ in the <tt/.abs/ file.
+
+If the <bf/Structure/ attribute is <bf/Word List/,
+<bf/Free-form Text/, or <bf/Document Text/, the term is treated as a
+natural-language, relevance-ranked query.
+This search type uses the word register, i.e. those fields
+that are indexed as type <tt/w/ in the <tt/.abs/ file.
+
+If the <bf/Structure/ attribute is <bf/Numeric String/ the
+term is treated as an integer. The search is performed on those
+fields that are indexed as type <tt/n/ in the <tt/.abs/ file.
+
+If the <bf/Structure/ attribute is <bf/URx/ the
+term is treated as a URX (URL) entity. The search is performed on those
+fields that are indexed as type <tt/u/ in the <tt/.abs/ file.
+
+If the <bf/Structure/ attribute is <bf/Local Number/ the
+term is treated as a native Zebra Record Identifier.
+
+If the <bf/Relation/ attribute is <bf/Equals/ (default), the term is
+matched in a normal fashion (modulo truncation and processing of
+individual words, if required). If <bf/Relation/ is <bf/Less Than/,
+<bf/Less Than or Equal/, <bf/Greater than/, or <bf/Greater than or
+Equal/, the term is assumed to be numerical, and a standard regular
+expression is constructed to match the given expression. If
+<bf/Relation/ is <bf/Relevance/, the standard natural-language query
+processor is invoked.
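The statement that a regular expression is constructed for the ordering relations can be made concrete. The Python sketch below shows one way to build such an expression for <bf/Greater than/ over non-negative integers written without leading zeros. It is an illustration under those assumptions, not Zebra's actual construction; negative terms are not covered, and `greater_than_regex` is a hypothetical name.

```python
import re


def greater_than_regex(n: int) -> str:
    """Regex matching decimal integers strictly greater than n (n >= 0)."""
    if n < 0:
        raise ValueError("this sketch only handles non-negative bounds")
    s = str(n)
    # Any number with more digits than n is larger.
    alternatives = [f"[1-9][0-9]{{{len(s)},}}"]
    # Same length: keep a prefix of n, raise one digit, then any digits after it.
    for i, ch in enumerate(s):
        higher = int(ch) + 1
        if higher > 9:
            continue
        digit_class = "9" if higher == 9 else f"[{higher}-9]"
        tail = len(s) - i - 1
        alternatives.append(s[:i] + digit_class + (f"[0-9]{{{tail}}}" if tail else ""))
    return "^(" + "|".join(alternatives) + ")$"


pattern = greater_than_regex(114)
print(pattern)  # ^([1-9][0-9]{3,}|[2-9][0-9]{2}|1[2-9][0-9]{1}|11[5-9])$
print(bool(re.fullmatch(pattern, "115")), bool(re.fullmatch(pattern, "114")))  # True False
```

Each alternative covers a disjoint range of values (here 1000 and up, 200 to 999, 120 to 199, and 115 to 119), which is why the union matches exactly the numbers above the bound.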
+
+For the <bf/Truncation/ attribute, <bf/No Truncation/ is the default.
+<bf/Left Truncation/ is not supported. <bf/Process &num;/ is supported, as
+is <bf/Regxp-1/. <bf/Regxp-2/ enables the fault-tolerant (fuzzy)
+search. By default, a single error (deletion, insertion,
+replacement) is accepted when terms are matched against the register
+contents.
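The error model for <bf/Regxp-2/ (by default one deletion, insertion, or replacement) corresponds to the Levenshtein edit distance. The following Python sketch shows that notion of tolerance for whole words. It is a conceptual illustration with hypothetical function names, not the way Zebra searches its dictionary, and it ignores the regular-expression operators that <bf/Regxp-2/ terms may also contain.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum deletions, insertions and replacements."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete ca
                current[j - 1] + 1,            # insert cb
                previous[j - 1] + (ca != cb),  # replace ca with cb (free if equal)
            ))
        previous = current
    return previous[-1]


def fuzzy_match(term: str, word: str, max_errors: int = 1) -> bool:
    """True if word is within max_errors edits of term (default: one error)."""
    return edit_distance(term, word) <= max_errors


print(fuzzy_match("retrival", "retrieval"))   # True (distance 1)
print(fuzzy_match("retreival", "retrieval"))  # False (distance 2)
```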
+
+<sect3>Regular expressions
+<p>
+
+Each term in a query is interpreted as a regular expression if
+the truncation value is either <bf/Regxp-1/ (102) or <bf/Regxp-2/ (103).
+Both query types follow the same syntax with the operands:
+<descrip>
+<tag/x/ Matches the character <it/x/.
+<tag/./ Matches any character.
+<tag><tt/[/..<tt/]/</tag> Matches the set of characters specified;
+ such as <tt/[abc]/ or <tt/[a-c]/.
+</descrip>
+and the operators:
+<descrip>
+<tag/x*/ Matches <it/x/ zero or more times. Priority: high.
+<tag/x+/ Matches <it/x/ one or more times. Priority: high.
+<tag/x?/ Matches <it/x/ zero times or once. Priority: high.
+<tag/xy/ Matches <it/x/, then <it/y/. Priority: medium.
+<tag/x|y/ Matches either <it/x/ or <it/y/. Priority: low.
+</descrip>
+The order of evaluation may be changed by using parentheses.
+
+If the first character of the <bf/Regxp-2/ query is a plus character
+(<tt/+/) it marks the beginning of a section with non-standard
+specifiers. The next plus character marks the end of the section.
+Currently Zebra only supports one specifier, the error tolerance,
+which consists of one digit.
+
+Since the plus operator is normally a suffix operator the addition to
+the query syntax doesn't violate the syntax for standard regular
+expressions.
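The specifier section of a <bf/Regxp-2/ term can be separated from the pattern before matching. This Python sketch assumes the layout described above (a leading plus, one digit, a closing plus) and falls back to the default tolerance of one error otherwise. The name `split_regxp2` is hypothetical, not a Zebra function, and how Zebra itself treats a malformed specifier is not described here.

```python
import re

DEFAULT_TOLERANCE = 1  # one error is accepted when no specifier is given

# "+", a single digit (the error tolerance), "+", then the regular expression.
SPECIFIER = re.compile(r"\+(\d)\+(.*)", re.DOTALL)


def split_regxp2(term: str) -> tuple[int, str]:
    """Return (error tolerance, pattern) for a Regxp-2 query term."""
    match = SPECIFIER.fullmatch(term)
    if match:
        return int(match.group(1)), match.group(2)
    return DEFAULT_TOLERANCE, term


print(split_regxp2("+2+informat.* retrieval"))  # (2, 'informat.* retrieval')
print(split_regxp2("informat.* retrieval"))     # (1, 'informat.* retrieval')
```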
+
+<sect3>Query examples
+<p>
+
+Phrase search for <bf/information retrieval/ in the title-register:
  <verb>
-tcp:dranet.dra.com
+ @attr 1=4 "information retrieval"
+</verb>
  
  
-osi:0402/dbserver.osiworld.com:3000
+Ranked search for the same thing:
+<verb>
+ @attr 1=4 @attr 2=102 "Information retrieval"
  </verb>
-</tscreen>
  
  
-In both cases, the special hostname &dquot;@&dquot; is mapped to
-the address INADDR_ANY, which causes the server to listen on any local
-interface. To start the server listening on the registered ports for
-Z39.50 and SR over OSI/RFC1006, and to drop root privileges once the
-ports are bound, execute the server like this (from a root shell):
+Phrase search with a regular expression:
+<verb>
+ @attr 1=4 @attr 5=102 "informat.* retrieval"
+</verb>
  
  
-<tscreen><verb>
-zebrasrv -u daemon tcp:@ -s osi:@
-</verb></tscreen>
+Ranked search with a regular expression:
+<verb>
+ @attr 1=4 @attr 5=102 @attr 2=102 "informat.* retrieval"
+</verb>
  
  
-You can replace <tt/daemon/ with another user, eg. your own account, or
-a dedicated IR server account.
+In the GILS schema (<tt/gils.abs/), the west-bounding-coordinate is
+indexed as type <tt/n/, and is therefore searched by specifying
+<bf/structure/=<bf/Numeric String/.
+To match all those records with west-bounding-coordinate greater
+than -114 we use the following query:
+<verb>
+ @attr 4=109 @attr 2=5 @attr gils 1=2038 -114
+</verb>
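The queries above use the prefix query format (PQF) of the YAZ toolkit, where each <tt/@attr type=value/ pair precedes the term. When queries are generated by a program, it can help to build them from (type, value) pairs; the Python sketch below reproduces two of the examples. The helper `pqf_term` is hypothetical, it does not escape quote characters inside terms, and the type numbers are the Bib-1 ones used in the examples (1 Use, 2 Relation, 4 Structure, 5 Truncation).

```python
# Bib-1 attribute types used in the examples above.
USE, RELATION, STRUCTURE, TRUNCATION = 1, 2, 4, 5


def pqf_term(term: str, attrs: list[tuple]) -> str:
    """Build '@attr ... term'; each attr is (type, value) or (attset, type, value)."""
    parts = []
    for attr in attrs:
        if len(attr) == 3:
            attset, attr_type, value = attr
            parts.append(f"@attr {attset} {attr_type}={value}")
        else:
            attr_type, value = attr
            parts.append(f"@attr {attr_type}={value}")
    # Quote multi-word terms so they form a single operand (no escaping here).
    parts.append(f'"{term}"' if " " in term else term)
    return " ".join(parts)


print(pqf_term("informat.* retrieval", [(USE, 4), (TRUNCATION, 102)]))
# @attr 1=4 @attr 5=102 "informat.* retrieval"
print(pqf_term("-114", [(STRUCTURE, 109), (RELATION, 5), ("gils", USE, 2038)]))
# @attr 4=109 @attr 2=5 @attr gils 1=2038 -114
```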
  
  
-The default behavior for <tt/zebrasrv/ is to establish a single TCP/IP
-listener, for the Z39.50 protocol, on port 9999.
+<sect2>Present
+<p>
+The present facility is supported in a standard fashion. The requested
+record syntax is matched against the ones supported by the profile of
+each record retrieved. If no record syntax is given, SUTRS is the
+default. The requested element set name, again, is matched against any
+provided by the relevant record profiles.
+
+<sect2>Scan
+
+<p>
+The attribute combinations provided with the TermListAndStartPoint are
+processed in the same way as operands in a query (see above).
+Currently, only the term and the globalOccurrences are returned with
+the TermInfo structure.
+
+<sect2>Sort
+
+<p>
+Z39.50 specifies three different types of sort criteria.
+Of these, Zebra supports the attribute specification type, in which
+case the use attribute specifies the "Sort register".
+Sort registers are created for those fields that are of type "sort" in
+the default.idx file. 
+The corresponding character mapping file in default.idx specifies the
+ordinal of each character used in the actual sort.
+
+Z39.50 allows the client to specify sorting on one or more input
+result sets and one output result set.
+Zebra supports sorting on one result set only which may or may not
+be the same as the output result set.
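The role of the character mapping in sorting can be sketched as follows: each character is replaced by its ordinal before comparison, so the mapping, not the character encoding, decides the order. The map below (Danish letters placed after z) and the rule that unmapped characters sort after mapped ones are assumptions for illustration; they are not the syntax or the defaults of <tt/default.idx/.

```python
# Hypothetical ordinal map: Danish letters sort after z, unlike in Unicode.
ORDINALS = {ch: i for i, ch in enumerate("abcdefghijklmnopqrstuvwxyzæøå")}


def sort_key(value: str) -> list[int]:
    """Translate a field value into ordinals; unmapped characters sort last."""
    return [ORDINALS.get(ch, len(ORDINALS) + ord(ch)) for ch in value]


titles = ["ål", "zebra", "øl", "abe"]
print(sorted(titles))                # ['abe', 'zebra', 'ål', 'øl'] (code point order)
print(sorted(titles, key=sort_key))  # ['abe', 'zebra', 'øl', 'ål']
```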
+
+<sect2>Close
+
+<p>
+If a Close PDU is received, the server will respond with a Close PDU
+with reason=FINISHED, no matter which protocol version was negotiated
+during initialization. If the protocol version is 3 or more, the
+server will generate a Close PDU under certain circumstances,
+including a session timeout (60 minutes by default), and certain kinds of
+protocol errors. Once a Close PDU has been sent, the protocol
+association is considered broken, and the transport connection will be
+closed immediately upon receipt of further data, or following a short
+timeout.
  
  <sect>The Record Model
  
  <p>
-The Zebra system is designed to span a wide range of data management
+Zebra is designed to support a wide range of data management
  applications. The system can be configured to handle virtually any
  kind of structured data. Each record in the system is associated with
  a <it/record schema/ which lends context to the data elements of the
@@ -892,16 +835,21 @@ record. Any number of record schema can coexist in the system.
  Although it may be wise to use only a single schema within
  one database, the system poses no such restrictions.
  
+The record model described in this chapter applies to the fundamental,
+structured
+record type <tt>grs</tt> as introduced in
+section <ref id="record-types" name="Record Types">.
+
  Records pass through three different states during processing in the
  system.
  
  <itemize>
-<item>When records are first entered into the system, they are represented
+<item>When records are accessed by the system, they are represented
  in their local, or native format. This might be SGML or HTML files,
  News or Mail archives, MARC records. If the system doesn't already
  know how to read the type of data you need to store, you can set up an
  input filter by preparing conversion rules based on regular
-expressions and a flexible scripting language (Tcl). The input filter
+expressions, possibly augmented by a flexible scripting language (Tcl). The input filter
  produces as output an internal representation:
  
  <item>When records are processed by the system, they are represented
@@ -926,6 +874,37 @@ turned into an internal structure that Zebra knows how to handle. This
  process takes place whenever the record is accessed - for indexing and
  retrieval.
  
+<p>
+The RecordType parameter in the <tt/zebra.cfg/ file, or the <tt/-t/
+option to the indexer, tells Zebra how to process input records. Two
+basic types of processing are available - raw text and structured
+data. Raw text is just that, and it is selected by providing the
+argument <bf/text/ to Zebra. Structured records are all handled
+internally using the basic mechanisms described in the subsequent
+sections. Zebra can read structured records in many different formats.
+How this is done is governed by additional parameters after the
+&dquot;grs&dquot; keyboard, separated by &dquot;.&dquot; characters.
+
+Four basic subtypes of the <bf/grs/ type are currently available:
+
+<descrip>
+<tag>grs.sgml</tag>This is the canonical input format,
+described below. It is a simple SGML-like syntax.
+
+<tag>grs.regx.<it/filter/</tag>This enables a user-supplied input
+filter. The mechanisms of these filters are described below.
+
+<tag>grs.tcl.<it/filter/</tag>This enables a user-supplied input
+filter with Tcl rules (only available if Zebra is compiled with Tcl
+support).
+
+<tag>grs.marc.<it/abstract syntax/</tag>This allows Zebra to read
+records in the ISO2709 (MARC) encoding standard. In this case, the
+last parameter <it/abstract syntax/ names the .abs file (see below)
+which describes the specific MARC structure of the input record as
+well as the indexing rules.
+</descrip>
+
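As an illustration of how a record type string breaks down, the following Python sketch splits it into the fundamental type, the grs subtype and the optional argument (a filter name or an abstract syntax name). It is not Zebra's code: the function name parse_record_type is hypothetical, and its validation rules are assumptions drawn only from the list of subtypes above.

```python
# Hypothetical helper (not part of Zebra): split a record type string such as
# "grs.regx.news" into (fundamental type, subtype, argument).
GRS_SUBTYPES = {"sgml", "regx", "tcl", "marc"}
NEEDS_ARGUMENT = {"regx", "tcl", "marc"}  # filter name or .abs name expected


def parse_record_type(spec):
    parts = spec.split(".", 2)  # at most three components; the argument keeps any dots
    base = parts[0]
    if base == "text":
        return ("text", None, None)
    if base != "grs":
        raise ValueError("unknown fundamental type: " + base)
    if len(parts) < 2 or parts[1] not in GRS_SUBTYPES:
        raise ValueError("missing or unknown grs subtype in " + spec)
    subtype = parts[1]
    argument = parts[2] if len(parts) == 3 else None
    if subtype in NEEDS_ARGUMENT and not argument:
        raise ValueError("grs." + subtype + " needs a filter or abstract syntax name")
    return ("grs", subtype, argument)


print(parse_record_type("grs.regx.news"))  # ('grs', 'regx', 'news')
print(parse_record_type("grs.sgml"))       # ('grs', 'sgml', None)
```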
  <sect2>Canonical Input Format
  
  <p>
@@ -935,6 +914,9 @@ a single, canonical input format that gives access to the full
  spectrum of structure and flexibility in the system. In Zebra, this
  canonical format is an &dquot;SGML-like&dquot; syntax.
  
+To use the canonical format, specify <tt>grs.sgml</tt> as the record
+type.
+
  Consider a record describing an information resource (such a record is
  sometimes known as a <it/locator record/). It might contain a field
  describing the distributor of the information resource, which might in
@@ -957,21 +939,21 @@ distributor, like this:
  </verb></tscreen>
  
  <it>NOTE: The indentation used above is used to illustrate how Zebra
-interprets the expression. The indentation, in itself, has no
+interprets the markup. The indentation, in itself, has no
  significance to the parser for the canonical input format, which
-ignores all whitespace.</it>
+discards superfluous whitespace.</it>
  
  The keywords surrounded by &lt;...&gt; are <it/tags/, while the
  sections of text in between are the <it/data elements/. A data element
  is characterized by its location in the tree that is made up by the
  nested elements. Each element is terminated by a closing tag -
-beginning with &etago;, and containing the same symbolic tag-name as
-the corresponding opening tag. The general closing tag - &etago;&gt; -
+beginning with <tt/&etago;/, and containing the same symbolic tag-name as
+the corresponding opening tag. The general closing tag - <tt/&etago;&gt;/ -
  terminates the element started by the last opening tag. The
  structuring of elements is significant. The element <bf/Telephone/,
  for instance, may be indexed and presented to the client differently,
  depending on whether it appears inside the <bf/Distributor/ element,
-or some other data element.
+or some other, structured data element such as a <bf/Supplier/ element.
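
The nesting and closing-tag rules above can be illustrated with a small Python sketch that turns canonical-format markup into a nested tree of (tag, children) tuples. It is a simplification, not Zebra's parser: parse_canonical is a hypothetical name, every tag is treated as an ordinary element (variant tags are not special-cased), and runs of whitespace inside data are collapsed to single spaces.

```python
import re

# Hypothetical sketch (not Zebra's parser). A token is either a tag
# (<name>, </name> or the general </>) or a run of text.
TOKEN = re.compile(r"<(/?)([^<>]*)>|([^<]+)")


def parse_canonical(markup):
    """Return a list of (tag, children) tuples; children are nodes or strings."""
    root = ("", [])
    stack = [root]
    for match in TOKEN.finditer(markup):
        is_close, name, text = match.group(1), match.group(2), match.group(3)
        if text is not None:
            data = " ".join(text.split())  # discard superfluous whitespace
            if data:
                stack[-1][1].append(data)
        elif is_close:
            if len(stack) == 1:
                raise ValueError("closing tag without an open element")
            # "</>" (empty name) closes the most recently opened element.
            if name and name != stack[-1][0]:
                raise ValueError("expected </%s>, got </%s>" % (stack[-1][0], name))
            stack.pop()
        else:
            node = (name, [])
            stack[-1][1].append(node)
            stack.append(node)
    if len(stack) != 1:
        raise ValueError("unclosed element <%s>" % stack[-1][0])
    return root[1]


tree = parse_canonical("""
<gils>
  <title>Zen and the Art of Motorcycle Maintenance</>
</gils>""")
print(tree)  # [('gils', [('title', ['Zen and the Art of Motorcycle Maintenance'])])]
```

Because each element is tracked on a stack, the same tag name (for example <bf/Telephone/) ends up under different parents depending on where it appears, which is what lets it be indexed differently.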
  
  <sect3>Record Root
  
@@ -984,7 +966,8 @@ name="Internal Representation">). The following is a GILS record that
  contains only a single element (strictly speaking, that makes it an
  illegal GILS record, since the GILS profile includes several mandatory
  elements - Zebra does not validate the contents of a record against
-the Z39.50 profile, however):
+the Z39.50 profile, however - it merely attempts to match up elements
+of a local representation with the given schema):
  
  <tscreen><verb>
  <gils>
@@ -998,8 +981,9 @@ the Z39.50 profile, however):
  Zebra allows you to provide individual data elements in a number of
  <it/variant forms/. Examples of variant forms are textual data
  elements which might appear in different languages, and images which
-may appear in different formats or layouts. The variant system is
-essentially a clean representation of the variant mechanism of
+may appear in different formats or layouts. The variant system in
+Zebra is
+essentially a representation of the variant mechanism of
  Z39.50-1995.
  
  The following is an example of a title element which occurs in two
@@ -1036,7 +1020,9 @@ Variant elements can be nested. The element
  </verb></tscreen>
  
  Associates two variant components to the variant list for the title
-element. Given the nesting rules described above, we could write
+element.
+
+Given the nesting rules described above, we could write
  
  <tscreen><verb>
  <title>
@@ -1050,19 +1036,25 @@ element. Given the nesting rules described above, we could write
  
  The title element above comes in two variants. Both have the IANA body
  type &dquot;text/plain&dquot;, but one is in English, and the other in
-Danish.
+Danish. The client, using the element selection mechanism of Z39.50,
+can retrieve information about the available variant forms of data
+elements, or it can select specific variants based on the requirements
+of the end-user.
  
  <sect2>Input Filters
  
  <p>
-In order to handle general, text-based input formats, Zebra allows the
-operator to specify filters which read individual records in their native format
+In order to handle general input formats, Zebra allows the
+operator to define filters which read individual records in their native format
  and produce an internal representation that the system can
  work with.
  
  Input filters are ASCII files, generally with the suffix <tt/.flt/.
  The system looks for the files in the directories given in the
-<bf/profilePath/ setting in the <tt/zebra.cfg/ file.
+<bf/profilePath/ setting in the <tt/zebra.cfg/ file. The record type
+for the filter is <tt>grs.regx.</tt><it>filter-filename</it>
+(fundamental type <tt>grs</tt>, file read type <tt>regx</tt>, argument
+<it>filter-filename</it>).
  
  Generally, an input filter consists of a sequence of rules, where each
  rule consists of a sequence of expressions, followed by an action. The
@@ -1111,10 +1103,11 @@ The available statements are:
  <tag>begin <it/type &lsqb;parameter ... &rsqb;/</tag>Begin a new
  data element. The type is one of the following:
  <descrip>
-<tag/record/Begin a new record. The parameter should be the
+<tag/record/Begin a new record. The following parameter should be the
  name of the schema that describes the structure of the record, eg.
-<tt/gils/ or <tt/wais/. The <tt/begin record/ call should come before
-any other call to <bf/begin/.
+<tt/gils/ or <tt/wais/ (see below). The <tt/begin record/ call should
+precede
+any other use of the <bf/begin/ statement.
  
  <tag/element/Begin a new tagged element. The parameter is the
  name of the tag. If the tag is not matched anywhere in the tagsets
@@ -1142,7 +1135,7 @@ any, is a type name, similar to the <bf/begin/ statement. For the
  </descrip>
  
  The following input filter reads a Usenet news file, producing a
-record in the WAIS schema. Note that the body of the news posting is
+record in the WAIS schema. Note that the body of a news posting is
  separated from the list of headers by a blank line (or rather a
  sequence of two newline characters).
  
@@ -1168,10 +1161,9 @@ mechanisms for modifying the elements of a record. Tcl is a popular
  scripting environment, with several tutorials available both online
  and in hardcopy.
  
-<it>NOTE: Tcl support is not currently available, but will be
-included with the next release.</it>
-
-<it>NOTE: Variant support is not currently available in the input filter, but will be included with the next release.</it>
+<it>NOTE: Variant support is not currently available in the input
+filter, but will be included with one of the next 
+releases.</it>
  
  <sect1>Internal Representation<label id="internal-representation">
  
@@ -1202,13 +1194,13 @@ ROOT
  
  The root of the record will refer to the record schema that describes
  the structuring of this particular record. The schema defines the
-element tags (TITLE, FIRST-NAME, etc.) that occur in the record, as
+element tags (TITLE, FIRST-NAME, etc.) that may occur in the record, as
  well as the structuring (SURNAME should appear below AUTHOR, etc.). In
  addition, the schema establishes element set names that are used by
  the client to request a subset of the elements of a given record. The
  schema may also establish rules for converting the record to a
  different schema, by stating, for each element, a mapping to a
-different tagging.
+different tag path.
  
  <sect2>Tagged Elements
  
@@ -1228,10 +1220,12 @@ reached from the root of the record).
  <sect2>Variants
  
  <p>
-The children of a tag node may be either more tag nodes, a data node,
-or a tree of variant nodes. The children of variant nodes are either
-more variant nodes or data nodes. Each leaf node, which is normally a
-data node, corresponds to a <it/variant form/ or the tagged element
+The children of a tag node may be either more tag nodes, a data node
+(possibly accompanied by tag nodes),
+or a tree of variant nodes. The children of variant nodes are either
+more variant nodes or a data node (possibly accompanied by more
+variant nodes). Each leaf node, which is normally a
+data node, corresponds to a <it/variant form/ of the tagged element
  identified by the tag which parents the variant tree. The following
  title element occurs in two different languages:
  
@@ -1253,22 +1247,31 @@ type, value, corresponding to the variant mechanism of Z39.50.
  Data nodes have no children (they are always leaf nodes in the record
  tree).
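
The node kinds described above can be modelled with a few Python classes to show how each data leaf below a variant tree corresponds to one variant form. This is a hypothetical sketch: the classes TagNode, VariantNode and DataNode and the helper variant_forms do not correspond to Zebra's internal C structures, and the symbolic (class, type, value) triples such as ("lang", "lang", "eng") are illustrative stand-ins for Z39.50 variant components.

```python
from dataclasses import dataclass, field
from typing import List, Tuple, Union

# Hypothetical Python model of the node kinds; Zebra's own C structures differ.


@dataclass
class DataNode:
    value: str  # data nodes are always leaves


@dataclass
class VariantNode:
    triple: Tuple[str, str, str]  # (class, type, value) of one variant component
    children: List[Union["VariantNode", DataNode]] = field(default_factory=list)


@dataclass
class TagNode:
    tag: str
    children: list = field(default_factory=list)


def variant_forms(tag_node):
    """Return (variant triples, data) for every data leaf in the tag's variant tree."""
    forms = []

    def walk(node, triples):
        if isinstance(node, DataNode):
            forms.append((triples, node.value))
        elif isinstance(node, VariantNode):
            for child in node.children:
                walk(child, triples + [node.triple])
        # nested TagNodes are separate elements and are skipped here

    for child in tag_node.children:
        walk(child, [])
    return forms


title = TagNode("title", [
    VariantNode(("body", "iana", "text/plain"), [
        VariantNode(("lang", "lang", "eng"), [DataNode("English title text")]),
        VariantNode(("lang", "lang", "dan"), [DataNode("Danish title text")]),
    ]),
])

for triples, value in variant_forms(title):
    print(triples, value)
```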
  
-<it>NOTE: Add more stuff here about types of nodes - numerical,
+<it>NOTE: Documentation needs extension here about types of nodes - numerical,
  textual, etc., plus the various types of inclusion notes.</it>
  
-<sect1>Configuring Your Data Model
+<sect1>Configuring Your Data Model<label id="data-model">
  
  <p>
  The following sections describe the configuration files that govern
-the internal management of records. The system searches for the files
+the internal management of data records. The system searches for the files
  in the directories specified by the <bf/profilePath/ setting in the
  <tt/zebra.cfg/ file.
  
+<sect2>About Object Identifiers
+<p>
+Wherever Object Identifiers (OIDs) need to be specified in the following
+sections, either a named OID reference or a raw OID reference may be used.
+For the named OIDs, refer to the source file <tt>util/oid.c</tt> from YAZ.
+Raw canonical OIDs are specified in dot-notation (for example
+1.2.840.10003.3.1000.81.1).
+
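A configuration reader has to tell the two reference forms apart. The Python sketch below treats any reference made only of dot-separated decimal numbers as a raw OID and everything else as a name to be looked up in the YAZ OID table (the lookup itself is not shown). The names parse_oid and is_raw_oid are hypothetical, not YAZ functions, and the check on the first two arcs follows the general ASN.1 object identifier rules rather than anything stated in this guide.

```python
# Hypothetical helpers (not YAZ functions) for telling raw OIDs from OID names.


def parse_oid(text):
    """Split a dotted OID such as "1.2.840.10003.3.1000.81.1" into integer arcs."""
    arcs = text.split(".")
    if len(arcs) < 2 or not all(arc.isdigit() for arc in arcs):
        raise ValueError("not a dotted-numeric OID: %r" % text)
    numbers = [int(arc) for arc in arcs]
    # General ASN.1 rule: first arc is 0, 1 or 2; below 0 and 1 the second arc is < 40.
    if numbers[0] > 2 or (numbers[0] < 2 and numbers[1] >= 40):
        raise ValueError("invalid leading arcs in %r" % text)
    return tuple(numbers)


def is_raw_oid(reference):
    """True for dotted-numeric references, False for symbolic names such as GILS-attset."""
    try:
        parse_oid(reference)
        return True
    except ValueError:
        return False


print(parse_oid("1.2.840.10003.3.1000.81.1"))  # (1, 2, 840, 10003, 3, 1000, 81, 1)
print(is_raw_oid("GILS-attset"))               # False
```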
  <sect2>The Abstract Syntax
  
  <p>
-The abstract syntax definition (ARS) is the focal point of the
-record schema description. For a given schema, it may state any
+The abstract syntax definition (also known as an Abstract Record
+Structure, or ARS) is the focal point of the
+record schema description. For a given schema, the ABS file may state any
  or all of the following:
  
  <itemize>
@@ -1317,14 +1320,15 @@ are used by the retrieval module.
  
  The number of different file types may appear daunting at first, but
  each type corresponds fairly clearly to a single aspect of the Z39.50
-retrieval facilities. Further, the average database administrator
+retrieval facilities. Further, the average database administrator,
  who is simply reusing an existing profile for which tables already
  exist, shouldn't have to worry too much about the contents of these tables.
  
  Generally, the files are simple ASCII files, which can be maintained
  using any text editor. Blank lines, and lines beginning with a (&num;) are
-ignored. Any characters followed by a (&num;) are also ignored. All other
-lines contain <it/directives/, which establish some setting or value
+ignored. Any characters on a line following a (&num;) are also ignored.
+All other lines contain <it/directives/, which provide some setting or value
  to the system. Generally, settings are characterized by a single
  keyword, identifying the setting, followed by a number of parameters.
  Some settings are repeatable (r), while others may occur only once in a
@@ -1345,7 +1349,7 @@ profile that governs the layout of the record. If the first tag of the
  record is, say, <tt>&lt;gils&gt;</tt>, the system will look for the profile
  definition in the file <tt/gils.abs/. Profile definitions are cached,
  so they only have to be read once during the lifespan of the current
-process.
+process. 
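The general file conventions described earlier in this section (blank lines and text after a (&num;) are ignored; every other line is a keyword followed by parameters) and the cached lookup of <tt/.abs/ files by root tag can be sketched together in Python. The names read_directives and ProfileCache are hypothetical, profile_path stands in for the directories listed in the <bf/profilePath/ setting, and the sketch splits parameters on whitespace only (no quoting), which may differ from Zebra's own C reader.

```python
import os

# Hypothetical sketch of directive reading and per-process profile caching.


def read_directives(lines):
    """Yield (keyword, [parameters]) for each directive line."""
    for line in lines:
        line = line.split("#", 1)[0].strip()  # drop comments and surrounding blanks
        if line:
            keyword, *params = line.split()
            yield keyword, params


class ProfileCache:
    """Load <root-tag>.abs from the first matching directory, once per process."""

    def __init__(self, profile_path):
        self.profile_path = profile_path
        self._cache = {}

    def get(self, root_tag):
        if root_tag not in self._cache:
            for directory in self.profile_path:
                candidate = os.path.join(directory, root_tag + ".abs")
                if os.path.exists(candidate):
                    with open(candidate) as f:
                        self._cache[root_tag] = list(read_directives(f))
                    break
            else:  # no directory contained the file
                raise FileNotFoundError(root_tag + ".abs not found in profilePath")
        return self._cache[root_tag]


lines = ["# GILS profile", "", "name gils", "elm (2,1) Title w:!,p:!  # title element"]
print(list(read_directives(lines)))
# [('name', ['gils']), ('elm', ['(2,1)', 'Title', 'w:!,p:!'])]
```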
  
  When writing your own input filters, the <bf/begin record/ command
  introduces the profile, and should always be called first thing when
@@ -1357,15 +1361,16 @@ The file may contain the following directives:
  <tag>name <it/symbolic-name/</tag> (m) This provides a shorthand name or
  description for the profile. Mostly useful for diagnostic purposes.
  
-<tag>reference <it/OID-name/</tag> (m) The reference name of the OID for
-the profile. The reference names can be found in the <bf/util/
-module of <bf/YAZ/.
+<tag>reference <it/OID-name/</tag> (m) The OID for
+the profile (name or dotted-numerical list).
  
  <tag>attset <it/filename/</tag> (m) The attribute set that is used for
  indexing and searching records belonging to this profile.
  
-<tag>tagset <it/filename/</tag> (o) The tag set (if any) that describe
-that fields of the records.
+<tag>tagset <it/filename/ &lsqb;<it/type/&rsqb;</tag> (o) The tag
+set (if any) that describes the fields of the records. The type, which
+is optional, specifies the tag type. If not given, the type-specifier
+in the Tag Set files is used.
  
  <tag>varset <it/filename/</tag> (o) The variant set used in the profile.
  
@@ -1382,24 +1387,36 @@ given element set name with an element selection file. If an (@) is
  given in place of the filename, this corresponds to a null mapping for
  the given element set name.
  
-<tag>elm <it/path name attribute/</tag> (o,r) Adds an element
+<tag>any <it/tags/</tag> (o) This directive specifies a list of
+attributes which should be appended to the attribute list given for each
+element. The effect is to make every single element in the abstract
+syntax searchable by way of the given attributes. This directive
+provides an efficient way of supporting free-text searching across all
+elements. However, it does increase the size of the index
+significantly. The attributes can be qualified with a structure, as in
+the <bf/elm/ directive below.
+
+<tag>elm <it/path name attributes/</tag> (o,r) Adds an element
  to the abstract record syntax of the schema. The <it/path/ follows the
  syntax which is suggested by the Z39.50 document - that is, a sequence
  of tags separated by slashes (/). Each tag is given as a
  comma-separated pair of tag type and -value surrounded by parenthesis.
-The <it/name/ is the name of the element, and the <it/attribute/
-specifies what attribute to use when indexing the element. A ! in
+The <it/name/ is the name of the element, and <it/attributes/ is a
+comma-separated list of the attributes to use when indexing the
+element. A &excl; in
  place of the attribute name is equivalent to specifying an attribute
  name identical to the element name. A - in place of the attribute name
-specifies that no indexing is to take place for the given element.
+specifies that no indexing is to take place for the given element. The
+attributes can be qualified with <it/field types/ to specify which
+character set should govern the indexing procedure for that field. The
+same data element may be indexed into several different fields, using
+different character set definitions. See the section
+<ref id="field structure and character sets"
+name="Field Structure and Character Sets">.
+The default field type is &dquot;w&dquot; for
+<it/word/.
  </descrip>
  
-<it>
-NOTE: The mechanism for controlling indexing is not adequate for
-complex databases, and will probably be moved into a separate
-configuration table eventually.
-</it>
-
  The following is an excerpt from the abstract syntax file for the GILS
  profile.
  
@@ -1423,7 +1440,7 @@ elm (1,10)              rank                        -
  elm (1,12)              url                         -
  elm (1,14)              localControlNumber     Local-number
  elm (1,16)              dateOfLastModification Date/time-last-modified
-elm (2,1)               Title                       !
+elm (2,1)               Title                       w:!,p:!
  elm (4,1)               controlIdentifier      Identifier-standard
  elm (2,6)               abstract               Abstract
  elm (4,51)              purpose                     !
@@ -1438,7 +1455,7 @@ elm (4,70)/(4,90)/(4,2) distributorStreetAddress    !
  elm (4,70)/(4,90)/(4,3) distributorCity             !
  </verb></tscreen>
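
To make the <bf/elm/ syntax concrete, the following Python sketch splits one <bf/elm/ line into its tag path, element name and list of (field type, attribute) pairs, applying the &excl; and - shorthands and the default field type &dquot;w&dquot; described above. The function name parse_elm is hypothetical, and the sketch assumes that paths and attribute lists contain no embedded spaces, as in the GILS excerpt.

```python
import re

# Hypothetical sketch (not Zebra code). One path step such as "(4,70)".
STEP = re.compile(r"\((\d+),([^)]+)\)")


def parse_elm(line):
    """Split an .abs 'elm' directive into (tag path, name, [(field type, attribute)])."""
    keyword, path, name, attributes = line.split()
    if keyword != "elm":
        raise ValueError("not an elm directive: " + line)
    steps = [STEP.fullmatch(step) for step in path.split("/")]
    if not all(steps):
        raise ValueError("malformed tag path: " + path)
    # Tag type is numeric; numeric tag values become ints, others stay strings.
    tag_path = [(int(m.group(1)), int(m.group(2)) if m.group(2).isdigit() else m.group(2))
                for m in steps]
    if attributes == "-":  # "-": the element is not indexed
        return tag_path, name, []
    index = []
    for item in attributes.split(","):
        field_type, _, attribute = item.rpartition(":")
        if attribute == "!":  # "!": attribute named like the element
            attribute = name
        index.append((field_type or "w", attribute))  # "w" (word) is the default
    return tag_path, name, index


print(parse_elm("elm (2,1)               Title                       w:!,p:!"))
# ([(2, 1)], 'Title', [('w', 'Title'), ('p', 'Title')])
```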
  
-<sect2>The Attribute Set (.att) Files
+<sect2>The Attribute Set (.att) Files<label id="attset-files">
  
  <p>
  This file type describes the <bf/Use/ elements of an attribute set.
@@ -1450,12 +1467,7 @@ It contains the following directives.
  description for the attribute set. Mostly useful for diagnostic purposes.
  
  <tag>reference <it/OID-name/</tag> (m) The reference name of the OID for
-the attribute set. The reference names can be found in the <bf/util/
-module of <bf/YAZ/.
-
-<tag>ordinal <it/integer/</tag> (m) This value will be used to represent the
-attribute set in the index. Care should be taken that each attribute
-set has a unique ordinal value.
+the attribute set.
  
  <tag>include <it/filename/</tag> (o,r) This directive is used to
  include another attribute set as a part of the current one. This is
@@ -1480,7 +1492,6 @@ the file describing the <it/bib-1/ attribute set is referenced.
  name gils
  reference GILS-attset
  include bib1.att
-ordinal 2
  
  att 2001               distributorName
  att 2002               indexTermsControlled
@@ -1502,12 +1513,12 @@ contain the following directives.
  description for the tag set. Mostly useful for diagnostic purposes.
  
  <tag>reference <it/OID-name/</tag> (o) The reference name of the OID for
-the tag set. The reference names can be found in the <bf/util/
-module of <bf/YAZ/. The directive is optional, since not all tag sets
-are registered outside of their schema.
+the tag set. The directive is optional, since not all tag sets are
+registered outside of their schema.
  
  
-<tag>type <it/integer/</tag> (m) The type number of the tag within the schema
-profile.
+<tag>type <it/integer/</tag> (m) The type number of the tagset within the schema
+profile (note: this specification really should belong to the .abs
+file. This will be fixed in a future release).
  
  <tag>include <it/filename/</tag> (o,r) This directive is used
  to include the definitions of other tag sets into the current one.
@@ -1567,8 +1578,7 @@ These are the directives allowed in the file.
  description for the variant set. Mostly useful for diagnostic purposes.
  
  <tag>reference <it/OID-name/</tag> (o) The reference name of the OID for
-the variant set, if one is required. The reference names can be found
-in the <bf/util/ module of <bf/YAZ/.
+the variant set, if one is required.
  
  <tag>class <it/integer class-name/</tag> (m,r) Introduces a new
  class to the variant set.
@@ -1711,8 +1721,7 @@ of the table. Useful mostly for diagnostic purposes.
  
  <tag>targetRef <it/OID-name/</tag> (m) An OID name for the target schema.
  This is used, for instance, by a server receiving a request to present
-a record in a different schema from the native one. The name, again,
-is found in the <bf/oid/ module of <bf/YAZ/.
+a record in a different schema from the native one.
  
  <tag>map <it/element-name target-path/</tag> (o,r) Adds
  an element mapping rule to the table.
@@ -1729,6 +1738,115 @@ header of the record.
  re-evaluating and most likely changing the way that MARC records are
  handled by the system.</it>
  
+<sect2>Field Structure and Character Sets
+<label id="field structure and character sets">
+
+<p>
+In order to provide a flexible approach to national character set
+handling, Zebra allows the administrator to configure the
+system to handle any 8-bit character set, including sets that
+require multi-octet diacritics or other multi-octet characters. The
+definition of a character set includes a specification of the
+permissible values, their sort order (this affects the display in the
+SCAN function), and relationships between upper- and lowercase
+characters. Finally, the definition includes the specification of
+space characters for the set.
+
+The operator can define different character sets for different fields,
+typical examples being standard text fields, numerical fields, and
+special-purpose fields such as WWW-style linkages (URx).
+
+The field types, and hence character sets, are associated with data
+elements by the .abs files (see above). The file <tt/default.idx/
+provides the association between field type codes (as used in the .abs
+files) and the character map files (with the .chr suffix). The format
+of the .idx file is as follows:
+
+<descrip>
+<tag>index <it/field type code/</tag>This directive introduces a new
+search index code. The argument is a one-character code to be used in the
+.abs files to select this particular index type. An index, roughly,
+corresponds to a particular structure attribute during search. Refer
+to section <ref id="search" name="Search">.
+
+<tag>sort <it/field type code/</tag>This directive introduces a
+sort index. The argument is a one-character code to be used in the
+.abs file to select this particular index type. The corresponding
+use attribute must be used in the sort request to refer to this
+particular sort index. The corresponding character map (see below)
+is used in the sort process.
+
+<tag>completeness <it/boolean/</tag>This directive enables or disables
+complete field indexing. The value of the <it/boolean/ should be 0
+(disable) or 1 (enable). If completeness is enabled, the index entry will
+contain the complete contents of the field (up to a limit), with words
+(non-space characters) separated by single space characters
+(normalized to &dquot; &dquot; on display). When completeness is
+disabled, each word is indexed as a separate entry. Complete subfield
+indexing is most useful for fields which are typically browsed (eg.
+titles, authors, or subjects), or instances where a match on a
+complete subfield is essential (eg. exact title searching). For fields
+where completeness is disabled, the search engine will interpret a
+search containing space characters as a word proximity search.
+
+<tag>charmap <it/filename/</tag> This is the filename of the character
+map to be used for this field type.
+</descrip>
+
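The effect of the <tt/completeness/ setting on what ends up in the index can be illustrated with a short Python sketch. It is a simplification under stated assumptions: plain lowercasing stands in for the character map, the space set is fixed to ASCII whitespace, the length limit on complete entries is ignored, and index_entries is a hypothetical name.

```python
# Hypothetical sketch: index entries produced for one field under each completeness mode.


def index_entries(field_text, complete, space=" \t\r\n"):
    """Return the index entries one field would produce (simplified sketch)."""
    words, current = [], []
    for ch in field_text:
        if ch in space:  # a space character ends a word
            if current:
                words.append("".join(current))
                current = []
        else:
            current.append(ch.lower())  # stand-in for the character map
    if current:
        words.append("".join(current))
    if complete:
        # completeness 1: one entry, words separated by single spaces
        return [" ".join(words)] if words else []
    # completeness 0: every word becomes an entry of its own
    return words


print(index_entries("Zen and the  Art", complete=False))  # ['zen', 'and', 'the', 'art']
print(index_entries("Zen and the  Art", complete=True))   # ['zen and the art']
```

With completeness enabled, an exact title search can match the single normalized entry; with it disabled, a multi-word search has to be answered as a proximity search over separate word entries.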
+The contents of the character map files are structured as follows:
+
+<descrip>
+<tag>lowercase <it/value-set/</tag>This directive introduces the basic
+value set of the field type. The format is an ordered list (without
+spaces) of the characters which may occur in &dquot;words&dquot; of
+the given type. The order of the entries in the list determines the
+sort order of the index. In addition to single characters, the
+following combinations are legal:
+
+<itemize>
+<item>Backslashes may be used to introduce three-digit octal, or
+two-digit hex representations of single characters (preceded by <tt/x/).
+In addition, the combinations
+\\, \\r, \\n, \\t, \\s (space; remember that real space-characters
+may not occur in the value definition), and \\ are recognised,
+with their usual interpretation.
+
+<item>Curly braces {} may be used to enclose ranges of single
+characters (possibly using the escape convention described in the
+preceding point), eg. {a-z} to introduce the standard range of ASCII
+characters. Note that the interpretation of such a range depends on
+the concrete representation in your local, physical character set.
+
+<item>Parentheses () may be used to enclose multi-byte characters,
+eg. diacritics or special national combinations (eg. Spanish
+&dquot;ll&dquot;). When found in the input stream (or a search term),
+these characters are viewed and sorted as a single character, with a
+sorting value depending on the position of the group in the value
+statement.
+</itemize>
+
+<tag>uppercase <it/value-set/</tag>This directive introduces the
+upper-case equivalents of the value set (if any). The number and
+order of the entries in the list should be the same as in the
+<tt/lowercase/ directive.
+
+<tag>space <it/value-set/</tag>This directive introduces the characters
+which separate words in the input stream. Depending on the
+completeness mode of the field in question, these characters either
+terminate an index entry, or delimit individual &dquot;words&dquot; in
+the input stream. The order of the elements is not significant;
+otherwise the representation is the same as for the <tt/uppercase/ and
+<tt/lowercase/ directives.
+
+<tag>map <it/value-set/ <it/target/</tag>This directive maps each
+member of the value set on the left to the character on the right. The
+character on the right must occur in the value set (the <tt/lowercase/
+directive) of the character set, but it may be a parenthesis-enclosed
+multi-octet character. This directive may be used to map diacritics to
+their base characters, or to map HTML-style character representations
+to their natural form, etc.
+</descrip>
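The directives above can be illustrated with a small, hypothetical Python model of a character map. This is a sketch under stated assumptions, not Zebra's own charmap code. The names <tt/parse_value_set/, <tt/read_char/ and <tt/CharMap/ are invented here, directive values are passed as strings rather than read from a file, and characters that the map does not mention are simply dropped. The sketch expands escapes, {..} ranges and (..) groups, folds upper case and <tt/map/ sources onto the lower-case set, splits words on <tt/space/ characters, and sorts by position in the <tt/lowercase/ list.

```python
# Hypothetical sketch of a character-map reader; not Zebra's implementation.
ESCAPES = {"\\": "\\", "r": "\r", "n": "\n", "t": "\t", "s": " "}


def read_char(s, i):
    """Read one (possibly escaped) character at s[i]; return (char, next index)."""
    if s[i] != "\\":
        return s[i], i + 1
    code = s[i + 1]
    if code == "x":                      # \xHH: two-digit hexadecimal
        return chr(int(s[i + 2:i + 4], 16)), i + 4
    if code.isdigit():                   # \NNN: three-digit octal
        return chr(int(s[i + 1:i + 4], 8)), i + 4
    return ESCAPES[code], i + 2          # \\ \r \n \t \s


def parse_value_set(s):
    """Expand a value-set string into its ordered list of entries."""
    entries, i = [], 0
    while i < len(s):
        if s[i] == "{":                  # {a-z}: range of single characters
            lo, i = read_char(s, i + 1)
            if s[i] != "-":
                raise ValueError("expected '-' in range")
            hi, i = read_char(s, i + 1)
            if s[i] != "}":
                raise ValueError("expected '}' after range")
            i += 1
            entries.extend(chr(c) for c in range(ord(lo), ord(hi) + 1))
        elif s[i] == "(":                # (ll): multi-character group
            group, i = "", i + 1
            while s[i] != ")":
                ch, i = read_char(s, i)
                group += ch
            entries.append(group)
            i += 1
        else:
            ch, i = read_char(s, i)
            entries.append(ch)
    return entries


class CharMap:
    """Toy model of the lowercase/uppercase/space/map directives."""

    def __init__(self, lowercase, uppercase="", space="", maps=()):
        lower = parse_value_set(lowercase)
        upper = parse_value_set(uppercase)
        if upper and len(upper) != len(lower):
            raise ValueError("uppercase must pair one-to-one with lowercase")
        self.rank = {c: n for n, c in enumerate(lower)}   # position = sort order
        self.fold = dict(zip(upper, lower))               # upper-case -> lower-case
        self.space = set(parse_value_set(space))
        for value_set, target in maps:
            (tgt,) = parse_value_set(target)              # exactly one entry
            if tgt not in self.rank:
                raise ValueError("map target must be in the lowercase set")
            for c in parse_value_set(value_set):
                self.fold[c] = tgt
        # Try longer entries first so that a group like "ll" beats "l".
        self.tokens = sorted(set(self.rank) | set(self.fold) | self.space,
                             key=len, reverse=True)

    def words(self, text):
        """Split text into words, each a list of normalized characters."""
        words, current, i = [], [], 0
        while i < len(text):
            tok = next((t for t in self.tokens if text.startswith(t, i)), None)
            if tok is None:                # unknown character: dropped (assumption)
                i += 1
            elif tok in self.space:
                if current:
                    words.append(current)
                current, i = [], i + len(tok)
            else:
                current.append(self.fold.get(tok, tok))
                i += len(tok)
        if current:
            words.append(current)
        return words

    def sort_key(self, word):
        """Sort key of a normalized word: positions in the lowercase set."""
        return [self.rank[c] for c in word]


# A Spanish-flavoured map: "ll" sorts as one letter between "l" and "m".
spanish = CharMap("{a-l}(ll){m-z}", "{A-L}(LL){M-Z}", r"\s,.", [("áÁ", "a")])
print(spanish.words("LLama, Álamo"))
# -> [['ll', 'a', 'm', 'a'], ['a', 'l', 'a', 'm', 'o']]
print(sorted(["mar", "llama", "luz"],
             key=lambda w: spanish.sort_key(spanish.words(w)[0])))
# -> ['luz', 'llama', 'mar']
```

The example shows why the order of the <tt/lowercase/ list matters. Because the group (ll) is placed after l, "llama" sorts after "luz". Plain ASCII ordering would put it first.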
+
  <sect1>Exchange Formats
  
  <p>
@@ -1743,7 +1861,9 @@ applied variant and supported variant lists as required, if a record
  contains variant information.
  
 <item>SUTRS. Again, the mapping is fairly straightforward. Indentation
-is used to show the hierarchical structure of the record.
+is used to show the hierarchical structure of the record. All
+&dquot;GRS&dquot; type records support both the GRS-1 and SUTRS
+representations.
  
  <item>ISO2709-based formats (USMARC, etc.). Only records with a
  two-level structure (corresponding to fields and subfields) can be
@@ -1760,82 +1880,113 @@ approach.
  <item>Explain. This representation is only available for records
  belonging to the Explain schema.
  
+<item>Summary. This ASN.1-based structure is only available for records
+belonging to the Summary schema, or to schemas which provide a mapping
+to this schema (see the description of the schema mapping facility
+above).
+
+<item>SOIF. Support for this syntax is experimental, and is currently
+keyed to a private Index Data OID (1.2.840.10003.5.1000.81.2). All
+abstract syntaxes can be mapped to the SOIF format, although nested
+elements are represented by concatenation of the tag names at each
+level.
+
+<item>XML. The use of XML as a transfer syntax in Z39.50 is not yet widely
+established, so its use here should be regarded as somewhat experimental.
+The tag names used are taken from the tag set in use, except for local
+string tags, where the tag itself is passed through unchanged.
+
  </itemize>
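The list above describes two ways of presenting a hierarchical record. SUTRS uses indentation for nesting, and SOIF concatenates the tag names at each level. A hypothetical Python sketch can make the difference concrete. It is not Zebra's record converter. The (tag, value) record model, the two-space indent, the "tag value" line layout and the "-" separator between concatenated tag names are all assumptions made for illustration. The function names <tt/to_sutrs/ and <tt/flatten_tags/ are invented.

```python
# Hypothetical sketch of two renderings of a nested record; the exact
# SUTRS layout and the SOIF tag separator are assumptions, not Zebra output.
def to_sutrs(elements, depth=0, indent="  "):
    """Render (tag, value) pairs as text, indenting one step per level."""
    lines = []
    for tag, value in elements:
        if isinstance(value, list):      # nested element: recurse one level down
            lines.append(indent * depth + tag)
            lines.extend(to_sutrs(value, depth + 1, indent))
        else:
            lines.append(indent * depth + f"{tag} {value}")
    return lines


def flatten_tags(elements, prefix="", sep="-"):
    """Flatten nesting SOIF-style by concatenating the tag names per level."""
    pairs = []
    for tag, value in elements:
        name = f"{prefix}{sep}{tag}" if prefix else tag
        if isinstance(value, list):
            pairs.extend(flatten_tags(value, name, sep))
        else:
            pairs.append((name, value))
    return pairs


record = [("title", "Zebra"),
          ("author", [("name", "Index Data"), ("country", "DK")])]
print("\n".join(to_sutrs(record)))
print(flatten_tags(record))
# -> [('title', 'Zebra'), ('author-name', 'Index Data'), ('author-country', 'DK')]
```

The indented form keeps the hierarchy visible to a human reader. The flattened form suits attribute-value formats such as SOIF, at the cost of longer, composite attribute names.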
  
  <sect>License
  
  <p>
-Copyright &copy; 1995, Index Data.
+Zebra
+Copyright (c) 1995-2000 Index Data ApS.
  
  All rights reserved.
  
  Use and redistribution in source or binary form, with or without
  modification, of any or all of this software and documentation is
-permitted, provided that the following conditions are met:
-
-1. This copyright and permission notice appear with all copies of the
-software and its documentation. Notices of copyright or attribution
-which appear at the beginning of any file must remain unchanged.
-
-2. The names of Index Data or the individual authors may not be used to
-endorse or promote products derived from this software without specific
-prior written permission.
-
-3. Source code or binary versions of this software and its
-documentation may be used freely in not-for-profit applications. For
-profit applications - such as providing for-pay database services,
-marketing a product based in whole or in part on this software or its
-documentation, or generally distributing this software or its
-documentation under a different license - requires a commercial
-license from Index Data. The software may be installed and used for
-evaluation purposes in conjunction with a commercial application for a
-trial period of no more than 60 days.
-
-THIS SOFTWARE IS PROVIDED "AS IS" AND WITHOUT WARRANTY OF ANY KIND,
-EXPRESS, IMPLIED, OR OTHERWISE, INCLUDING WITHOUT LIMITATION, ANY
-WARRANTY OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.
-IN NO EVENT SHALL INDEX DATA BE LIABLE FOR ANY SPECIAL, INCIDENTAL,
-INDIRECT OR CONSEQUENTIAL DAMAGES OF ANY KIND, OR ANY DAMAGES
-WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER OR
-NOT ADVISED OF THE POSSIBILITY OF DAMAGE, AND ON ANY THEORY OF
-LIABILITY, ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE
-OF THIS SOFTWARE.
+permitted, provided that the following Conditions 1 to 6 set out below
+are met.
+
+1. Unless prior specific written permission is obtained this copyright
+and permission notice appear with all copies of the software and its
+documentation. Notices of copyright or attribution which appear at the
+beginning of any file must remain unchanged.
+ 
+2. The names of Index Data or the individual authors may not be used
+to endorse or promote products derived from this software without
+specific prior written permission.
+ 
+3. Source code or binary versions of this software and its documentation
+may be used freely in not for profit applications limited to databases
+of 100,000 records maximum. Other applications - such as publishing over
+100,000 records, providing for-pay services, distributing a product based
+in whole or in part on this software or its documentation, or generally 
+distributing this software or its documentation under a different license 
+require a commercial license from Index Data. 
+
+4. The software may be installed and used for evaluation purposes in
+conjunction with such commercially licensed applications for a trial
+period no longer than 60 days.
+ 
+5. Unless a prior specific written agreement is obtained THIS SOFTWARE
+IS PROVIDED "AS IS" AND WITHOUT WARRANTY OF ANY KIND, EXPRESS, IMPLIED,
+OR OTHERWISE, INCLUDING WITHOUT LIMITATION, ANY WARRANTY OF
+MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. IN NO EVENT SHALL
+INDEX DATA BE LIABLE FOR ANY SPECIAL, INCIDENTAL, INDIRECT OR
+CONSEQUENTIAL DAMAGES OF ANY KIND, OR ANY DAMAGES WHATSOEVER RESULTING
+FROM LOSS OF USE, DATA OR PROFITS, WHETHER OR NOT ADVISED OF THE
+POSSIBILITY OF DAMAGE, AND ON ANY THEORY OF LIABILITY, ARISING OUT OF
+OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
+
+6. Commercial licenses and support agreements for Zebra and related
+Index Data products such as Z'bol (c) - and written agreements
+relating to these Conditions may be obtained only from Index Data
+or its appointed agents as follows: 
+
+Index Data: www.indexdata.dk
+Fretwell-Downing Informatics: www.fdgroup.co.uk
+Fretwell-Downing Informatics USA: www.fdi.com
  
  <sect>About Index Data and the Zebra Server
  
  <p>
  Index Data is a consulting and software-development enterprise that
-specialises in library and information management systems. Our
+specialises in information management and retrieval applications. Our
  interests and expertise span a broad range of related fields, and one
  of our primary, long-term objectives is the development of a powerful
  information management
-system with open network interfaces and hypermedia capabilities.
+system with open network interfaces and hypermedia capabilities. Zebra is an
+important component in this strategy.
  
  We make this software available free of charge for not-for-profit
  purposes, as a service to the networking community, and to further
  the development and use of quality software for open network
-communication.
+communication. We encourage your comments and questions if you have ideas, things
+you would like to  see in future versions, or things you would like to
+contribute.
  
  If you like this software, and would like to use all or part of it in
  a commercial product, or to provide a commercial database service,
-please contact us to discuss the details. We'll be happy to answer
-questions about the software, and about our services in general. If
-you have specific requirements to the software, we'll be glad to offer
-our advice - and if you need to adapt the software to a special
-purpose, our consulting services and expert knowledge of the software
-is available to you at favorable rates.
-
-<tscreen>
-Index Data&nl
-Ryesgade 3&nl
-DK-2200 K&oslash;benhavn N&nl
-</tscreen>
+please contact us. The Z'mbol Information System is the commercial
+variant of Zebra. It includes full support, additional functionality and
+performance-boosting features, and it has what we think is a very exciting
+development path.
+
+<tscreen><verb>
+Index Data
+Ryesgade 3
+DK-2200 Copenhagen N
+</verb></tscreen>
  
  <p>
  <tscreen><verb>
  Phone: +45 3536 3672
  Fax  : +45 3536 0449
-Email: info@index.ping.dk
+Email: info@indexdata.dk
  </verb></tscreen>
  
  The <it>Random House College Dictionary</it>, 1975 edition