Fixed minor memory leak

diff --git a/doc/field-structure.xml b/doc/field-structure.xml

index 3a0a5f2..4079205 100644
--- a/doc/field-structure.xml
+++ b/doc/field-structure.xml
@@ -1,11 +1,11 @@
   <chapter id="fields-and-charsets">
-  <!-- $Id: field-structure.xml,v 1.8 2006-11-28 13:05:57 marc Exp $ -->
+  <!-- $Id: field-structure.xml,v 1.12 2007-02-02 09:58:39 marc Exp $ -->
    <title>Field Structure and Character Sets
    </title>
    
    <para>
     In order to provide a flexible approach to national character set
-   handling, Zebra allows the administrator to configure the set up the
+   handling, &zebra; allows the administrator to set up the
     system to handle any 8-bit character set &mdash; including sets that
     require multi-octet diacritics or other multi-octet characters. The
     definition of a character set includes a specification of the
@@ -108,6 +108,40 @@
        </listitem></varlistentry>
      </variablelist>
     </para>
+   <para>
+    The following are three excerpts from the standard
+    <filename>tab/default.idx</filename> configuration file. Note
+    that <literal>index</literal> and <literal>sort</literal>
+    are grouping directives; all directives that follow one of them
+    are bound to it:
+    <screen>
+     # Traditional word index
+     # Used if completeness is 'incomplete field' (@attr 6=1) and
+     # structure is word/phrase/word-list/free-form-text/document-text
+     index w
+     completeness 0
+     position 1
+     alwaysmatches 1
+     firstinfield 1
+     charmap string.chr
+
+     ...
+
+     # Null map index (no mapping at all)
+     # Used if structure=key (@attr 4=3)
+     index 0
+     completeness 0
+     position 1
+     charmap @
+
+     ...
+
+     # Sort register
+     sort s
+     completeness 1
+     charmap string.chr
+    </screen>
+   </para>
    </section>
  
    <section id="character-map-files">
@@ -115,13 +149,13 @@
     <para>
      The character map files are used to define the word tokenization
      and character normalization performed before inserting text into
-    the inverse indexes. Zebra ships with the predefined character map
+    the inverted indexes. &zebra; ships with the predefined character map
      files <filename>tab/*.chr</filename>. Users are allowed to add
      and/or modify maps according to their needs.  
     </para>
  
-   <table id="querymodel-attribute-sets-table" frame="top">
-     <title>Character maps predefined in Zebra</title>
+   <table id="character-map-table" frame="top">
+     <title>Character maps predefined in &zebra;</title>
        <tgroup cols="3">
         <thead>
          <row>
@@ -175,6 +209,29 @@
     <para>
      The contents of the character map files are structured as follows:
      <variablelist>
+     <varlistentry>
+      <term>encoding <replaceable>encoding-name</replaceable></term>
+      <listitem>
+       <para>
+        This directive must be at the very beginning of the file, and it
+        specifies the character encoding used in the entire file. If
+        omitted, the encoding <literal>ISO-8859-1</literal> is assumed.
+       </para>
+       <para>
+        For example, the test file
+        <literal>test/rusmarc/tab/string.chr</literal> contains the following
+        encoding directive:
+        <screen>
+         encoding koi8-r
+        </screen>
+        and the test file
+        <literal>test/charmap/string.utf8.chr</literal> is encoded
+        in UTF-8:
+        <screen>
+         encoding utf-8
+        </screen>
+       </para>
+      </listitem></varlistentry>
  
       <varlistentry>
        <term>lowercase <replaceable>value-set</replaceable></term>
@@ -332,7 +389,7 @@
     <para>
      In addition to specifying sort orders, space (blank) handling,
      and upper/lowercase folding, you can also use the character map
-    files to make Zebra ignore leading articles in sorting records,
+    files to make &zebra; ignore leading articles in sorting records,
      or when doing complete field searching.
     </para>
     <para>
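
Note on the hunk at -108,6 +108,40: the grouping behaviour of the index and sort directives in tab/default.idx can be sketched in a few lines of Python. This is only an illustration and not Zebra's actual parser. The function name parse_idx, the dictionary layout, and the rule that a group runs until the next index or sort line are assumptions drawn from the excerpt in that hunk.

```python
def parse_idx(text):
    """Group directives under the preceding 'index' or 'sort' line.

    Hypothetical helper: it mimics the grouping described in the
    documentation excerpt, not Zebra's real configuration reader.
    """
    groups = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines and full-line comments.
        if not line or line.startswith("#"):
            continue
        parts = line.split(None, 1)
        name = parts[0]
        value = parts[1] if len(parts) > 1 else ""
        if name in ("index", "sort"):
            # A grouping directive opens a new group.
            current = {"kind": name, "id": value, "settings": {}}
            groups.append(current)
        elif current is None:
            raise ValueError("directive %r appears before any index/sort" % name)
        else:
            # Every other directive is bound to the most recent group.
            current["settings"][name] = value
    return groups


# The three excerpts from the hunk, with the elided parts left out.
SAMPLE = """\
# Traditional word index
index w
completeness 0
position 1
alwaysmatches 1
firstinfield 1
charmap string.chr

# Null map index (no mapping at all)
index 0
completeness 0
position 1
charmap @

# Sort register
sort s
completeness 1
charmap string.chr
"""

groups = parse_idx(SAMPLE)
for group in groups:
    print(group["kind"], group["id"], group["settings"])
```

Each printed line shows one group together with the directives bound to it, which is the relationship the added paragraph describes.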