<chapter id="examples">
- <!-- $Id: examples.xml,v 1.6 2002-09-20 09:58:04 mike Exp $ -->
+ <!-- $Id: examples.xml,v 1.7 2002-10-08 08:09:43 mike Exp $ -->
<title>Example Configurations</title>
<sect1>
<listitem>
<para>
- Where to find the default indexing rules (### default.idx)
+ Where to find subsidiary configuration files, including
+ <literal>default.idx</literal>
+ which specifies the default indexing rules.
</para>
</listitem>
<listitem>
<para>
- ### Something to do with explain.abs?!
+ What attribute sets to recognise in searches.
</para>
</listitem>
<listitem>
<para>
- ### Where to find other configuration files, e.g. searches using
- BIB-1 attributes require a bib1.att configuration file (even if
- the access point is actually an XPath expression). These are
- searched for in the working directory unless otherwise
- specified.
+ Policy details such as what record type to expect, what
+ low-level indexing algorithm to use, how to identify potential
+ duplicate records, etc.
</para>
</listitem>
</itemizedlist>
</para>
+ <para>
+ Now let's see what goes in the <literal>zebra.cfg</literal> file
+ for some example configurations.
+ </para>
</sect1>
<sect1 id="example1">
- <title>Example 1: Minimal Configuration</title>
+ <title>Example 1: XML Indexing And Searching</title>
<para>
This example shows how Zebra can be used with absolutely minimal
- configuration to index a body of XML documents, and search them
- using XPath expressions to specify access points.
+ configuration to index a body of
+ <ulink url="http://www.w3.org/xml/###">XML</ulink>
+ documents, and search them using
+ <ulink url="http://www.w3.org/xpath/###">XPath</ulink>
+ expressions to specify access points.
</para>
<para>
- Go to the <literal>zebra/examples/dinosauricon</literal> directory.
+ Go to the <literal>examples/dinosauricon</literal> subdirectory
+ of the distribution archive.
There you will find a <literal>records</literal> subdirectory,
which contains some raw XML data to be added to the database: in
- this case, two files, <literal>genera.xml</literal> and
- <literal>taxa.xml</literal>, which contain information about all
- the known dinosaur genera as of August 2002.
+ this case, a single file, <literal>genera.xml</literal>,
+ which contains information about all the known dinosaur genera as of
+ August 2002.
</para>
<para>
Now we need to create the Zebra database, which we do with the
- Zebra indexer, <literal>zebraidx</literal>. This program's
- behaviour is driven by a configuration life, generally called
- <literal>zebra.cfg</literal>, although this can be changed with the
- <literal>-c</literal> option. For our purposes, we don't need any
- special behaviour - we can use the defaults - so an empty
- configuration will do just fine. We can either create an empty
- <literal>zebra.cfg</literal> or specify the name of an existing
- empty file using, for example, <literal>-c /dev/null</literal>.
- </para>
- <para>
- In this case, we'll use an empty <literal>zebra.cfg</literal> so
- we can add more configuration to it later.
+ Zebra indexer, <literal>zebraidx</literal>, which is
+ driven by the <literal>zebra.cfg</literal> configuration file.
+ For our purposes, we don't need any
+ special behaviour - we can use the defaults - so we start with a
+ minimal file that just tells <literal>zebraidx</literal> where to
+ find the default indexing rules, and how to parse the records:
+ <screen>
+ profilePath: .:../../tab:../../../yaz/tab
+ recordType: grs.sgml
+ </screen>
</para>
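As a rough illustration of what the two-line configuration above expresses, the following Python sketch reads `name: value` settings and splits `profilePath` on colons to list the directories that would be searched, in order. The helpers `parse_cfg` and `search_dirs` are hypothetical names introduced here, and this is not Zebra's own parser; it assumes only the one-setting-per-line layout and `#` comment lines seen in the examples in this chapter.

```python
# Illustrative only: a tiny reader for zebra.cfg-style "name: value" lines.
# parse_cfg and search_dirs are hypothetical helpers, not part of Zebra.

def parse_cfg(text):
    """Return a dict of settings; a repeated name keeps all its values in a list."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        name, sep, value = line.partition(":")
        if not sep:
            continue  # not a "name: value" line
        settings.setdefault(name.strip(), []).append(value.strip())
    return settings


def search_dirs(settings):
    """Split the colon-separated profilePath into its directories, in order."""
    paths = settings.get("profilePath", [])
    return paths[0].split(":") if paths else []


CFG = """\
profilePath: .:../../tab:../../../yaz/tab
recordType: grs.sgml
"""

settings = parse_cfg(CFG)
print(search_dirs(settings))        # ['.', '../../tab', '../../../yaz/tab']
print(settings["recordType"][0])    # grs.sgml
```

Only the first colon on a line separates the name from the value, which is why the colon-separated path survives intact.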
<para>
That's all you need for a minimal Zebra configuration. Now you can
roll the XML records into the database and build the indexes:
<screen>
- zebraidx -t grs.sgml update records
+ zebraidx update records
</screen>
- (### What does "grs.sgml" actually mean?)
</para>
<para>
Now start the server. Like the indexer, its behaviour is
- controlled by a configuration file, generally
- <literal>zebra.cfg</literal>; and like the indexer, it works just
- fine with an empty configuration.
+ controlled by the
+ <literal>zebra.cfg</literal> file; and like the indexer, it works
+ just fine with this minimal configuration.
<screen>
zebrasrv
</screen>
By default, the server listens on IP port number 9999, although
- this can easily be changed.
+ this can easily be changed - see
+ <xref linkend="zebrasrv"/>.
</para>
<para>
Now you can use the Z39.50 client program of your choice to execute
XPath-based boolean queries and fetch the XML records that satisfy
them:
<screen>
- Z> open tcp:@:9999
- Connecting...Ok.
- Z> find @attr 1=/GENUS/MEANING @or vertebra jaw
- Number of hits: 1
- Z> format xml
- Z> show 1
- Z> show 1
- <GENUS name="Hudiesaurus" type="with" xmlns:idzebra="http://www.indexdata.dk/zebra/">
- <MEANING>
- butterfly <LOW>vertebra</LOW> lizard
- </MEANING>
- <LENGTH value="30"></LENGTH>
- <PLACE name="China"></PLACE>
- <REMAINS content="4 teeth, forelimb, first dorsal vertebra"></REMAINS>
- <SPECIES name="sinojapanorum" status="nudum">
- <AUTHOR name="Dong" year="1997"></AUTHOR>
- <MEANING>
- Chinese-Japanese
- </MEANING>
- </SPECIES>
- <idzebra:size>359</idzebra:size><idzebra:localnumber>447</idzebra:localnumber><idzebra:filename>records/genera.xml</idzebra:filename></GENUS>
+ $ yaz-client tcp:@:9999
+ Connecting...Ok.
+ Z> find @attr 1=/GENUS/MEANING @and lizard earthquakes
+ Number of hits: 1
+ Z> format xml
+ Z> show 1
+ <GENUS name="Sauroposeidon" type="with">
+ <MEANING>lizard Poseidon <LOW>(Greek god of, among other things, earthquakes)</LOW></MEANING>
+ <SPECIES name="proteles">
+ <AUTHOR type="vide" name="Franklin" year="2000"></AUTHOR>
+ <AUTHOR name="Wedel, Cifelli, Sanders"></AUTHOR>
+ </SPECIES>
+ <PLACE name="Oklahoma"></PLACE>
+ <TIME value="Albian"></TIME>
+ <LENGTH value="30" q="1"></LENGTH>
+ <REMAINS content="rib, cervical vertebrae"></REMAINS>
+ <ESSAY>
+ <P> This new <NOMEN name="Brachiosaurus"></NOMEN>-like <LINK content="dinosaur"></LINK>
+ was perhaps the tallest. With its head raised, it stood 60 feet (nearly
+ 20 m) tall. </P>
+ </ESSAY>
+
+ <idzebra xmlns="http://www.indexdata.dk/zebra/">
+ <size>593</size>
+ <localnumber>891</localnumber>
+ <filename>records/genera.xml</filename>
+ </idzebra>
+ </GENUS>
</screen>
</para>
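The query typed at the `Z>` prompt above is in prefix notation: the `@attr 1=` clause names the XPath access point, and each boolean operator comes before its two operands. A minimal Python sketch of assembling such a query string follows. The helper `xpath_query` is a hypothetical name, not part of YAZ or Zebra, and it assumes single-word search terms that need no quoting.

```python
# Illustrative only: build the prefix-notation query typed at the Z> prompt.
# xpath_query is a hypothetical helper, not part of YAZ or Zebra.

def xpath_query(xpath, terms, operator="and"):
    """Combine single-word terms under one operator for an XPath access point."""
    if not terms:
        raise ValueError("at least one search term is required")
    # Each binary operator precedes its two operands, so n terms
    # need n-1 leading operators.
    ops = " ".join(f"@{operator}" for _ in terms[1:])
    body = " ".join(terms)
    query = f"{ops} {body}" if ops else body
    return f"@attr 1={xpath} {query}"


print(xpath_query("/GENUS/MEANING", ["lizard", "earthquakes"]))
# @attr 1=/GENUS/MEANING @and lizard earthquakes
```

The resulting string is what would follow the `find` command in the client session.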
<para>
</sect1>
<sect1 id="example2">
- <title>Example 2: Adding Some Configuration</title>
+ <title>Example 2: Supporting Z39.50 Searches</title>
<para>
You may have noticed as <literal>zebraidx</literal> was building
- the database that it issued several warnings, which we ignored at
- the time:
- <screen>
-zebraidx -t grs.sgml update records
-02:12:32-30/08: zebraidx(18151) [warn] default.idx [No such file or directory]
-02:12:32-30/08: zebraidx(18151) [warn] Couldn't open explain.abs [No such file or directory]
-02:12:32-30/08: zebraidx(18151) [warn] records/genera.xml:0 Couldn't open GENUS.abs [No such file or directory]
-02:12:32-30/08: zebraidx(18151) [warn] records/genera.xml:0 Unknown register type: 0
-02:12:32-30/08: zebraidx(18151) [warn] records/genera.xml:0 Unknown register type: w
-02:12:35-30/08: zebraidx(18151) [warn] records/taxa.xml:0 Couldn't open TAXON.abs [No such file or directory]
- </screen>
- And the server issued several more as the client connected to it,
- then searched for and retrieved a record:
+ the database that it issued a warning, which we ignored at the
+ time:
<screen>
-02:17:10-30/08: zebrasrv(18165) [warn] default.idx [No such file or directory]
-02:17:10-30/08: zebrasrv(18165) [warn] Couldn't open explain.abs [No such file or directory]
-02:17:57-30/08: zebrasrv(18165) [warn] Unknown register type: w
-02:18:42-30/08: zebrasrv(18165) [warn] Couldn't open GENUS.abs [No such file or directory]
+ $ zebraidx update records
+ 00:45:46-08/10: ../../index/zebraidx(5016) [warn] records/genera.xml:0 Couldn't open GENUS.abs [No such file or directory]
</screen>
</para>
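When indexing many files it can help to pick such warnings out of the output mechanically. The Python sketch below splits one log line into its parts. The pattern is inferred from the single warning line shown above, so other Zebra log lines may not match it, and `parse_log_line` is a hypothetical helper name.

```python
import re

# Pattern assumed from the one warning line shown above; other Zebra
# log lines may be laid out differently.
LOG_RE = re.compile(
    r"^(?P<time>\S+):\s+(?P<program>\S+)\((?P<pid>\d+)\)\s+"
    r"\[(?P<level>\w+)\]\s+(?P<message>.*)$"
)


def parse_log_line(line):
    """Return the fields of a log line as a dict, or None if it does not match."""
    m = LOG_RE.match(line.strip())
    return m.groupdict() if m else None


LINE = ("00:45:46-08/10: ../../index/zebraidx(5016) [warn] "
        "records/genera.xml:0 Couldn't open GENUS.abs [No such file or directory]")

fields = parse_log_line(LINE)
print(fields["level"], fields["message"])
```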
</sect1>
The master configuration file, <literal>zebra.cfg</literal>,
which is as short and simple as it can be:
<screen>
- # $Header: /home/cvsroot/idis/doc/examples.xml,v 1.6 2002-09-20 09:58:04 mike Exp $
+ # $Header: /home/cvsroot/idis/doc/examples.xml,v 1.7 2002-10-08 08:09:43 mike Exp $
# Bare-bones master configuration file for Zebra
profilePath: .:../../tab:../../../yaz/tab
</screen>
The BIB-1 attribute set configuration file,
<literal>bib1.att</literal>, which is also as short as possible:
<screen>
- # $Header: /home/cvsroot/idis/doc/examples.xml,v 1.6 2002-09-20 09:58:04 mike Exp $
+ # $Header: /home/cvsroot/idis/doc/examples.xml,v 1.7 2002-10-08 08:09:43 mike Exp $
# Bare-bones BIB-1 attribute set file for Zebra
reference Bib-1
</screen>
<chapter id="installation">
- <!-- $Id: installation.xml,v 1.4 2002-04-10 14:47:49 heikki Exp $ -->
+ <!-- $Id: installation.xml,v 1.5 2002-10-08 08:09:43 mike Exp $ -->
<title>Installation</title>
<para>
An ANSI C compiler is required to compile the Zebra
</para>
<para>
- When configured build the software by typing:
+ When configured, build the software by typing:
<screen>
make
</para>
<para>
- If successful, two executables have been created in the sub-directory
+ If successful, two executables are created in the sub-directory
<literal>index</literal>.
<variablelist>
</para>
<para>
- You can now use Zebra. If you wish to install it system-wide, type
+ You can now use Zebra. If you wish to install it system-wide, then
+ as root type
<screen>
make install
</screen>
<chapter id="introduction">
- <!-- $Id: introduction.xml,v 1.13 2002-09-20 09:58:04 mike Exp $ -->
+ <!-- $Id: introduction.xml,v 1.14 2002-10-08 08:09:43 mike Exp $ -->
<title>Introduction</title>
<sect1>
and how to configure the server to give you the
functionality that you need.
</para>
-
- <para>
- If you use Zebra, you should visit its
- <ulink url="http://www.indexdata.dk/zebra/">web site</ulink>,
- where you can join the
- <ulink url="http://www.indexdata.dk/mailman/listinfo/zebralist">
- mailing-list</ulink>
- by sending email to
- <email>### zebra-subscribe@mailman.indexdata.dk</email>
- </para>
-
</sect1>
<sect1 id="features">
<para>
Arbitrarily complex records. The internal data format
is an structured format conceptually similar to XML or GRS-1,
- which allows nested structured data elements and
+ which allows lists, nested structured data elements and
variant forms of data.
</para>
</listitem>
<para>
Configurable to understand many input formats.
A system of input filters driven by
- regular expressions allows you to easily process most ASCII-based
- data formats. SGML, XML, ISO2709 (MARC), and raw text are also
+ regular expressions allows most ASCII-based
+ data formats to be easily processed.
+ SGML, XML, ISO2709 (MARC), and raw text are also
supported.
</para>
</listitem>
Searching supports a powerful combination of boolean queries as
well as relevance-ranking (free-text) queries. Truncation,
masking, full regular expression matching and "approximate
- matching" (eg. spelling mistakes) are all supported.
+ matching" (e.g. spelling mistakes) are all handled.
</para>
</listitem>
<para>
Zebra is written in portable C, so it runs on most Unix-like systems
as well as Windows NT. A binary distribution for Windows NT is
- available.
+ available at
+ <ulink url="http://indexdata.dk/zebra/###"/>
</para>
</listitem>
<itemizedlist>
<listitem>
<para>
- Protocol facilities: Init, Search, Present (retrieval), Delete,
- Scan (index browsing) and Sort.
+ Protocol facilities: Init, Search, Present (retrieval),
+ Segmentation (support for very large records), Delete, Scan
+ (index browsing), Sort, Close and some Extended Services.
</para>
</listitem>
<listitem>
<para>
- Piggy-backed presents are honored in the search-request.
+ Piggy-backed presents are honored in the search request - that
+ is, a subset of the found records can be returned directly with
+ a search response, enabling search and retrieval to happen in a
+ single round-trip.
</para>
</listitem>
(GIRT is the German Indexing and Retrieval Testdatabase. It is a
standard German-language test database for intelligent indexing
and retrieval systems. See
- <ulink url="http://www.gesis.org/forschung/informationstechnologie/clef-delos.htm"/>
+ <ulink url="http://www.gesis.org/forschung/informationstechnologie/clef-delos.htm"/>)
</para>
<para>
Evaluation will take place as part of the TREC/CLEF campaign 2003
<sect2>
<title>ULS (Union List of Serials)</title>
<para>
- The London School of Economics (### I think)
- are involved in a projects called ULS to provide a union catalogue
+ The M25-Link systems team
+ (<ulink url="http://www.m25lib.ac.uk/M25link/"/>)
+ are involved in a project called ULS to provide a union catalogue
for periodicals in 21 member libraries. They do this with an
unusual architecture which they call a
``non-distributed virtual union catalogue''.
holdings. Then 21 individual Z39.50 targets are created, each
using Zebra, and all mounted on the single hardware server.
The live service provides a web gateway allowing Z39.50 searching
- of all 21 targets or a selection of them.
+ of all of the targets or a selection of them. Zebra's small
+ footprint allows a relatively modest system to comfortably host
+ the 21 servers.
</para>
<para>
More information can be found at
indexes of large web sites, typically in the region of tens of
millions of pages. In this role, it functions somewhat similarly
to the engine of google or altavista, but for a selected intranet
- or subset of the whole Web.
+ or a subset of the whole Web.
</para>
<para>
For example, Liverpool University's web-search facility (see on
</sect2>
</sect1>
+
+ <sect1 id="support">
+ <title>Support</title>
+ <para>
+ You can get support for Zebra from at least three sources.
+ </para>
+ <para>
+ First, there's the Zebra web site at
+ <ulink url="http://www.indexdata.dk/zebra/"/>,
+ which always has the most recent version available for download.
+ If you have a problem with Zebra, the first thing to do is see
+ whether it's fixed in the current release.
+ </para>
+ <para>
+ Second, there's the Zebra mailing list. Its home page at
+ <ulink url="http://www.indexdata.dk/mailman/listinfo/zebralist"/>
+ includes a complete archive of all messages that have ever been
+ posted on the list. The Zebra mailing list is used both for
+ announcements from the authors (new
+ releases, bug fixes, etc.) and general discussion. You are welcome
+ to seek support there. Join by sending email to
+ <email>zebra-subscribe-###@mailman.indexdata.dk</email>.
+ </para>
+ <para>
+ Third, it's possible to buy a commercial support contract, with
+ well-defined service levels and response times, from Index Data.
+ See
+ <ulink url="http://www.indexdata.dk/support/###"/>
+ for details.
+ </para>
+ </sect1>
+
+
<sect1 id="future">
<title>Future Directions</title>
information retrieval engine and high-performance XML
repository.
</para>
+ <para>
+ ### Partially done.
+ </para>
</listitem>
<listitem>
Access to search engine through SOAP/RPC API to allow the
construction of applications without requiring Z39.50 tools.
</para>
+ <para>
+ ### Partially done, thanks to the new SRW/Z39.50 gateway.
+ </para>
</listitem>
<listitem>
</para>
<para>
If you think it's all really neat, you're welcome to drop us a line
- saying that, too. You'll find contact info at the end of this file.
+ saying that, too. You can email us on
+ <email>info@indexdata.dk</email>
+ or check the contact info at the end of this manual.
</para>
</sect1>
<chapter id="quick-start">
- <!-- $Id: quickstart.xml,v 1.2 2002-04-10 14:47:49 heikki Exp $ -->
+ <!-- $Id: quickstart.xml,v 1.3 2002-10-08 08:09:43 mike Exp $ -->
<title>Quick Start </title>
<para>
file named <literal>zebra.cfg</literal> with the following contents:
<screen>
- # Where are the YAZ tables located.
- profilePath: ../../../yaz/tab ../../tab
-
+ # Where the schema files, attribute files, etc. are located.
+ profilePath: .:../../tab:../../../yaz/tab:/usr/local/share/yaz/tab:/usr/share/yaz/tab
+
# Files that describe the attribute sets supported.
attset: bib1.att
attset: gils.att
+ attset: explain.att
+
+ recordType: grs.sgml
+ isam: c
</screen>
</para>
<para>
- Now, edit the file and set <literal>profilePath</literal> to the path of the
+ If necessary, edit the file and set <literal>profilePath</literal> to the path of the
YAZ profile tables (sub directory <literal>tab</literal> of the YAZ
distribution archive).
</para>
<literal>records</literal>. To index these, type:
<screen>
- $ ../../index/zebraidx -t grs.sgml update records
+ zebraidx update records
</screen>
</para>
<para>
- In the command above the option <literal>-t</literal> specified the record
- type — in this case <literal>grs.sgml</literal>.
- The word <literal>update</literal> followed
+ In the command above, the word <literal>update</literal> followed
by a directory root updates all files below that directory node.
</para>
fire up a server. To start a server on port 2100, type:
<screen>
- $ ../../index/zebrasrv tcp:@:2100
+ zebrasrv tcp:@:2100
</screen>
</para>
</para>
<para>
- To test the server, you can use any Z39.50 client (1992 or later).
- For instance, you can use the demo client that comes with YAZ: Just
- cd to the <literal>client</literal> subdirectory of the YAZ distribution
- and type:
+ To test the server, you can use any Z39.50 client.
+ For instance, you can use the demo client that comes with YAZ:
</para>
<para>
<screen>
- $ ./yaz-client tcp:localhost:2100
+ yaz-client tcp:localhost:2100
</screen>
</para>
When the client has connected, you can type:
</para>
-<para>
-
+ <para>
<screen>
Z> find surficial
Z> show 1
</para>
</note>
<para>
- If you've made it this far, there's a good chance that
- you've got through the compilation OK.
+ If you've made it this far, you know that your installation is
+ working, but there's a certain amount of voodoo going on - for
+ example, the mysterious incantations in the
+ <literal>zebra.cfg</literal> file. In order to help us understand
+ these fully, the next chapter will work through a series of
+ increasingly complex example configurations.
</para>
</chapter>
<!ENTITY app-indexdata SYSTEM "indexdata.xml">
<!ENTITY zebraidx-options SYSTEM "zebraidx-options.xml">
]>
-<!-- $Id: zebra.xml.in,v 1.10 2002-09-19 21:06:51 adam Exp $ -->
+<!-- $Id: zebra.xml.in,v 1.11 2002-10-08 08:09:43 mike Exp $ -->
<book id="zebra">
<bookinfo>
<title>Zebra - User's Guide and Reference</title>
Zebra is a free, fast, friendly information management system. It
can index records in XML/SGML, MARC, e-mail archives and many
other formats, and quickly find them using a combination of
- boolean searching and relevance ranking. Search/retrieval
+ boolean searching and relevance ranking. Search-and-retrieve
applications can be written using APIs in a wide variety of
languages, communicating with the Zebra server using
industry-standard information-retrieval protocols.