Remove harvest.mbox -- its content is now incorporated.
<chapter id="examples">
- <!-- $Id: examples.xml,v 1.17 2002-11-08 17:00:57 mike Exp $ -->
+ <!-- $Id: examples.xml,v 1.18 2002-12-01 23:26:26 mike Exp $ -->
<title>Example Configurations</title>
<sect1>
<listitem>
<para>
- Where to find subsidiary configuration files, including
- <literal>default.idx</literal>
+ Where to find subsidiary configuration files, including both
+ those that are named explicitly and a few ``magic'' files such
+ as <literal>default.idx</literal>,
which specifies the default indexing rules.
</para>
</listitem>
<listitem>
<para>
- What attribute sets to recognise in searches.
+ What record schemas to support. (Subsidiary files specify how
+ to index the contents of records in those schemas, and what
+ format to use when presenting records in those schemas to client
+ software.)
</para>
</listitem>
<listitem>
<para>
- Policy details such as what record type to expect, what
- low-level indexing algorithm to use, how to identify potential
- duplicate records, etc.
+ What attribute sets to recognise in searches. (Subsidiary files
+ specify how to interpret the attributes in terms
+ of the indexes that are created on the records.)
+ </para>
+ </listitem>
+
+ <listitem>
+ <para>
+ Policy details such as what type of input format to expect when
+ adding new records, what low-level indexing algorithm to use,
+ how to identify potential duplicate records, etc.
</para>
</listitem>
<literal>dino.tree</literal>.)
Type <literal>make records/dino.xml</literal>
to make the XML data file.
+ (Or you could just type <literal>make</literal> to build the XML
+ data file, create the database and populate it with the taxonomic
+ records all in one shot - but then you wouldn't learn anything,
+ would you? :-)
</para>
<para>
Now we need to create a Zebra database to hold and index the XML
Zebra indexer, <literal>zebraidx</literal>, which is
driven by the <literal>zebra.cfg</literal> configuration file.
For our purposes, we don't need any
- special behaviour - we can use the defaults - so we start with a
+ special behaviour - we can use the defaults - so we can start with a
minimal file that just tells <literal>zebraidx</literal> where to
find the default indexing rules, and how to parse the records:
<screen>
XPath-based boolean queries and fetch the XML records that satisfy
them:
<screen>
- $ yaz-client tcp:@:9999
+ $ yaz-client @:9999
Connecting...Ok.
Z> find @attr 1=/Zthes/termName Sauroposeidon
Number of hits: 1
<termId>22</termId>
<termName>Sauroposeidon</termName>
<termType>PT</termType>
+ <termNote>The tallest known dinosaur (18m)</termNote>
<relation>
<relationType>BT</relationType>
<termId>21</termId>
</relation>
<idzebra xmlns="http://www.indexdata.dk/zebra/">
- <size>245</size>
+ <size>300</size>
<localnumber>23</localnumber>
<filename>records/dino.xml</filename>
</idzebra>
</screen>
</para>
<para>
- Now wasn't that easy?
+ Now wasn't that nice and easy?
</para>
</sect1>
significantly because it ties searching semantics to the physical
structure of the searched records. You can't use the same search
specification to search two databases if their internal
- representations are different. Consider an alternative taxonomy
+ representations are different. Consider a different taxonomy
database in which the records have taxon names specified
inside a <literal><name></literal> element nested within a
<literal><identification></literal> element
said about implementation: in a given database, an access point
might be implemented as an index, a path into physical records, an
algorithm for interrogating relational tables or whatever works.
- The key point is that the semantics of an access point are fixed
- and well defined.
+ The only important point is that the semantics of an access
+ point are fixed and well defined.
</para>
<para>
For convenience, access points are gathered into <firstterm>attribute
In practice, the BIB-1 attribute set has tended to be a dumping
ground for all sorts of access points, so that, for example, it
includes some geospatial access points as well as strictly
- bibliographic ones. Nevertheless, the key point is that this model
+ bibliographic ones. Nevertheless, this model
allows a layer of abstraction over the physical representation of
records in databases.
</para>
<literal><Zthes></literal> element.
</para>
<para>
+ ### Here's where it all goes to pieces. The current arrangement is
+ very awkward (and somewhat embarrassing) to describe, and the new
+ arrangement hasn't actually been implemented yet.
+ </para>
+ <para>
This is a two-step process. First, we need to tell Zebra that we
want to support the BIB-1 attribute set. Then we need to tell it
which elements of its record pertain to access point 4.
+++ /dev/null
-From zebralist-admin@indexdata.dk Sun Nov 24 23:16:24 2002
-MIME-Version: 1.0
-Envelope-to: zebra@miketaylor.org.uk
-Content-Type: text/plain;
- charset="us-ascii"
-From: Kang-Jin Lee <lee@arco.de>
-To: zebralist@indexdata.dk
-User-Agent: KMail/1.4.3
-X-Spam-Level:
-Subject: [Zebralist] Some progress on Harvest's move to Zebra
-Sender: zebralist-admin@indexdata.dk
-X-BeenThere: zebralist@indexdata.dk
-X-Mailman-Version: 2.0.11
-Precedence: bulk
-List-Help: <mailto:zebralist-request@indexdata.dk?subject=help>
-List-Post: <mailto:zebralist@indexdata.dk>
-List-Subscribe: <http://www.indexdata.dk/mailman/listinfo/zebralist>,
- <mailto:zebralist-request@indexdata.dk?subject=subscribe>
-List-Id: Zebra Information Server <zebralist.indexdata.dk>
-List-Unsubscribe: <http://www.indexdata.dk/mailman/listinfo/zebralist>,
- <mailto:zebralist-request@indexdata.dk?subject=unsubscribe>
-List-Archive: <http://www.indexdata.dk/pipermail/zebralist/>
-Date: Sun, 24 Nov 2002 20:45:19 +0100
-X-Spam-Status: No, hits=-1.0 required=5.0 tests=AWL version=2.20
-X-Spam-Level:
-X-MIME-Autoconverted: from quoted-printable to 8bit by localhost.localdomain id gAONGNK15639
-
-Hi,
-
-I finished first steps to use Zebra as fulltext engine for Harvest
-(http://harvest.sourceforge.net/). The performance boost after
-some testing are quite impressive.
-
-Here is my article I wrote for the Harvest mailinglist.
-
-Many thanks for Zebra.
-
-------------------------------------------------------
-Hi,
-
-The first results after some testing with Zebra are very promising.
-
-The tests were done with around 220 000 SOIF files, which occupies
-1.6GB of disk space.
-
-Building the index from scratch takes around one hour with Zebra where
-Glimpse needs around five hours.
-
-While glimpse blocks search requests when updating its index, Zebra
-can still answer search requests.
-
-While the search time of glimpse varies from some seconds to some
-minutes depending how expensive the query is, Zebra usually takes
-around one to three seconds, even for expensive queries.
-
-Glimpse' index occupies around 250MB of disk space, Zebra's index
-takes around 570MB.
-
-Zebra supports incremental indexing which will speed up indexing even
-further.
-
-There are still potential for faster searches when necessary, using
-tweaks on apache.
-
-On the other hand, modeling data is not complete, yet.
-
-To sum it up:
-- Zebra indexes data five times faster than Glimpse
-- Zebra doesn't cause downtimes for indexupdate
-- Zebra's search time doesn't jump from seconds to minutes for no
- obvious reason, but stays constant within a range of one to three
- seconds
-- Zebra can search more than 100 times faster than Glimpse
-- Zebra can process multiple search requests simultaneously
-- Zebra can speed up indexing by using incremental indexing
-- Glimpse's index size is only around half of the Zebra's index
-
-kj
-------------------------------------------------------
-
-_______________________________________________
-Zebralist mailing list
-Zebralist@indexdata.dk
-http://www.indexdata.dk/mailman/listinfo/zebralist
-
-From mike@miketaylor.org.uk Sun Nov 24 23:41:14 2002
-Date: Sun, 24 Nov 2002 23:41:13 GMT
-From: Mike Taylor <mike@miketaylor.org.uk>
-X-Was-To: lee@arco.de
-X-Was-CC: zebralist@indexdata.dk
-Cc: mike@localhost.localdomain
-In-reply-to: <200211242045.19196.lee@arco.de> (message from Kang-Jin Lee on
- Sun, 24 Nov 2002 20:45:19 +0100)
-Subject: Re: [Zebralist] Some progress on Harvest's move to Zebra
-
-> Date: Sun, 24 Nov 2002 20:45:19 +0100
-> From: Kang-Jin Lee <lee@arco.de>
->
-> Here is my article I wrote for the Harvest mailinglist.
-
-Hi K-J,
-
-It's nice to read all this good stuff about Zebra! I'm currently
-working on changes to the documentation for the next Zebra release,
-and I'd love to include a lightly-edited version of your message in
-the new document. (Basically, I'd obscure the name of your old
-engine, so it's clear that we're trying to say good things about Zebra
-rather than score points off a competitor.) Would it be OK for me to
-quote you? If yes in principle, then I'll run the actual wording past
-you before submitting it.
-
-Thanks,
-
- _/|_ _______________________________________________________________
-/o ) \/ Mike Taylor <mike@miketaylor.org.uk> www.miketaylor.org.uk
-)_v__/\ "You question the worthiness of my code? I should kill you
- where you stand!" -- Klingon Programming Mantra
-
-From lee@arco.de Mon Nov 25 10:02:13 2002
-MIME-Version: 1.0
-Envelope-to: mike@miketaylor.org.uk
-Content-Type: text/plain;
- charset="iso-8859-15"
-From: Kang-Jin Lee <lee@arco.de>
-To: Mike Taylor <mike@miketaylor.org.uk>
-Subject: Re: [Zebralist] Some progress on Harvest's move to Zebra
-Date: Mon, 25 Nov 2002 08:27:42 +0100
-User-Agent: KMail/1.4.3
-In-Reply-To: <200211242340.gAONefg15769@localhost.localdomain>
-X-Spam-Status: No, hits=-4.4 required=5.0 tests=IN_REP_TO version=2.20
-X-Spam-Level:
-Content-Length: 836
-X-MIME-Autoconverted: from quoted-printable to 8bit by seatbooker.net id JAA28796
-
-Hi,
-
-On Monday 25 November 2002 00:40, you wrote:
-> > Date: Sun, 24 Nov 2002 20:45:19 +0100
-> > From: Kang-Jin Lee <lee@arco.de>
-> >
-> > Here is my article I wrote for the Harvest mailinglist.
->
-> Hi K-J,
->
-> It's nice to read all this good stuff about Zebra! I'm currently
-> working on changes to the documentation for the next Zebra release,
-> and I'd love to include a lightly-edited version of your message in
-> the new document. (Basically, I'd obscure the name of your old
-> engine, so it's clear that we're trying to say good things about Zebra
-> rather than score points off a competitor.) Would it be OK for me to
-> quote you? If yes in principle, then I'll run the actual wording past
-> you before submitting it.
-
-You are welcome to do this.
-
-I am very happy to see such a nice software available under GPL.
-
-Thanks.
-
-kj
-
-From zebralist-admin@indexdata.dk Mon Nov 25 11:13:10 2002
-MIME-Version: 1.0
-Envelope-to: zebra@miketaylor.org.uk
-From: Pete <P.D.Mallinson@liverpool.ac.uk>
-X-X-Sender: qq15@uxa.liv.ac.uk
-To: Kang-Jin Lee <lee@arco.de>
-cc: zebralist@indexdata.dk
-Subject: Re: [Zebralist] Some progress on Harvest's move to Zebra
-In-Reply-To: <200211242045.19196.lee@arco.de>
-Content-Type: TEXT/PLAIN; charset=US-ASCII
-X-Spam-Level:
-Sender: zebralist-admin@indexdata.dk
-X-BeenThere: zebralist@indexdata.dk
-X-Mailman-Version: 2.0.11
-Precedence: bulk
-List-Help: <mailto:zebralist-request@indexdata.dk?subject=help>
-List-Post: <mailto:zebralist@indexdata.dk>
-List-Subscribe: <http://www.indexdata.dk/mailman/listinfo/zebralist>,
- <mailto:zebralist-request@indexdata.dk?subject=subscribe>
-List-Id: Zebra Information Server <zebralist.indexdata.dk>
-List-Unsubscribe: <http://www.indexdata.dk/mailman/listinfo/zebralist>,
- <mailto:zebralist-request@indexdata.dk?subject=unsubscribe>
-List-Archive: <http://www.indexdata.dk/pipermail/zebralist/>
-Date: Mon, 25 Nov 2002 10:19:37 +0000 (GMT)
-X-Spam-Status: No, hits=-4.4 required=5.0 tests=IN_REP_TO version=2.20
-X-Spam-Level:
-Content-Length: 2853
-
-On Sun, 24 Nov 2002, Kang-Jin Lee wrote:
-
->Hi,
->
->I finished first steps to use Zebra as fulltext engine for Harvest
->(http://harvest.sourceforge.net/). The performance boost after
->some testing are quite impressive.
-
-Hi ... I'd almost forgotten that the Harvest project is still active.
-
-We had a heap of challenges with our Harvest setup and with the
-time taken to index and search ... we switched to using
-Harvest-NG as the "reaper/gatherer" and modified Zebra to
-work with SOIF and our own ranking algorithm - it's been in
-service for over 6 months now.
-
-We had challenges with both speed of gathering and with
-speed of indexing and searching but most seem to be
-"managable" now.
-
-We offered our modifications to Zebra to Indexdata who
-offered to look at them since the latest release of Zebra
-is sufficiently different at the code level to make it
-non-trivial for us to apply our code modifications to
-it.
-
-
-Cheers
-
-Pete Mallinson
-
->
->Here is my article I wrote for the Harvest mailinglist.
->
->Many thanks for Zebra.
->
->------------------------------------------------------
->Hi,
->
->The first results after some testing with Zebra are very promising.
->
->The tests were done with around 220 000 SOIF files, which occupies
->1.6GB of disk space.
->
->Building the index from scratch takes around one hour with Zebra where
->Glimpse needs around five hours.
->
->While glimpse blocks search requests when updating its index, Zebra
->can still answer search requests.
->
->While the search time of glimpse varies from some seconds to some
->minutes depending how expensive the query is, Zebra usually takes
->around one to three seconds, even for expensive queries.
->
->Glimpse' index occupies around 250MB of disk space, Zebra's index
->takes around 570MB.
->
->Zebra supports incremental indexing which will speed up indexing even
->further.
->
->There are still potential for faster searches when necessary, using
->tweaks on apache.
->
->On the other hand, modeling data is not complete, yet.
->
->To sum it up:
->- Zebra indexes data five times faster than Glimpse
->- Zebra doesn't cause downtimes for indexupdate
->- Zebra's search time doesn't jump from seconds to minutes for no
-> obvious reason, but stays constant within a range of one to three
-> seconds
->- Zebra can search more than 100 times faster than Glimpse
->- Zebra can process multiple search requests simultaneously
->- Zebra can speed up indexing by using incremental indexing
->- Glimpse's index size is only around half of the Zebra's index
->
->kj
->------------------------------------------------------
->
->_______________________________________________
->Zebralist mailing list
->Zebralist@indexdata.dk
->http://www.indexdata.dk/mailman/listinfo/zebralist
->
-
-
-
-_______________________________________________
-Zebralist mailing list
-Zebralist@indexdata.dk
-http://www.indexdata.dk/mailman/listinfo/zebralist
-
-From zebralist-admin@indexdata.dk Mon Nov 25 21:39:59 2002
-MIME-Version: 1.0
-Envelope-to: zebra@miketaylor.org.uk
-Content-Type: text/plain;
- charset="iso-8859-1"
-From: Kang-Jin Lee <lee@arco.de>
-To: Pete <P.D.Mallinson@liverpool.ac.uk>
-Subject: Re: [Zebralist] Some progress on Harvest's move to Zebra
-User-Agent: KMail/1.4.3
-In-Reply-To: <Pine.GSO.4.44.0211251007060.15395-100000@uxa.liv.ac.uk>
-Cc: zebralist@indexdata.dk
-X-Spam-Level:
-Sender: zebralist-admin@indexdata.dk
-X-BeenThere: zebralist@indexdata.dk
-X-Mailman-Version: 2.0.11
-Precedence: bulk
-List-Help: <mailto:zebralist-request@indexdata.dk?subject=help>
-List-Post: <mailto:zebralist@indexdata.dk>
-List-Subscribe: <http://www.indexdata.dk/mailman/listinfo/zebralist>,
- <mailto:zebralist-request@indexdata.dk?subject=subscribe>
-List-Id: Zebra Information Server <zebralist.indexdata.dk>
-List-Unsubscribe: <http://www.indexdata.dk/mailman/listinfo/zebralist>,
- <mailto:zebralist-request@indexdata.dk?subject=unsubscribe>
-List-Archive: <http://www.indexdata.dk/pipermail/zebralist/>
-Date: Mon, 25 Nov 2002 20:39:47 +0100
-X-Spam-Status: No, hits=-3.2 required=5.0 tests=IN_REP_TO,AWL version=2.20
-X-Spam-Level:
-X-MIME-Autoconverted: from quoted-printable to 8bit by localhost.localdomain id gAPLdwK18535
-
-Hi,
-
-On Monday 25 November 2002 11:19, Pete wrote:
-
-> On Sun, 24 Nov 2002, Kang-Jin Lee wrote:
-
-> >I finished first steps to use Zebra as fulltext engine for Harvest
-> >(http://harvest.sourceforge.net/). The performance boost after
-> >some testing are quite impressive.
->
-> Hi ... I'd almost forgotten that the Harvest project is still active.
-
-It seems that everybody has forgotten Harvest. :-)
-
-> We had a heap of challenges with our Harvest setup and with the
-> time taken to index and search ... we switched to using
-> Harvest-NG as the "reaper/gatherer" and modified Zebra to
-> work with SOIF and our own ranking algorithm - it's been in
-> service for over 6 months now.
-
-I am very interested in your setup. Would it be possible to send
-your configuration files and modifications to me?
-I made some small modifications to soif.flt and am still wondering
-which query I should use. It would be very nice if I don't have to
-reinvent the wheel.
-
-> We had challenges with both speed of gathering and with
-> speed of indexing and searching but most seem to be
-> "managable" now.
-
-How big is your gatherer?
-
-> We offered our modifications to Zebra to Indexdata who
-> offered to look at them since the latest release of Zebra
-> is sufficiently different at the code level to make it
-> non-trivial for us to apply our code modifications to
-> it.
-
-I would like to take a look at the modifications, too.
-
-Thanks.
-
-kj
-
-
-_______________________________________________
-Zebralist mailing list
-Zebralist@indexdata.dk
-http://www.indexdata.dk/mailman/listinfo/zebralist
-
<chapter id="installation">
- <!-- $Id: installation.xml,v 1.5 2002-10-08 08:09:43 mike Exp $ -->
+ <!-- $Id: installation.xml,v 1.6 2002-12-01 23:26:26 mike Exp $ -->
<title>Installation</title>
<para>
An ANSI C compiler is required to compile the Zebra
Unpack the distribution archive. The <literal>configure</literal>
shell script attempts to guess correct values for various
system-dependent variables used during compilation.
- It uses those values to create a 'Makefile' in each directory of Zebra.
+ It uses those values to create a <literal>Makefile</literal> in each
+ directory of Zebra.
</para>
<para>
<para>
The configure script attempts to use C compiler specified by
the <literal>CC</literal> environment variable.
- If not set, <literal>cc</literal> or GNU C will be used.
+ If this is not set, <literal>cc</literal> or GNU C will be used.
The <literal>CFLAGS</literal> environment variable holds
options to be passed to the C compiler. If you're using a
Bourne-shell compatible shell you may pass something like this:
<screen>
CC=/opt/ccs/bin/cc CFLAGS=-O ./configure
</screen>
-
- The configure script takes a number of arguments, you can see
- them all with
+ </para>
+ <para>
+ The configure script supports various options: you can see what they
+ are with
<screen>
./configure --help
</screen>
-
</para>
<para>
- When configured, build the software by typing:
-
+ Once the build environment is configured, build the software by
+ typing:
<screen>
make
</screen>
-
</para>
<para>
- If successful, two executables are created in the sub-directory
- <literal>index</literal>.
+ If the build is successful, two executables are created in the
+ sub-directory <literal>index</literal>:
<variablelist>
<varlistentry>
By default this will install the Zebra executables in
<filename>/usr/local/bin</filename>,
and the standard configuration files in
- <filename>/usr/local/share/zebra</filename>
+ <filename>/usr/local/share/idzebra</filename>.
You can override this with the <literal>--prefix</literal> option
to configure.
</para>
<chapter id="introduction">
- <!-- $Id: introduction.xml,v 1.21 2002-11-08 17:00:57 mike Exp $ -->
+ <!-- $Id: introduction.xml,v 1.22 2002-12-01 23:26:26 mike Exp $ -->
<title>Introduction</title>
<sect1>
<title>Overview</title>
<para>
- <ulink url="http://indexdata.dk/zebra/">
- Zebra</ulink>
+ <ulink url="http://indexdata.dk/zebra/">Zebra</ulink>
is a high-performance, general-purpose structured text
- indexing and retrieval engine. It reads structured records in a
+ indexing and retrieval engine. It reads records in a
variety of input formats (eg. email, XML, MARC) and provides access
to them through a powerful combination of boolean search
expressions and relevance-ranked free-text queries.
<listitem>
<para>
- Very large databases: files for indexes, etc. can be
+ Very large databases: logical files can be
automatically partitioned over multiple disks.
</para>
</listitem>
<listitem>
<para>
Arbitrarily complex records. The internal data format
- is an structured format conceptually similar to XML or GRS-1,
+ is a structured format conceptually similar to XML or GRS-1,
which allows lists, nested structured data elements and
variant forms of data.
</para>
which is populated by the Harvest-NG web-crawling software.
</para>
<para>
- For more information, contact John Gilbertson
+ For more information on Liverpool University's intranet search
+ architecture, contact John Gilbertson
<email>jgilbert@liverpool.ac.uk</email>
</para>
+ <para>
+ Kang-Jin Lee
+ <email>lee@arco.de</email>
+ has recently modified the Harvest web crawler to use Zebra as
+ its native repository engine. His comments on the switch over
+ from the old engine are revealing:
+ <blockquote>
+ <para>
+ The first results after some testing with Zebra are very
+ promising. The tests were done with around 220,000 SOIF files,
+ which occupies 1.6GB of disk space.
+ </para>
+ <para>
+ Building the index from scratch takes around one hour with Zebra
+ where [old-engine] needs around five hours. While [old-engine]
+ blocks search requests when updating its index, Zebra can still
+ answer search requests.
+ [...]
+ Zebra supports incremental indexing which will speed up indexing
+ even further.
+ </para>
+ <para>
+ While the search time of [old-engine] varies from some seconds
+ to some minutes depending how expensive the query is, Zebra
+ usually takes around one to three seconds, even for expensive
+ queries.
+ [...]
+ Zebra can search more than 100 times faster than [old-engine]
+ and can process multiple search requests simultaneously.
+ </para>
+ <para>
+ I am very happy to see such nice software available under GPL.
+ </para>
+ </blockquote>
+ </para>
</sect2>
</sect1>
announcements from the authors (new
releases, bug fixes, etc.) and general discussion. You are welcome
to seek support there. Join by sending email to
- <email>zebra-request@indexdata.dk</email>. Put the word
+ <email>zebra-request@indexdata.dk</email> with the word
<literal>subscribe</literal> in the body of the message.
</para>
<para>
Improved support for XML in search and retrieval. Eventually,
the goal is for Zebra to pull double duty as a flexible
information retrieval engine and high-performance XML
- repository.
- </para>
- <para>
- ### Partially done.
+ repository. The recent addition of XPath searching is one
+ example of the kind of enhancement we're working on.
</para>
</listitem>
<listitem>
<para>
- Access to search engine through SOAP/RPC API to allow the
+ Access to the search engine through a SOAP/RPC API to allow the
construction of applications without requiring Z39.50 tools.
- </para>
- <para>
- ### Partially done, thanks to the new SRW/Z39.50 gateway.
+ This will shortly be available by means of Index Data's
+ SRW-to-Z39.50 gateway, currently in beta test.
</para>
</listitem>
<listitem>
<para>
+ Support for the use of Perl both for access to the Zebra API
+ and for building extension ``plug-ins'' such as input filters.
+ The code for this has been contributed to the source tree, and
+ is in the process of being integrated and tested.
+ </para>
+ </listitem>
+
+ <listitem>
+ <para>
Improved free-text searching. We're first and foremost octet jockeys and
we're actively looking for organisations or people who'd like
to contribute experience in relevance ranking and text
<chapter id="quick-start">
- <!-- $Id: quickstart.xml,v 1.7 2002-10-30 14:35:09 adam Exp $ -->
+ <!-- $Id: quickstart.xml,v 1.8 2002-12-01 23:26:26 mike Exp $ -->
<title>Quick Start </title>
-
- <!--
- FIXME - Start with the new improved example scripts that run
- without any configuration file changes!
- ### do we want this now we have "examples.html"? - mike, 15/10/02
- -->
<para>
- In this section, we will test the system by indexing a small set of sample
- GILS records that are included with the software distribution. Go to the
- <literal>examples/gils</literal> subdirectory of the distribution archive.
- There you will find a configuration
- file named <literal>zebra.cfg</literal> with the following contents:
-
- <screen>
- # Where the schema files, attribute files, etc are located.
- profilePath: ../../tab
-
- # Files that describe the attribute sets supported.
- attset: bib1.att
- attset: gils.att
- attset: explain.att
-
- recordtype: grs.sgml
- isam: c
- </screen>
+ <!-- ### ulink to GILS profile: what's the URL? -->
+ In this section, we will test the system by indexing a small set of
+ sample GILS records that are included with the Zebra distribution,
+ running a Zebra server against the newly created database, and
+ searching the indexes with a client that connects to that server.
</para>
-
- <!-- No longer necessary
- <para>
- If necessary, edit the file and set <literal>profilePath</literal> to the path of the
- YAZ profile tables (sub directory <literal>tab</literal> of the YAZ
- distribution archive).
- </para>
- -->
-
<para>
- The 48 test records are located in the sub directory
- <literal>records</literal>. To index these, type:
-
+ Go to the <literal>examples/gils</literal> subdirectory of the
+ distribution archive. The 48 test records are located in the
+ subdirectory <literal>records</literal>. To index these, type:
<screen>
zebraidx update records
</screen>
</para>
<para>
- In the command above, the word <literal>update</literal> followed
- by a directory root updates all files below that directory node.
+ In this command, the word <literal>update</literal> is followed
+ by the name of a directory: <literal>zebraidx</literal> updates all
+ files in the hierarchy rooted at that directory.
</para>
<para>
fire up a server. To start a server on port 2100, type:
<screen>
- zebrasrv tcp:@:2100
+ zebrasrv @:2100
</screen>
</para>
named <literal>Default</literal>.
The database contains records structured according to
the GILS profile, and the server will
- return records in either either USMARC, GRS-1, or SUTRS depending
- on what your client asks for.
+ return records in USMARC, GRS-1, or SUTRS format depending
+ on what the client asks for.
</para>
<para>
To test the server, you can use any Z39.50 client.
- For instance, you can use the demo client that comes with YAZ:
+ For instance, you can use the demo command-line client that comes
+ with YAZ:
</para>
<para>
<screen>
- yaz-client tcp:localhost:2100
+ yaz-client localhost:2100
</screen>
</para>
</para>
<para>
- The default retrieval syntax for the client is USMARC. To try other
- formats for the same record, try:
+ The default retrieval syntax for the client is USMARC, and the
+ default element set is <literal>F</literal> (``full record''). To
+ retrieve the same record in other formats and element sets, try:
</para>
<para>
<screen>
<note>
<para>You may notice that more fields are returned when your
- client requests SUTRS or GRS-1 records. When retrieving GILS records,
- this is normal - not all of the GILS data elements have mappings in
+ client requests SUTRS, GRS-1 or XML records.
+ This is normal - not all of the GILS data elements have mappings in
the USMARC record format.
</para>
</note>