<chapter id="examples">
- <!-- $Id: examples.xml,v 1.7 2002-10-08 08:09:43 mike Exp $ -->
+ <!-- $Id: examples.xml,v 1.17 2002-11-08 17:00:57 mike Exp $ -->
<title>Example Configurations</title>
<sect1>
driven by a master configuration file, which may refer to other
subsidiary configuration files. By default, they try to use
<filename>zebra.cfg</filename> in the working directory as the
- master file; but this can be changed using the <literal>-t</literal>
+ master file; but this can be changed using the <literal>-c</literal>
option to specify an alternative master configuration file.
</para>
<para>
<para>
This example shows how Zebra can be used with absolutely minimal
configuration to index a body of
- <ulink url="http://www.w3.org/xml/###">XML</ulink>
+ <ulink url="http://www.w3.org/XML/">XML</ulink>
documents, and search them using
- <ulink url="http://www.w3.org/xpath/###">XPath</ulink>
+ <ulink url="http://www.w3.org/TR/xpath">XPath</ulink>
expressions to specify access points.
</para>
<para>
- Go to the <literal>examples/dinosauricon</literal> subdirectory
+ Go to the <literal>examples/zthes</literal> subdirectory
of the distribution archive.
- There you will find a <literal>records</literal> subdirectory,
- which contains some raw XML data to be added to the database: in
- this case, as single file, <literal>genera.xml</literal>,
- which contain information about all the known dinosaur genera as of
- August 2002.
+ There you will find a <literal>Makefile</literal> that will
+ populate the <literal>records</literal> subdirectory with a file of
+ <ulink url="http://zthes.z3950.org/">Zthes</ulink>
+ records representing a taxonomic hierarchy of dinosaurs. (The
+ records are generated from the family tree in the file
+ <literal>dino.tree</literal>.)
+ Type <literal>make records/dino.xml</literal>
+ to make the XML data file.
</para>
<para>
- Now we need to create the Zebra database, which we do with the
+ Now we need to create a Zebra database to hold and index the XML
+ records. We do this with the
Zebra indexer, <literal>zebraidx</literal>, which is
driven by the <literal>zebra.cfg</literal> configuration file.
For our purposes, we don't need any
minimal file that just tells <literal>zebraidx</literal> where to
find the default indexing rules, and how to parse the records:
<screen>
- profilePath: .:../../tab:../../../yaz/tab
+ profilePath: .:../../tab
recordType: grs.sgml
</screen>
</para>
<screen>
$ yaz-client tcp:@:9999
Connecting...Ok.
- Z> find @attr 1=/GENUS/MEANING @and lizard earthquakes
+ Z> find @attr 1=/Zthes/termName Sauroposeidon
Number of hits: 1
Z> format xml
Z> show 1
- <GENUS name="Sauroposeidon" type="with">
- <MEANING>lizard Poseidon <LOW>(Greek god of, among other things, earthquakes)</LOW></MEANING>
- <SPECIES name="proteles">
- <AUTHOR type="vide" name="Franklin" year="2000"></AUTHOR>
- <AUTHOR name="Wedel, Cifelli, Sanders"></AUTHOR>
- </SPECIES>
- <PLACE name="Oklahoma"></PLACE>
- <TIME value="Albian"></TIME>
- <LENGTH value="30" q="1"></LENGTH>
- <REMAINS content="rib, cervical vertebrae"></REMAINS>
- <ESSAY>
- <P> This new <NOMEN name="Brachiosaurus"></NOMEN>-like <LINK content="dinosaur"></LINK>
- was perhaps the tallest. With its head raised, it stood 60 feet (nearly
- 20 m) tall. </P>
- </ESSAY>
+ <Zthes>
+ <termId>22</termId>
+ <termName>Sauroposeidon</termName>
+ <termType>PT</termType>
+ <relation>
+ <relationType>BT</relationType>
+ <termId>21</termId>
+ <termName>Brachiosauridae</termName>
+ <termType>PT</termType>
+ </relation>
<idzebra xmlns="http://www.indexdata.dk/zebra/">
- <size>593</size>
- <localnumber>891</localnumber>
- <filename>records/genera.xml</filename>
+ <size>245</size>
+ <localnumber>23</localnumber>
+ <filename>records/dino.xml</filename>
</idzebra>
- </GENUS>
+ </Zthes>
</screen>
</para>
<para>
</para>
</sect1>
+
<sect1 id="example2">
- <title>Example 2: Supporting Z39.50 Searches</title>
+ <title>Example 2: Supporting Interoperable Searches</title>
<para>
- You may have noticed as <literal>zebraidx</literal> was building
- the database that it issued a warning, which we ignored at the
- time:
- <screen>
- $ zebraidx update records
- 00:45:46-08/10: ../../index/zebraidx(5016) [warn] records/genera.xml:0 Couldn't open GENUS.abs [No such file or directory]
- </screen>
+ The problem with the previous example is that you need to know the
+ structure of the documents in order to find them. For example,
+ when we wanted to find the record for the taxon
+ <foreignphrase role="taxon">Sauroposeidon</foreignphrase>,
+ we had to formulate a complex XPath
+ <literal>/Zthes/termName</literal>
+ which embodies the knowledge that taxon names are specified in a
+ <literal>&lt;termName&gt;</literal> element inside the top-level
+ <literal>&lt;Zthes&gt;</literal> element.
</para>
- </sect1>
-</chapter>
-
-<!--
-
+ <para>
+ This is bad not just because it requires a lot of typing, but more
+ significantly because it ties searching semantics to the physical
+ structure of the searched records. You can't use the same search
+ specification to search two databases if their internal
+ representations are different. Consider an alternative taxonomy
+ database in which the records have taxon names specified
+ inside a <literal>&lt;name&gt;</literal> element nested within a
+ <literal>&lt;identification&gt;</literal> element
+ inside a top-level <literal>&lt;taxon&gt;</literal> element: then
+ you'd need to search for them using
+ <literal>1=/taxon/identification/name</literal>.
+ </para>
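+ <para>
+ To make the contrast concrete, here is a sketch of the same
+ logical search as it would have to be typed in two separate
+ client sessions, one against each database. The second database
+ and its element names are the hypothetical ones described above,
+ not something shipped with Zebra:
+ <screen>
+ Z> find @attr 1=/Zthes/termName Sauroposeidon
+ Z> find @attr 1=/taxon/identification/name Sauroposeidon
+ </screen>
+ The search term is identical in both cases; only the
+ structure-dependent access path differs.
+ </para>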
+ <para>
+ How, then, can we build broadcasting Information Retrieval
+ applications that look for records in many different databases?
+ The Z39.50 protocol offers a powerful and general solution to this:
+ abstract <quote>access points</quote>. In the Z39.50 model, an access point
+ is simply a point at which searches can be directed. Nothing is
+ said about implementation: in a given database, an access point
+ might be implemented as an index, a path into physical records, an
+ algorithm for interrogating relational tables or whatever works.
+ The key point is that the semantics of an access point are fixed
+ and well defined.
+ </para>
+ <para>
+ For convenience, access points are gathered into <firstterm>attribute
+ sets</firstterm>. For example, the BIB-1 attribute set is supposed to
+ contain bibliographic access points such as author, title, subject
+ and ISBN; the GEO attribute set contains access points pertaining
+ to geospatial information (bounding coordinates, stratum, latitude
+ resolution, etc.); the CIMI
+ attribute set contains access points to do with museum collections
+ (provenance, inscriptions, etc.).
+ </para>
+ <para>
+ In practice, the BIB-1 attribute set has tended to be a dumping
+ ground for all sorts of access points, so that, for example, it
+ includes some geospatial access points as well as strictly
+ bibliographic ones. Nevertheless, the key point is that this model
+ allows a layer of abstraction over the physical representation of
+ records in databases.
+ </para>
+ <para>
+ In the BIB-1 attribute set, a taxon name is probably best
+ interpreted as a title - that is, a phrase that identifies the item
+ in question. BIB-1 represents title searches by
+ access point 4. (See
+ <ulink url="ftp://ftp.loc.gov/pub/z3950/defs/bib1.txt"
+ >The BIB-1 Attribute Set Semantics</ulink>.)
+ So we need to configure our dinosaur database so that searches for
+ BIB-1 access point 4 look in the
+ <literal>&lt;termName&gt;</literal> element,
+ inside the top-level
+ <literal>&lt;Zthes&gt;</literal> element.
+ </para>
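+ <para>
+ As a sketch of where this is heading: assuming that configuration
+ is in place, and assuming the client sends BIB-1 as its default
+ attribute set (as <literal>yaz-client</literal> normally does),
+ the search from Example 1 could be expressed without any
+ reference to the record structure, along these lines:
+ <screen>
+ Z> find @attr 1=4 Sauroposeidon
+ </screen>
+ Here <literal>1=4</literal> selects attribute type 1 (Use) with
+ value 4 (title). The same query could be sent unchanged to any
+ other server that maps BIB-1 access point 4 onto its own records.
+ </para>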
+ <para>
+ This is a two-step process. First, we need to tell Zebra that we
+ want to support the BIB-1 attribute set. Then we need to tell it
+ which elements of its records pertain to access point 4.
+ </para>
+ <para>
+ We need to create an <link linkend="abs-file">Abstract Syntax
+ file</link> named after the document element of the records we're
+ working with, plus a <literal>.abs</literal> suffix - in this case,
+ <literal>Zthes.abs</literal> - as follows:
+ </para>
+ <itemizedlist>
<listitem>
<para>
- The master configuration file, <literal>zebra.cfg</literal>,
- which is as short and simple as it can be:
- <screen>
- # $Header: /home/cvsroot/idis/doc/examples.xml,v 1.7 2002-10-08 08:09:43 mike Exp $
- # Bare-bones master configuration file for Zebra
- profilePath: .:../../tab:../../../yaz/tab
- </screen>
- Apart from the comments, which are ignored, all this specifies is
- that the server should recognise the attribute set described in
- the file called
- <literal>bib1.att</literal>.
- ### What is an attribute set?
+
</para>
</listitem>
-
<listitem>
<para>
- The BIB-1 attribute set configuration file,
- <literal>bib1.att</literal>, which is also as short as possible:
- <screen>
- # $Header: /home/cvsroot/idis/doc/examples.xml,v 1.7 2002-10-08 08:09:43 mike Exp $
- # Bare-bones BIB-1 attribute set file for Zebra
- reference Bib-1
- </screen>
- Apart from the comments, all this specifies is that reference of
- the attribute set described by this file is
- <literal>Bib-1</literal>, a name recognised by the system as
- referring to a well-known opaque identifier that is transmitted
- by clients as part of their searches.
- ### Yeuch! Surely we can say that better!
- </para>
- <para>
- ### Can't we somehow say this trivial thing in the main
- configuration file?
</para>
</listitem>
--->
+ </itemizedlist>
+ </sect1>
+</chapter>
+
<!--
The simplest hello-world example could go like this:
</caption>
</mediaobject>
-Whene the three <*object> thingies inside the top-level <mediaobject>
+Where the three <*object> thingies inside the top-level <mediaobject>
are decreasingly preferred versions to include, depending on what the
rendering engine can handle. I generated the EPS version of the image
by exporting a line-drawing done in TGIF, then converted that to the