<chapter id="examples">
- <!-- $Id: examples.xml,v 1.15 2002-10-30 14:45:42 adam Exp $ -->
+ <!-- $Id: examples.xml,v 1.16 2002-11-08 01:01:38 mike Exp $ -->
<title>Example Configurations</title>
<sect1>
expressions to specify access points.
</para>
<para>
- Go to the <literal>examples/dinosauricon</literal> subdirectory
+ Go to the <literal>examples/zthes</literal> subdirectory
of the distribution archive.
- There you will find a <literal>records</literal> subdirectory,
- which contains some raw XML data to be added to the database: in
- this case, as single file, <literal>genera.xml</literal>,
- which contain information about all the known dinosaur genera as of
- August 2002.
+ There you will find a <literal>Makefile</literal> that will
+ populate the <literal>records</literal> subdirectory with a file of
+ <ulink url="http://zthes.z3950.org/">Zthes</ulink>
+ records representing a taxonomic hierarchy of dinosaurs. (The
+ records are generated from the family tree in the file
+ <literal>dino.tree</literal>.)
+ Type <literal>make records/dino.xml</literal>
+ to build the XML data file.
</para>
<para>
- Now we need to create the Zebra database, which we do with the
+ Now we need to create a Zebra database to hold and index the XML
+ records. We do this with the
Zebra indexer, <literal>zebraidx</literal>, which is
driven by the <literal>zebra.cfg</literal> configuration file.
For our purposes, we don't need any
<screen>
$ yaz-client tcp:@:9999
Connecting...Ok.
- Z> find @attr 1=/GENUS/SPECIES/AUTHOR/@name Wedel
+ Z> find @attr 1=/Zthes/termName Sauroposeidon
Number of hits: 1
Z> format xml
Z> show 1
- <GENUS name="Sauroposeidon" type="with">
- <MEANING>lizard Poseidon <LOW>(Greek god of, among other things, earthquakes)</LOW></MEANING>
- <SPECIES name="proteles">
- <AUTHOR type="vide" name="Franklin" year="2000"></AUTHOR>
- <AUTHOR name="Wedel, Cifelli, Sanders"></AUTHOR>
- </SPECIES>
- <PLACE name="Oklahoma"></PLACE>
- <TIME value="Albian"></TIME>
- <LENGTH value="30" q="1"></LENGTH>
- <REMAINS content="rib, cervical vertebrae"></REMAINS>
- <ESSAY>
- <P> This new <NOMEN name="Brachiosaurus"></NOMEN>-like <LINK content="dinosaur"></LINK>
- was perhaps the tallest. With its head raised, it stood 60 feet (nearly
- 20 m) tall. </P>
- </ESSAY>
+ <Zthes>
+ <termId>22</termId>
+ <termName>Sauroposeidon</termName>
+ <termType>PT</termType>
+ <relation>
+ <relationType>BT</relationType>
+ <termId>21</termId>
+ <termName>Brachiosauridae</termName>
+ <termType>PT</termType>
+ </relation>
+
<idzebra xmlns="http://www.indexdata.dk/zebra/">
- <size>593</size>
- <localnumber>891</localnumber>
- <filename>records/genera.xml</filename>
- </idzebra>
- </GENUS>
+ <size>245</size>
+ <localnumber>23</localnumber>
+ <filename>records/dino.xml</filename>
+ </idzebra>
+ </Zthes>
</screen>
</para>
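<para>
To see how a Zthes record encodes its place in the taxonomic hierarchy,
the following Python sketch parses an abridged copy of the record
retrieved above, with Zebra's <literal>idzebra</literal> block omitted.
It lists the names of the record's broader terms, that is, the
<literal>termName</literal> of every <literal>relation</literal> whose
<literal>relationType</literal> is <literal>BT</literal>. It is an
illustration only and is not part of the Zebra distribution; the
function name <literal>broader_terms</literal> is our own.
</para>
<programlisting><![CDATA[
import xml.etree.ElementTree as ET

# Abridged copy of the Sauroposeidon record retrieved above
# (Zebra's <idzebra> metadata block omitted).
RECORD = """<Zthes>
  <termId>22</termId>
  <termName>Sauroposeidon</termName>
  <termType>PT</termType>
  <relation>
    <relationType>BT</relationType>
    <termId>21</termId>
    <termName>Brachiosauridae</termName>
    <termType>PT</termType>
  </relation>
</Zthes>"""


def broader_terms(record_xml):
    """Return the termName of each BT (broader term) relation in a record."""
    root = ET.fromstring(record_xml)
    return [
        rel.findtext("termName")
        for rel in root.findall("relation")
        if rel.findtext("relationType") == "BT"
    ]


# findtext("termName") on the root only looks at direct children,
# so it returns the record's own name, not the related term's.
root = ET.fromstring(RECORD)
print(root.findtext("termName"), "->", broader_terms(RECORD))
]]></programlisting>
<para>
Running it prints <literal>Sauroposeidon -> ['Brachiosauridae']</literal>.
A Zthes record can contain several <literal>relation</literal> elements
of different types; only those of type <literal>BT</literal> point up
the tree.
</para>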
<para>
<para>
The problem with the previous example is that you need to know the
structure of the documents in order to find them. For example,
- when we wanted to know the genera for which Matt Wedel is an
- author
- (<foreignphrase role="taxon">Sauroposeidon proteles</foreignphrase>),
+ when we wanted to find the record for the taxon
+ <foreignphrase role="taxon">Sauroposeidon</foreignphrase>,
we had to formulate a complex XPath
- <literal>1=/GENUS/SPECIES/AUTHOR/@name</literal>
- which embodies the knowledge that author names are specified in the
- <literal>name</literal> attribute of the
- <literal><AUTHOR></literal> element,
- which is inside the
- <literal><SPECIES></literal> element,
- which in turn is inside the top-level
- <literal><GENUS></literal> element.
+ <literal>1=/Zthes/termName</literal>
+ which embodies the knowledge that taxon names are specified in a
+ <literal><termName></literal> element inside the top-level
+ <literal><Zthes></literal> element.
</para>
<para>
This is bad not just because it requires a lot of typing, but more
significantly because it ties searching semantics to the physical
structure of the searched records. You can't use the same search
specification to search two databases if their internal
- representations are different. Consider an alternative dinosaur
- database in which the records have author names specified
- inside an <literal><authorName></literal> element directly
+ representations are different. Consider an alternative taxonomy
+ database in which the records have taxon names specified
+ inside a <literal><name></literal> element nested within an
+ <literal><identification></literal> element
inside a top-level <literal><taxon></literal> element: then
you'd need to search for them using
- <literal>1=/taxon/authorName</literal>
+ <literal>1=/taxon/identification/name</literal>.
</para>
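<para>
The dependence on record structure can be made concrete with a small
Python sketch. It models a search on an XPath access point as a test of
whether the element at that path holds the search term, using an exact
string comparison as a simplification of what Zebra's indexing really
does, and applies both paths to an abridged Zthes record and to a record
in the alternative <literal>taxon</literal> layout described above. The
function <literal>match</literal> and both sample records are
illustrative assumptions.
</para>
<programlisting><![CDATA[
import xml.etree.ElementTree as ET

# Abridged Zthes record, as in the database built above.
ZTHES_RECORD = "<Zthes><termId>22</termId><termName>Sauroposeidon</termName></Zthes>"

# The same taxon in the hypothetical alternative layout.
TAXON_RECORD = ("<taxon><identification><name>Sauroposeidon</name>"
                "</identification></taxon>")


def match(record_xml, xpath, term):
    """True if an element at the absolute `xpath` holds exactly `term`.

    ElementTree paths are relative to the root element, so the leading
    /Root step is compared with the root tag and then stripped.
    """
    root = ET.fromstring(record_xml)
    steps = xpath.strip("/").split("/")
    if steps[0] != root.tag:
        return False
    relative = "/".join(steps[1:])
    return any(el.text == term for el in root.findall(relative))


for path in ("/Zthes/termName", "/taxon/identification/name"):
    print(path, [match(r, path, "Sauroposeidon")
                 for r in (ZTHES_RECORD, TAXON_RECORD)])
]]></programlisting>
<para>
Each path finds the taxon in one layout and misses it in the other,
printing <literal>[True, False]</literal> for
<literal>/Zthes/termName</literal> and <literal>[False, True]</literal>
for <literal>/taxon/identification/name</literal>, so no single
structural query serves both databases.
</para>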
<para>
How, then, can we build broadcasting Information Retrieval
records in databases.
</para>
<para>
- In the BIB-1 attribute set, an author search is represented by
- access point 1003. (See
+ In the BIB-1 attribute set, a taxon name is probably best
+ interpreted as a title - that is, a phrase that identifies the item
+ in question. BIB-1 represents title searches by
+ access point 4. (See
<ulink url="###bib1-semantics"/>.)
So we need to configure our dinosaur database so that searches for
- BIB-1 access point 1003 look the
- <literal>name</literal> attribute of the
- <literal><AUTHOR></literal> element,
- inside the
- <literal><SPECIES></literal> element,
+ BIB-1 access point 4 look in the
+ <literal><termName></literal> element,
inside the top-level
- <literal><GENUS></literal> element.
+ <literal><Zthes></literal> element.
</para>
<para>
This is a two-step process. First, we need to tell Zebra that we
want to support the BIB-1 attribute set. Then we need to tell it
- which elements of its record pertain to access point 1003.
+ which elements of its record pertain to access point 4.
</para>
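<para>
The two steps can be modelled in a few lines of Python. This is only a
sketch of the idea: Zebra itself is configured through files such as the
Abstract Syntax file, not through code, and the names
<literal>SUPPORTED_ATTRIBUTE_SETS</literal>,
<literal>ACCESS_POINTS</literal> and <literal>search</literal> are
hypothetical. Step 1 is the set of attribute sets the server accepts;
step 2 maps BIB-1 access point 4 to the element that holds the taxon
name in each record layout, keyed by document element just as the
<literal>.abs</literal> file is named.
</para>
<programlisting><![CDATA[
import xml.etree.ElementTree as ET

# Step 1 (hypothetical model): attribute sets the server supports.
SUPPORTED_ATTRIBUTE_SETS = {"bib1"}

# Step 2 (hypothetical model): per document element, the element path
# (relative to the root) indexed under each BIB-1 access point.
# Access point 4 is title in BIB-1.
ACCESS_POINTS = {
    "Zthes": {4: "termName"},
    "taxon": {4: "identification/name"},  # the alternative layout above
}


def search(records, attribute_set, use, term):
    """Return the records whose element mapped to `use` equals `term`."""
    if attribute_set not in SUPPORTED_ATTRIBUTE_SETS:
        raise ValueError(f"unsupported attribute set: {attribute_set}")
    hits = []
    for record in records:
        root = ET.fromstring(record)
        path = ACCESS_POINTS.get(root.tag, {}).get(use)
        if path is not None and any(el.text == term for el in root.findall(path)):
            hits.append(record)
    return hits


records = [
    "<Zthes><termName>Sauroposeidon</termName></Zthes>",
    "<taxon><identification><name>Sauroposeidon</name></identification></taxon>",
]
print(len(search(records, "bib1", 4, "Sauroposeidon")))
]]></programlisting>
<para>
Because the client asks for access point 4 rather than for a path, the
same query finds the taxon in both layouts (the sketch prints
<literal>2</literal>), and the knowledge of where titles live stays on
the server side. An access point with no mapping, such as 1003 here,
simply matches nothing in this sketch.
</para>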
<para>
We need to create an <link linkend="abs-file">Abstract Syntax
file</link> named after the document element of the records we're
working with, plus a <literal>.abs</literal> suffix - in this case,
- <literal>GENUS.abs</literal> - as follows:
+ <literal>Zthes.abs</literal> - as follows:
</para>
<itemizedlist>
<listitem>