<chapter id="administration">
- <!-- $Id: administration.xml,v 1.50 2007-02-02 11:10:08 marc Exp $ -->
+ <!-- $Id: administration.xml,v 1.51 2007-05-24 13:44:09 adam Exp $ -->
<title>Administrating &zebra;</title>
<!-- ### It's a bit daft that this chapter (which describes half of
the configuration-file formats) is separated from
</para>
<para>
- Both the &zebra; administrative tool and the &z3950; server share a
+ Both the &zebra; administrative tool and the &acro.z3950; server share a
set of index files and a global configuration file.
The name of the configuration file defaults to
<literal>zebra.cfg</literal>.
In the configuration file, the group name is placed before the option
name itself, separated by a dot (.). For instance, to set the record type
for group <literal>public</literal> to <literal>grs.sgml</literal>
- (the &sgml;-like format for structured records) you would write:
+ (the &acro.sgml;-like format for structured records) you would write:
</para>
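<para>
A minimal sketch of the resulting line in <literal>zebra.cfg</literal> (group name and value taken from the example above):
<screen>
public.recordType: grs.sgml
</screen>
</para>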
<para>
<replaceable>database</replaceable></term>
<listitem>
<para>
- Specifies the &z3950; database name.
+ Specifies the &acro.z3950; database name.
<!-- FIXME - now we can have multiple databases in one server. -H -->
</para>
</listitem>
of permissions currently: read (r) and write (w). By default
users not listed in a permission directive are given the read
privilege. To specify permissions for a user with no
- username, or &z3950; anonymous style use
+      username, or &acro.z3950; anonymous style, use
<literal>anonymous</literal>. The permstring consists of
a sequence of characters. Include character <literal>w</literal>
for write/update access, <literal>r</literal> for read access and
mounted on a CD-ROM drive,
you may want &zebra; to make an internal copy of them. To do this,
you specify 1 (true) in the <literal>storeData</literal> setting. When
- the &z3950; server retrieves the records they will be read from the
+ the &acro.z3950; server retrieves the records they will be read from the
internal file structures of the system.
</para>
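<para>
A minimal sketch of the corresponding <literal>zebra.cfg</literal> line (it can also be prefixed with a group name, following the group.option convention described above):
<screen>
storeData: 1
</screen>
</para>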
<para>
Consider a system in which you have a group of text files called
<literal>simple</literal>.
- That group of records should belong to a &z3950; database called
+ That group of records should belong to a &acro.z3950; database called
<literal>textbase</literal>.
The following <literal>zebra.cfg</literal> file will suffice:
</para>
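<para>
A minimal sketch of such a configuration (the <literal>profilePath</literal> value is illustrative and depends on your installation):
<screen>
profilePath: /usr/local/idzebra/tab
attset: bib1.att
simple.recordType: text
simple.database: textbase
</screen>
</para>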
information. If you have a group of records that explicitly associates
an ID with each record, this method is convenient. For example, the
record format may contain a title or an ID-number - unique within the group.
- In either case you specify the &z3950; attribute set and use-attribute
+ In either case you specify the &acro.z3950; attribute set and use-attribute
location in which this information is stored, and the system looks at
that field to determine the identity of the record.
</para>
<para>
For instance, the sample GILS records that come with the &zebra;
distribution contain a unique ID in the data tagged Control-Identifier.
- The data is mapped to the &bib1; use attribute Identifier-standard
+ The data is mapped to the &acro.bib1; use attribute Identifier-standard
(code 1007). To use this field as a record id, specify
<literal>(bib1,Identifier-standard)</literal> as the value of the
<literal>recordId</literal> in the configuration file.
</para>
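<para>
A sketch of the corresponding <literal>zebra.cfg</literal> line, using the value given above:
<screen>
recordId: (bib1,Identifier-standard)
</screen>
</para>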
<para>
The experimental <literal>alvis</literal> filter provides a
- directive to fetch static rank information out of the indexed &xml;
+ directive to fetch static rank information out of the indexed &acro.xml;
records, thus making <emphasis>all</emphasis> hit sets ordered
by <emphasis>ascending</emphasis> static
rank, and for those documents which have the same static rank, ordered
indexing time (this is why we
call it ``dynamic ranking'' in the first place ...)
It is invoked by adding
- the &bib1; relation attribute with
- value ``relevance'' to the &pqf; query (that is,
+ the &acro.bib1; relation attribute with
+ value ``relevance'' to the &acro.pqf; query (that is,
<literal>@attr 2=102</literal>, see also
<ulink url="&url.z39.50;bib1.html">
- The &bib1; Attribute Set Semantics</ulink>, also in
+ The &acro.bib1; Attribute Set Semantics</ulink>, also in
<ulink url="&url.z39.50.attset.bib1;">HTML</ulink>).
To find all articles with the word <literal>Eoraptor</literal> in
- the title, and present them relevance ranked, issue the &pqf; query:
+ the title, and present them relevance ranked, issue the &acro.pqf; query:
<screen>
@attr 2=102 @attr 1=4 Eoraptor
</screen>
</para>
<sect3 id="administration-ranking-dynamic-rank1">
- <title>Dynamically ranking using &pqf; queries with the 'rank-1'
+ <title>Dynamically ranking using &acro.pqf; queries with the 'rank-1'
algorithm</title>
<para>
<term>Query Components</term>
<listitem>
<para>
- First, the boolean query is dismantled into it's principal components,
+ First, the boolean query is dismantled into its principal components,
i.e. atomic queries where one term is looked up in one index.
For example, the query
<screen>
</para>
<para>
It is possible to apply dynamic ranking on only parts of the
- &pqf; query:
+ &acro.pqf; query:
<screen>
@and @attr 2=102 @attr 1=1010 Utah @attr 1=1018 Springer
</screen>
</para>
<para>
Ranking weights may be used to pass a value to a ranking
- algorithm, using the non-standard &bib1; attribute type 9.
+ algorithm, using the non-standard &acro.bib1; attribute type 9.
This allows one branch of a query to use one value while
another branch uses a different one. For example, we can search
for <literal>utah</literal> in the
</para>
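<para>
A hedged sketch of such a weighted query, where the weight values 30 and 20 are purely illustrative:
<screen>
Z> f @attr 2=102 @or @attr 9=30 @attr 1=4 utah @attr 9=20 utah
</screen>
</para>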
<para>
The default weight is
- sqrt(1000) ~ 34 , as the &z3950; standard prescribes that the top score
+      sqrt(1000) ~ 34, as the &acro.z3950; standard prescribes that the top score
is 1000 and the bottom score is 0, encoded in integers.
</para>
<warning>
<!--
<sect3 id="administration-ranking-dynamic-rank1">
- <title>Dynamically ranking &pqf; queries with the 'rank-static'
+ <title>Dynamically ranking &acro.pqf; queries with the 'rank-static'
algorithm</title>
<para>
The dummy <literal>rank-static</literal> reranking/scoring
</sect3>
<sect3 id="administration-ranking-dynamic-cql">
- <title>Dynamically ranking &cql; queries</title>
+ <title>Dynamically ranking &acro.cql; queries</title>
<para>
- Dynamic ranking can be enabled during sever side &cql;
+      Dynamic ranking can be enabled during server side &acro.cql;
query expansion by adding <literal>@attr 2=102</literal>
- chunks to the &cql; config file. For example
+ chunks to the &acro.cql; config file. For example
<screen>
relationModifier.relevant = 2=102
</screen>
- invokes dynamic ranking each time a &cql; query of the form
+ invokes dynamic ranking each time a &acro.cql; query of the form
<screen>
Z> querytype cql
Z> f alvis.text =/relevant house
</screen>
is issued. Dynamic ranking can also be automatically used on
- specific &cql; indexes by (for example) setting
+ specific &acro.cql; indexes by (for example) setting
<screen>
index.alvis.text = 1=text 2=102
</screen>
- which then invokes dynamic ranking each time a &cql; query of the form
+ which then invokes dynamic ranking each time a &acro.cql; query of the form
<screen>
Z> querytype cql
Z> f alvis.text = house
&zebra; sorts efficiently using special sorting indexes
(type=<literal>s</literal>), so each sortable index must be known
at indexing time, specified in the configuration of record
- indexing. For example, to enable sorting according to the &bib1;
+ indexing. For example, to enable sorting according to the &acro.bib1;
<literal>Date/time-added-to-db</literal> field, one could add the line
<screen>
xelm /*/@created Date/time-added-to-db:s
<para>
Indexing can be specified at searching time using a query term
carrying the non-standard
- &bib1; attribute-type <literal>7</literal>. This removes the
- need to send a &z3950; <literal>Sort Request</literal>
+ &acro.bib1; attribute-type <literal>7</literal>. This removes the
+ need to send a &acro.z3950; <literal>Sort Request</literal>
separately, and can dramatically improve latency when the client
and server are on separate networks.
The sorting part of the query is separate from the rest of the
</para>
<para>
A sorting subquery needs two attributes: an index (such as a
- &bib1; type-1 attribute) specifying which index to sort on, and a
+ &acro.bib1; type-1 attribute) specifying which index to sort on, and a
type-7 attribute whose value is <literal>1</literal> for
ascending sorting, or <literal>2</literal> for descending. The
term associated with the sorting attribute is the priority of
on.
</para>
<para>For example, a search for water, sorted by title (ascending),
- is expressed by the &pqf; query
+ is expressed by the &acro.pqf; query
<screen>
@or @attr 1=1016 water @attr 7=1 @attr 1=4 0
</screen>
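Analogously, a sketch of the same search sorted by title in
<emphasis>descending</emphasis> order simply changes the type-7 attribute
value to <literal>2</literal>:
<screen>
@or @attr 1=1016 water @attr 7=2 @attr 1=4 0
</screen>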
<note>
<para>
Extended services are only supported when accessing the &zebra;
- server using the <ulink url="&url.z39.50;">&z3950;</ulink>
- protocol. The <ulink url="&url.sru;">&sru;</ulink> protocol does
+ server using the <ulink url="&url.z39.50;">&acro.z3950;</ulink>
+ protocol. The <ulink url="&url.sru;">&acro.sru;</ulink> protocol does
not support extended services.
</para>
</note>
storeKeys: 1
</screen>
The general record type should be set to any record filter which
- is able to parse &xml; records, you may use any of the two
+      is able to parse &acro.xml; records; you may use either of the two
declarations (but not both simultaneously!)
<screen>
recordType: grs.xml
<para>
It is not possible to carry information about record types or
similar to &zebra; when using extended services, due to
- limitations of the <ulink url="&url.z39.50;">&z3950;</ulink>
+ limitations of the <ulink url="&url.z39.50;">&acro.z3950;</ulink>
protocol. Therefore, indexing filters can not be chosen on a
- per-record basis. One and only one general &xml; indexing filter
+ per-record basis. One and only one general &acro.xml; indexing filter
must be defined.
<!-- but because it is represented as an OID, we would need some
form of proprietary mapping scheme between record type strings and
OIDs. -->
<!--
However, as a minimum, it would be extremely useful to enable
- people to use &marc21;, assuming grs.marcxml.marc21 as a record
+ people to use &acro.marc21;, assuming grs.marcxml.marc21 as a record
type.
-->
</para>
<sect2 id="administration-extended-services-z3950">
- <title>Extended services in the &z3950; protocol</title>
+ <title>Extended services in the &acro.z3950; protocol</title>
<para>
- The <ulink url="&url.z39.50;">&z3950;</ulink> standard allows
+ The <ulink url="&url.z39.50;">&acro.z3950;</ulink> standard allows
servers to accept special binary <emphasis>extended services</emphasis>
protocol packages, which may be used to insert, update and delete
records into servers. These carry control and update
</para>
<table id="administration-extended-services-z3950-table" frame="top">
- <title>Extended services &z3950; Package Fields</title>
+ <title>Extended services &acro.z3950; Package Fields</title>
<tgroup cols="3">
<thead>
<row>
</row>
<row>
<entry><literal>record</literal></entry>
- <entry><literal>&xml; string</literal></entry>
- <entry>An &xml; formatted string containing the record</entry>
+ <entry><literal>&acro.xml; string</literal></entry>
+ <entry>An &acro.xml; formatted string containing the record</entry>
</row>
<row>
<entry><literal>syntax</literal></entry>
<entry><literal>'xml'</literal></entry>
- <entry>Only &xml; record syntax is supported</entry>
+ <entry>Only &acro.xml; record syntax is supported</entry>
</row>
<row>
<entry><literal>recordIdOpaque</literal></entry>
<para>
When retrieving existing
- records indexed with &grs1; indexing filters, the &zebra; internal
+ records indexed with &acro.grs1; indexing filters, the &zebra; internal
ID number is returned in the field
<literal>/*/id:idzebra/localnumber</literal> in the namespace
<literal>xmlns:id="http://www.indexdata.dk/zebra/"</literal>,
]]>
</screen>
Now that the <literal>Default</literal> database has been created,
- we can insert an &xml; file (esdd0006.grs
+ we can insert an &acro.xml; file (esdd0006.grs
from example/gils/records) and index it:
<screen>
<![CDATA[
<title>Extended services from yaz-php</title>
<para>
- Extended services are also available from the &yaz; &php; client layer. An
- example of an &yaz;-&php; extended service transaction is given here:
+ Extended services are also available from the &yaz; &acro.php; client layer. An
+      example of a &yaz;-&acro.php; extended service transaction is given here:
<screen>
<![CDATA[
$record = '<record><title>A fine specimen of a record</title></record>';
<chapter id="architecture">
- <!-- $Id: architecture.xml,v 1.21 2007-02-20 14:28:31 marc Exp $ -->
+ <!-- $Id: architecture.xml,v 1.22 2007-05-24 13:44:09 adam Exp $ -->
<title>Overview of &zebra; Architecture</title>
<section id="architecture-representation">
<varlistentry>
<term>Search Evaluation</term>
<listitem>
- <para>by execution of search requests expressed in &pqf;/&rpn;
+ <para>by execution of search requests expressed in &acro.pqf;/&acro.rpn;
data structures, which are handed over from
- the &yaz; server frontend &api;. Search evaluation includes
+ the &yaz; server frontend &acro.api;. Search evaluation includes
construction of hit lists according to boolean combinations
of simpler searches. Fast performance is achieved by careful
use of index structures, and by evaluation specific index hit
<term>Record Presentation</term>
<listitem>
<para>returns - possibly ranked - result sets, hit
- numbers, and the like internal data to the &yaz; server backend &api;
+       numbers, and similar internal data to the &yaz; server backend &acro.api;
for shipping to the client. Each individual filter module
implements its own specific presentation formats.
</para>
<section id="componentsearcher">
<title>&zebra; Searcher/Retriever</title>
<para>
- This is the executable which runs the &z3950;/&sru;/&srw; server and
+ This is the executable which runs the &acro.z3950;/&acro.sru;/&acro.srw; server and
glues together the core libraries and the filter modules to one
great Information Retrieval server application.
</para>
<title>&yaz; Server Frontend</title>
<para>
The &yaz; server frontend is
- a full fledged stateful &z3950; server taking client
+       a full-fledged stateful &acro.z3950; server taking client
connections, and forwarding search and scan requests to the
&zebra; core indexer.
</para>
<para>
- In addition to &z3950; requests, the &yaz; server frontend acts
+ In addition to &acro.z3950; requests, the &yaz; server frontend acts
as HTTP server, honoring
- <ulink url="&url.srw;">&sru; &soap;</ulink>
+ <ulink url="&url.srw;">&acro.sru; &acro.soap;</ulink>
requests, and
- <ulink url="&url.sru;">&sru; &rest;</ulink>
+ <ulink url="&url.sru;">&acro.sru; &acro.rest;</ulink>
requests. Moreover, it can
translate incoming
- <ulink url="&url.cql;">&cql;</ulink>
+ <ulink url="&url.cql;">&acro.cql;</ulink>
queries to
- <ulink url="&url.yaz.pqf;">&pqf;</ulink>
+ <ulink url="&url.yaz.pqf;">&acro.pqf;</ulink>
queries, if
correctly configured.
</para>
<ulink url="&url.yaz;">&yaz;</ulink>
is an Open Source
toolkit that allows you to develop software using the
- &ansi; &z3950;/ISO23950 standard for information retrieval.
+ &acro.ansi; &acro.z3950;/ISO23950 standard for information retrieval.
It is packaged in the Debian packages
<literal>yaz</literal> and <literal>libyaz</literal>.
</para>
</para>
<section id="componentmodulesdom">
- <title>&dom; &xml; Record Model and Filter Module</title>
+ <title>&acro.dom; &acro.xml; Record Model and Filter Module</title>
<para>
- The &dom; &xml; filter uses a standard &dom; &xml; structure as
+ The &acro.dom; &acro.xml; filter uses a standard &acro.dom; &acro.xml; structure as
internal data model, and can thus parse, index, and display
- any &xml; document.
+ any &acro.xml; document.
</para>
<para>
- A parser for binary &marc; records based on the ISO2709 library
+ A parser for binary &acro.marc; records based on the ISO2709 library
standard is provided; it transforms these to the internal
- &marcxml; &dom; representation.
+ &acro.marcxml; &acro.dom; representation.
</para>
<para>
- The internal &dom; &xml; representation can be fed into four
+ The internal &acro.dom; &acro.xml; representation can be fed into four
different pipelines, consisting of arbitrarily many successive
- &xslt; transformations; these are for
+ &acro.xslt; transformations; these are for
<itemizedlist>
<listitem><para>input parsing and initial
transformations,</para></listitem>
</itemizedlist>
</para>
<para>
- The &dom; &xml; filter pipelines use &xslt; (and if supported on
- your platform, even &exslt;), it brings thus full &xpath;
+ The &acro.dom; &acro.xml; filter pipelines use &acro.xslt; (and if supported on
+      your platform, even &acro.exslt;), thus bringing full &acro.xpath;
support to the indexing, storage and display rules of not only
- &xml; documents, but also binary &marc; records.
+ &acro.xml; documents, but also binary &acro.marc; records.
</para>
<para>
- Finally, the &dom; &xml; filter allows for static ranking at index
+ Finally, the &acro.dom; &acro.xml; filter allows for static ranking at index
time, and to sort hit lists according to predefined
static ranks.
</para>
<para>
- Details on the experimental &dom; &xml; filter are found in
+ Details on the experimental &acro.dom; &acro.xml; filter are found in
<xref linkend="record-model-domxml"/>.
</para>
<para>
The Debian package <literal>libidzebra-2.0-mod-dom</literal>
- contains the &dom; filter module.
+ contains the &acro.dom; filter module.
</para>
</section>
<section id="componentmodulesalvis">
- <title>ALVIS &xml; Record Model and Filter Module</title>
+ <title>ALVIS &acro.xml; Record Model and Filter Module</title>
<note>
<para>
The functionality of this record model has been improved and
- replaced by the &dom; &xml; record model. See
+ replaced by the &acro.dom; &acro.xml; record model. See
<xref linkend="componentmodulesdom"/>.
</para>
</note>
<para>
- The Alvis filter for &xml; files is an &xslt; based input
+ The Alvis filter for &acro.xml; files is an &acro.xslt; based input
filter.
- It indexes element and attribute content of any thinkable &xml; format
- using full &xpath; support, a feature which the standard &zebra;
- &grs1; &sgml; and &xml; filters lacked. The indexed documents are
- parsed into a standard &xml; &dom; tree, which restricts record size
+      It indexes element and attribute content of any conceivable &acro.xml; format
+ using full &acro.xpath; support, a feature which the standard &zebra;
+ &acro.grs1; &acro.sgml; and &acro.xml; filters lacked. The indexed documents are
+ parsed into a standard &acro.xml; &acro.dom; tree, which restricts record size
according to availability of memory.
</para>
<para>
The Alvis filter
- uses &xslt; display stylesheets, which let
+ uses &acro.xslt; display stylesheets, which let
the &zebra; DB administrator associate multiple, different views on
- the same &xml; document type. These views are chosen on-the-fly in
+      the same &acro.xml; document type. These views are chosen on-the-fly at
search time.
</para>
<para>
In addition, the Alvis filter configuration is not bound to the
- arcane &bib1; &z3950; library catalogue indexing traditions and
+ arcane &acro.bib1; &acro.z3950; library catalogue indexing traditions and
folklore, and is therefore easier to understand.
</para>
<para>
their Pagerank algorithm.
</para>
<para>
- Details on the experimental Alvis &xslt; filter are found in
+ Details on the experimental Alvis &acro.xslt; filter are found in
<xref linkend="record-model-alvisxslt"/>.
</para>
<para>
</section>
<section id="componentmodulesgrs">
- <title>&grs1; Record Model and Filter Modules</title>
+ <title>&acro.grs1; Record Model and Filter Modules</title>
<note>
<para>
The functionality of this record model has been improved and
- replaced by the &dom; &xml; record model. See
+ replaced by the &acro.dom; &acro.xml; record model. See
<xref linkend="componentmodulesdom"/>.
</para>
</note>
<para>
- The &grs1; filter modules described in
+ The &acro.grs1; filter modules described in
<xref linkend="grs"/>
- are all based on the &z3950; specifications, and it is absolutely
- mandatory to have the reference pages on &bib1; attribute sets on
- you hand when configuring &grs1; filters. The GRS filters come in
+ are all based on the &acro.z3950; specifications, and it is absolutely
+      mandatory to have the reference pages on &acro.bib1; attribute sets at
+      hand when configuring &acro.grs1; filters. The GRS filters come in
different flavors, and a short introduction is needed here.
- &grs1; filters of various kind have also been called ABS filters due
+ &acro.grs1; filters of various kind have also been called ABS filters due
to the <filename>*.abs</filename> configuration file suffix.
</para>
<para>
The <emphasis>grs.marc</emphasis> and
<emphasis>grs.marcxml</emphasis> filters are suited to parse and
- index binary and &xml; versions of traditional library &marc; records
+ index binary and &acro.xml; versions of traditional library &acro.marc; records
based on the ISO2709 standard. The Debian package for both
filters is
<literal>libidzebra-2.0-mod-grs-marc</literal>.
</para>
<para>
- &grs1; TCL scriptable filters for extensive user configuration come
+ &acro.grs1; TCL scriptable filters for extensive user configuration come
in two flavors: a regular expression filter
<emphasis>grs.regx</emphasis> using TCL regular expressions, and
a general scriptable TCL filter called
<literal>libidzebra-2.0-mod-grs-regx</literal> Debian package.
</para>
<para>
- A general purpose &sgml; filter is called
+ A general purpose &acro.sgml; filter is called
<emphasis>grs.sgml</emphasis>. This filter is not yet packaged,
but planned to be in the
<literal>libidzebra-2.0-mod-grs-sgml</literal> Debian package.
<literal>libidzebra-2.0-mod-grs-xml</literal> includes the
<emphasis>grs.xml</emphasis> filter which uses <ulink
url="&url.expat;">Expat</ulink> to
- parse records in &xml; and turn them into ID&zebra;'s internal &grs1; node
- trees. Have also a look at the Alvis &xml;/&xslt; filter described in
+ parse records in &acro.xml; and turn them into ID&zebra;'s internal &acro.grs1; node
+      parse records in &acro.xml; and turn them into ID&zebra;'s internal &acro.grs1; node
+      trees. Also have a look at the Alvis &acro.xml;/&acro.xslt; filter described in
the next section.
</para>
</section>
<para>
When records are accessed by the system, they are represented
- in their local, or native format. This might be &sgml; or HTML files,
- News or Mail archives, &marc; records. If the system doesn't already
+ in their local, or native format. This might be &acro.sgml; or HTML files,
+      News or Mail archives, or &acro.marc; records. If the system doesn't already
know how to read the type of data you need to store, you can set up an
input filter by preparing conversion rules based on regular
expressions and possibly augmented by a flexible scripting language
<para>
Before transmitting records to the client, they are first
converted from the internal structure to a form suitable for exchange
- over the network - according to the &z3950; standard.
+ over the network - according to the &acro.z3950; standard.
</para>
</listitem>
In particular, the regular record filters are not invoked when
these are in use.
This can in some cases make retrieval faster than regular
- retrieval operations (for &marc;, &xml; etc).
+ retrieval operations (for &acro.marc;, &acro.xml; etc).
</para>
<table id="special-retrieval-types">
<title>Special Retrieval Elements</title>
<row>
<entry><literal>zebra::meta::sysno</literal></entry>
<entry>Get &zebra; record system ID</entry>
- <entry>&xml; and &sutrs;</entry>
+ <entry>&acro.xml; and &acro.sutrs;</entry>
</row>
<row>
<entry><literal>zebra::data</literal></entry>
<row>
<entry><literal>zebra::meta</literal></entry>
<entry>Get &zebra; record internal metadata</entry>
- <entry>&xml; and &sutrs;</entry>
+ <entry>&acro.xml; and &acro.sutrs;</entry>
</row>
<row>
<entry><literal>zebra::index</literal></entry>
<entry>Get all indexed keys for record</entry>
- <entry>&xml; and &sutrs;</entry>
+ <entry>&acro.xml; and &acro.sutrs;</entry>
</row>
<row>
<entry>
<entry>
Get indexed keys for field <replaceable>f</replaceable> for record
</entry>
- <entry>&xml; and &sutrs;</entry>
+ <entry>&acro.xml; and &acro.sutrs;</entry>
</row>
<row>
<entry>
Get indexed keys for field <replaceable>f</replaceable>
and type <replaceable>t</replaceable> for record
</entry>
- <entry>&xml; and &sutrs;</entry>
+ <entry>&acro.xml; and &acro.sutrs;</entry>
</row>
</tbody>
</tgroup>
Z> elements zebra::meta::sysno
Z> s 1+1
</screen>
- displays in <literal>&xml;</literal> record syntax only internal
+ displays in <literal>&acro.xml;</literal> record syntax only internal
record system number, whereas
<screen>
Z> f @attr 1=title my
Z> s 1+1
</screen>
will display all indexed tokens from all indexed fields of the
- first record, and it will display in <literal>&sutrs;</literal>
+ first record, and it will display in <literal>&acro.sutrs;</literal>
record syntax, whereas
<screen>
Z> f @attr 1=title my
Z> elements zebra::index::title:p
Z> s 1+1
</screen>
- displays in <literal>&xml;</literal> record syntax only the content
+ displays in <literal>&acro.xml;</literal> record syntax only the content
of the zebra string index <literal>title</literal>, or
even only the type <literal>p</literal> phrase indexed part of it.
</para>
<note>
<para>
- Trying to access numeric <literal>&bib1;</literal> use
+ Trying to access numeric <literal>&acro.bib1;</literal> use
attributes or trying to access non-existent zebra internal string
access points will result in a Diagnostic 25: Specified element set
name not valid for specified database.
<chapter id="examples">
- <!-- $Id: examples.xml,v 1.26 2007-02-02 11:10:08 marc Exp $ -->
+ <!-- $Id: examples.xml,v 1.27 2007-05-24 13:44:09 adam Exp $ -->
<title>Example Configurations</title>
<sect1 id="examples-overview">
</sect1>
<sect1 id="example1">
- <title>Example 1: &xml; Indexing And Searching</title>
+ <title>Example 1: &acro.xml; Indexing And Searching</title>
<para>
This example shows how &zebra; can be used with absolutely minimal
configuration to index a body of
- <ulink url="&url.xml;">&xml;</ulink>
+ <ulink url="&url.xml;">&acro.xml;</ulink>
documents, and search them using
<ulink url="&url.xpath;">XPath</ulink>
expressions to specify access points.
records are generated from the family tree in the file
<literal>dino.tree</literal>.)
Type <literal>make records/dino.xml</literal>
- to make the &xml; data file.
- (Or you could just type <literal>make dino</literal> to build the &xml;
+ to make the &acro.xml; data file.
+ (Or you could just type <literal>make dino</literal> to build the &acro.xml;
data file, create the database and populate it with the taxonomic
records all in one shot - but then you wouldn't learn anything,
would you? :-)
</para>
<para>
- Now we need to create a &zebra; database to hold and index the &xml;
+ Now we need to create a &zebra; database to hold and index the &acro.xml;
records. We do this with the
&zebra; indexer, <command>zebraidx</command>, which is
driven by the <literal>zebra.cfg</literal> configuration file.
</para>
<para>
That's all you need for a minimal &zebra; configuration. Now you can
- roll the &xml; records into the database and build the indexes:
+ roll the &acro.xml; records into the database and build the indexes:
<screen>
zebraidx update records
</screen>
<xref linkend="zebrasrv"/>.
</para>
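<para>
As a quick sketch, the server could be started on port 9999 (the port number is illustrative and matches the client example below):
<screen>
zebrasrv @:9999
</screen>
</para>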
<para>
- Now you can use the &z3950; client program of your choice to execute
- XPath-based boolean queries and fetch the &xml; records that satisfy
+ Now you can use the &acro.z3950; client program of your choice to execute
+ XPath-based boolean queries and fetch the &acro.xml; records that satisfy
them:
<screen>
$ yaz-client @:9999
<para>
How, then, can we build broadcasting Information Retrieval
applications that look for records in many different databases?
- The &z3950; protocol offers a powerful and general solution to this:
- abstract ``access points''. In the &z3950; model, an access point
+ The &acro.z3950; protocol offers a powerful and general solution to this:
+ abstract ``access points''. In the &acro.z3950; model, an access point
is simply a point at which searches can be directed. Nothing is
said about implementation: in a given database, an access point
might be implemented as an index, a path into physical records, an
</para>
<para>
For convenience, access points are gathered into <firstterm>attribute
- sets</firstterm>. For example, the &bib1; attribute set is supposed to
+ sets</firstterm>. For example, the &acro.bib1; attribute set is supposed to
contain bibliographic access points such as author, title, subject
and ISBN; the GEO attribute set contains access points pertaining
to geospatial information (bounding coordinates, stratum, latitude
(provenance, inscriptions, etc.)
</para>
<para>
- In practice, the &bib1; attribute set has tended to be a dumping
+ In practice, the &acro.bib1; attribute set has tended to be a dumping
ground for all sorts of access points, so that, for example, it
includes some geospatial access points as well as strictly
bibliographic ones. Nevertheless, this model
records in databases.
</para>
<para>
- In the &bib1; attribute set, a taxon name is probably best
+ In the &acro.bib1; attribute set, a taxon name is probably best
interpreted as a title - that is, a phrase that identifies the item
- in question. &bib1; represents title searches by
+ in question. &acro.bib1; represents title searches by
access point 4. (See
- <ulink url="&url.z39.50.bib1.semantics;">The &bib1; Attribute
+ <ulink url="&url.z39.50.bib1.semantics;">The &acro.bib1; Attribute
Set Semantics</ulink>)
So we need to configure our dinosaur database so that searches for
- &bib1; access point 4 look in the
+ &acro.bib1; access point 4 look in the
<literal><termName></literal> element,
inside the top-level
<literal><Zthes></literal> element.
</para>
<para>
This is a two-step process. First, we need to tell &zebra; that we
- want to support the &bib1; attribute set. Then we need to tell it
+ want to support the &acro.bib1; attribute set. Then we need to tell it
which elements of its record pertain to access point 4.
</para>
<para>
</callout>
<callout arearefs="attset.attset">
<para>
- Declare &bib1; attribute set. See <filename>bib1.att</filename> in
+ Declare &acro.bib1; attribute set. See <filename>bib1.att</filename> in
&zebra;'s <filename>tab</filename> directory.
</para>
</callout>
<callout arearefs="termName">
<para>
Make <literal>termName</literal> word searchable by both
- Zthes attribute termName (1002) and &bib1; atttribute title (4).
+	    Zthes attribute termName (1002) and &acro.bib1; attribute title (4).
</para>
</callout>
</calloutlist>
</programlistingco>
<para>
- After re-indexing, we can search the database using &bib1;
+ After re-indexing, we can search the database using &acro.bib1;
attribute, title, as follows:
<screen>
Z> form xml
Z> s
Sent presentRequest (1+1).
Records: 1
-[Default]Record type: &xml;
+[Default]Record type: &acro.xml;
<Zthes>
<termId>2</termId>
<termName>Eoraptor</termName>
-<!-- $Id: installation.xml,v 1.35 2007-02-02 11:10:08 marc Exp $ -->
+<!-- $Id: installation.xml,v 1.36 2007-05-24 13:44:09 adam Exp $ -->
<chapter id="installation">
<title>Installation</title>
<para>
- &zebra; is written in &ansi; C and was implemented with portability in mind.
+ &zebra; is written in &acro.ansi; C and was implemented with portability in mind.
We primarily use <ulink url="&url.gcc;">GCC</ulink> on UNIX and
<ulink url="&url.vstudio;">Microsoft Visual C++</ulink> on Windows.
</para>
(required)</term>
<listitem>
<para>
- &zebra; uses &yaz; to support <ulink url="&url.z39.50;">&z3950;</ulink> /
- <ulink url="&url.sru;">&sru;</ulink>.
+ &zebra; uses &yaz; to support <ulink url="&url.z39.50;">&acro.z3950;</ulink> /
+ <ulink url="&url.sru;">&acro.sru;</ulink>.
The memory management utilities from &yaz; are also used by &zebra;.
</para>
</listitem>
(optional)</term>
<listitem>
<para>
- &xml; parser. If you're going to index real &xml; you should
+ &acro.xml; parser. If you're going to index real &acro.xml; you should
install this (filter grs.xml). On most systems you should be able
to find binary Expat packages.
</para>
<para>
On Unix, GCC works fine, but any native
C compiler should work, as long as it is
- &ansi; C compliant.
+ &acro.ansi; C compliant.
</para>
<para>
<term><literal>zebrasrv</literal></term>
<listitem>
<para>
- The &z3950; server and search engine.
+ The &acro.z3950; server and search engine.
</para>
</listitem>
</varlistentry>
<para>
The <literal>.so</literal>-files are &zebra; record filter modules.
There are modules for reading
- &marc; (<filename>mod-grs-marc.so</filename>),
- &xml; (<filename>mod-grs-xml.so</filename>) , etc.
+ &acro.marc; (<filename>mod-grs-marc.so</filename>),
+ &acro.xml; (<filename>mod-grs-xml.so</filename>) , etc.
</para>
</listitem>
</varlistentry>
redirection to other fields.
For example the following snippet of
a custom <filename>custom/bib1.att</filename>
- &bib1; attribute set definition file is no
+ &acro.bib1; attribute set definition file is no
longer supported:
<screen>
att 1016 Any 1016,4,1005,62
</para>
<para>
Similar behaviour can be expressed in the new release by defining
- a new index <literal>Any:w</literal> in all &grs1;
+ a new index <literal>Any:w</literal> in all &acro.grs1;
<filename>*.abs</filename> record indexing configuration files.
The above example configuration needs to make the changes
from version 1.3.x indexing instructions
<screen>
att 1016 Body-of-text
</screen>
- with equivalent outcome without editing all &grs1;
+ with equivalent outcome without editing all &acro.grs1;
<filename>*.abs</filename> record indexing configuration files.
</para>
<para>
Server installations which use the special
- <literal>&idxpath;</literal> attribute set must add the following
+ <literal>&acro.idxpath;</literal> attribute set must add the following
line to the <filename>zebra.cfg</filename> configuration file:
<screen>
attset: idxpath.att
<chapter id="introduction">
- <!-- $Id: introduction.xml,v 1.49 2007-02-05 14:32:31 marc Exp $ -->
+ <!-- $Id: introduction.xml,v 1.50 2007-05-24 13:44:09 adam Exp $ -->
<title>Introduction</title>
<section id="overview">
<para>
&zebra; is a free, fast, friendly information management system. It can
- index records in &xml;/&sgml;, &marc;, e-mail archives and many other
+ index records in &acro.xml;/&acro.sgml;, &acro.marc;, e-mail archives and many other
formats, and quickly find them using a combination of boolean
searching and relevance ranking. Search-and-retrieve applications can
- be written using &api;s in a wide variety of languages, communicating
+ be written using &acro.api;s in a wide variety of languages, communicating
with the &zebra; server using industry-standard information-retrieval
protocols or web services.
</para>
</para>
<para>
&zebra; is a networked component which acts as a
- reliable &z3950; server
+ reliable &acro.z3950; server
for both record/document search, presentation, insert, update and
- delete operations. In addition, it understands the &sru; family of
- webservices, which exist in &rest; &get;/&post; and truly
- &soap; flavors.
+ delete operations. In addition, it understands the &acro.sru; family of
+    web services, which exist in &acro.rest; &acro.get;/&acro.post; and true
+ &acro.soap; flavors.
</para>
<para>
&zebra; is available as MS Windows 2003 Server (32 bit) self-extracting
<ulink url="http://indexdata.dk/zebra/">&zebra;</ulink>
is a high-performance, general-purpose structured text
indexing and retrieval engine. It reads records in a
- variety of input formats (eg. email, &xml;, &marc;) and provides access
+     variety of input formats (e.g. email, &acro.xml;, &acro.marc;) and provides access
to them through a powerful combination of boolean search
expressions and relevance-ranked free-text queries.
</para>
&zebra; supports large databases (tens of millions of records,
tens of gigabytes of data). It allows safe, incremental
database updates on live systems. Because &zebra; supports
- the industry-standard information retrieval protocol, &z3950;,
+ the industry-standard information retrieval protocol, &acro.z3950;,
you can search &zebra; databases using an enormous variety of
programs and toolkits, both commercial and free, which understand
this protocol. Application libraries are available to allow
bespoke clients to be written in Perl, C, C++, Java, Tcl, Visual
- Basic, Python, &php; and more - see the
- <ulink url="&url.zoom;">&zoom; web site</ulink>
+ Basic, Python, &acro.php; and more - see the
+ <ulink url="&url.zoom;">&acro.zoom; web site</ulink>
for more information on some of these client toolkits.
</para>
<tbody>
<row>
<entry>Complex semi-structured Documents</entry>
- <entry>&xml; and &grs1; Documents</entry>
- <entry>Both &xml; and &grs1; documents exhibit a &dom; like internal
+ <entry>&acro.xml; and &acro.grs1; Documents</entry>
+ <entry>Both &acro.xml; and &acro.grs1; documents exhibit a &acro.dom; like internal
representation allowing for complex indexing and display rules</entry>
<entry><xref linkend="record-model-alvisxslt"/> and
<xref linkend="grs"/></entry>
</row>
<row>
<entry>Input document formats</entry>
- <entry>&xml;, &sgml;, Text, ISO2709 (&marc;)</entry>
+ <entry>&acro.xml;, &acro.sgml;, Text, ISO2709 (&acro.marc;)</entry>
<entry>
A system of input filters driven by
regular expressions allows most ASCII-based
data formats to be easily processed.
- &sgml;, &xml;, ISO2709 (&marc;), and raw text are also
+ &acro.sgml;, &acro.xml;, ISO2709 (&acro.marc;), and raw text are also
supported.</entry>
<entry><xref linkend="componentmodules"/></entry>
</row>
<tbody>
<row>
<entry>Query languages</entry>
- <entry>&cql; and &rpn;/&pqf;</entry>
- <entry>The type-1 Reverse Polish Notation (&rpn;)
- and it's textual representation Prefix Query Format (&pqf;) are
- supported. The Common Query Language (&cql;) can be configured as
- a mapping from &cql; to &rpn;/&pqf;</entry>
+ <entry>&acro.cql; and &acro.rpn;/&acro.pqf;</entry>
+ <entry>The type-1 Reverse Polish Notation (&acro.rpn;)
+ and its textual representation Prefix Query Format (&acro.pqf;) are
+ supported. The Common Query Language (&acro.cql;) can be configured as
+ a mapping from &acro.cql; to &acro.rpn;/&acro.pqf;</entry>
<entry><xref linkend="querymodel-query-languages-pqf"/> and
<xref linkend="querymodel-cql-to-pqf"/></entry>
</row>
<row>
<entry>Complex boolean query tree</entry>
- <entry>&cql; and &rpn;/&pqf;</entry>
- <entry>Both &cql; and &rpn;/&pqf; allow atomic query parts (&apt;) to
+ <entry>&acro.cql; and &acro.rpn;/&acro.pqf;</entry>
+ <entry>Both &acro.cql; and &acro.rpn;/&acro.pqf; allow atomic query parts (&acro.apt;) to
be combined into complex boolean query trees</entry>
<entry><xref linkend="querymodel-rpn-tree"/></entry>
</row>
<row>
<entry>Field search</entry>
<entry>user defined</entry>
- <entry>Atomic query parts (&apt;) are either general, or
+ <entry>Atomic query parts (&acro.apt;) are either general, or
directed at user-specified document fields
</entry>
<entry><xref linkend="querymodel-atomic-queries"/>,
<entry></entry>
</row>
<row>
- <entry>&xml; document transformations</entry>
- <entry>&xslt; based</entry>
+ <entry>&acro.xml; document transformations</entry>
+ <entry>&acro.xslt; based</entry>
<entry> Record presentation can be performed in many
- pre-defined &xml; data
- formats, where the original &xml; records are on-the-fly transformed
- through any preconfigured &xslt; transformation. It is therefore
- trivial to present records in short/full &xml; views, transforming to
- RSS, Dublin Core, or other &xml; based data formats, or transform
+ pre-defined &acro.xml; data
+ formats, where the original &acro.xml; records are on-the-fly transformed
+ through any preconfigured &acro.xslt; transformation. It is therefore
+ trivial to present records in short/full &acro.xml; views, transforming to
+ RSS, Dublin Core, or other &acro.xml; based data formats, or transform
records to XHTML snippets ready for inserting in XHTML pages.</entry>
<entry>
<xref linkend="record-model-alvisxslt-elementset"/></entry>
</row>
<row>
<entry>Binary record transformations</entry>
- <entry>&marc;, &usmarc;, &marc21; and &marcxml;</entry>
+ <entry>&acro.marc;, &acro.usmarc;, &acro.marc21; and &acro.marcxml;</entry>
<entry>post-filter record transformations</entry>
<entry></entry>
</row>
<entry>Record Syntaxes</entry>
<entry></entry>
<entry> Multiple record syntaxes
- for data retrieval: &grs1;, &sutrs;,
- &xml;, ISO2709 (&marc;), etc. Records can be mapped between
+ for data retrieval: &acro.grs1;, &acro.sutrs;,
+ &acro.xml;, ISO2709 (&acro.marc;), etc. Records can be mapped between
record syntaxes and schemas on the fly.</entry>
<entry></entry>
</row>
<entry>&zebra; internal metadata</entry>
<entry>yes</entry>
<entry> &zebra; internal document metadata can be fetched in
- &sutrs; and &xml; record syntaxes. Those are useful in client
+ &acro.sutrs; and &acro.xml; record syntaxes. Those are useful in client
applications.</entry>
<entry><xref linkend="special-retrieval"/></entry>
</row>
<entry>&zebra; internal raw record data</entry>
<entry>yes</entry>
<entry> &zebra; internal raw, binary record data can be fetched in
- &sutrs; and &xml; record syntaxes, leveraging %zebra; to a
+	  &acro.sutrs; and &acro.xml; record syntaxes, turning &zebra; into a
binary storage system</entry>
<entry><xref linkend="special-retrieval"/></entry>
</row>
<entry>&zebra; internal record field data</entry>
<entry>yes</entry>
<entry> &zebra; internal record field data can be fetched in
- &sutrs; and &xml; record syntaxes. This makes very fast minimal
+ &acro.sutrs; and &acro.xml; record syntaxes. This makes very fast minimal
record data displays possible.</entry>
<entry><xref linkend="special-retrieval"/></entry>
</row>
</row>
<row>
<entry>Remote updates</entry>
- <entry>&z3950; extended services</entry>
+ <entry>&acro.z3950; extended services</entry>
<entry>Updates can be performed from remote locations using the
- &z3950; extended services. Access to extended services can be
+ &acro.z3950; extended services. Access to extended services can be
login-password protected.</entry>
<entry><xref linkend="administration-extended-services"/> and
<xref linkend="zebra-cfg"/></entry>
<tbody>
<row>
<entry>Fundamental operations</entry>
- <entry>&z3950;/&sru; <literal>explain</literal>,
+ <entry>&acro.z3950;/&acro.sru; <literal>explain</literal>,
<literal>search</literal>, <literal>scan</literal>, and
<literal>update</literal></entry>
<entry></entry>
<entry><xref linkend="querymodel-operation-types"/></entry>
</row>
<row>
- <entry>&z3950; protocol support</entry>
+ <entry>&acro.z3950; protocol support</entry>
<entry>yes</entry>
<entry> Protocol facilities supported are:
<literal>init</literal>, <literal>search</literal>,
<literal>delete</literal>, <literal>scan</literal>
(index browsing), <literal>sort</literal>,
<literal>close</literal> and support for the <literal>update</literal>
- Extended Service to add or replace an existing &xml;
+ Extended Service to add or replace an existing &acro.xml;
record. Piggy-backed presents are honored in the search
request. Named result sets are supported.</entry>
<entry><xref linkend="protocol-support"/></entry>
</row>
<row>
<entry>Web Service support</entry>
- <entry>&sru_gps;</entry>
+ <entry>&acro.sru;</entry>
<entry> The protocol operations <literal>explain</literal>,
<literal>searchRetrieve</literal> and <literal>scan</literal>
- are supported. <ulink url="&url.cql;">&cql;</ulink> to internal
- query model &rpn;
+ are supported. <ulink url="&url.cql;">&acro.cql;</ulink> to internal
+ query model &acro.rpn;
conversion is supported. Extended RPN queries
for search/retrieve and scan are supported.</entry>
<entry><xref linkend="zebrasrv-sru-support"/></entry>
</para>
<para>
In early 2005, the Koha project development team began looking at
- ways to improve &marc; support and overcome scalability limitations
+ ways to improve &acro.marc; support and overcome scalability limitations
in the Koha 2.x series. After extensive evaluations of the best
of the Open Source textual database engines - including MySQL
full-text searching, PostgreSQL, Lucene and Plucene - the team
and relevance-ranked free-text queries, both of which the Koha
2.x series lack. &zebra; also supports incremental and safe
database updates, which allow on-the-fly record
- management. Finally, since &zebra; has at its heart the &z3950;
+ management. Finally, since &zebra; has at its heart the &acro.z3950;
protocol, it greatly improves Koha's support for that critical
library standard."
</para>
from virtually any computer with an Internet connection, has
template based layout allowing anyone to alter the visual
appearance of Emilda, and is
- &xml; based language for fast and easy portability to virtually any
+      &acro.xml; based for fast and easy portability to virtually any
language.
Currently, Emilda is used at three schools in Espoo, Finland.
</para>
<para>
- As a surplus, 100% &marc; compatibility has been achieved using the
+      As a bonus, 100% &acro.marc; compatibility has been achieved using the
&zebra; Server from Index Data as backend server.
</para>
</section>
is a netbased library service offering all
traditional functions on a very high level plus many new
services. Reindex.net is a comprehensive and powerful WEB system
- based on standards such as &xml; and &z3950;.
- updates. Reindex supports &marc21;, dan&marc; eller Dublin Core with
+ based on standards such as &acro.xml; and &acro.z3950;.
+      updates. Reindex supports &acro.marc21;, dan&acro.marc; or Dublin Core with
UTF-8 encoding.
</para>
<para>
Reindex.net runs on GNU/Debian Linux with &zebra; and Simpleserver
from Index
Data for bibliographic data. The relational database system
- Sybase 9 &xml; is used for
+ Sybase 9 &acro.xml; is used for
administrative data.
- Internally &marcxml; is used for bibliographical records. Update
- utilizes &z3950; extended services.
+ Internally &acro.marcxml; is used for bibliographical records. Update
+ utilizes &acro.z3950; extended services.
</para>
</section>
The &zebra; information retrieval indexing machine is used inside
the Alvis framework to
manage huge collections of natural language processed and
- enhanced &xml; data, coming from a topic relevant web crawl.
- In this application, &zebra; swallows and manages 37GB of &xml; data
+ enhanced &acro.xml; data, coming from a topic relevant web crawl.
+ In this application, &zebra; swallows and manages 37GB of &acro.xml; data
in about 4 hours, resulting in search times of fractions of
seconds.
</para>
<para>
The member libraries send in data files representing their
periodicals, including both brief bibliographic data and summary
- holdings. Then 21 individual &z3950; targets are created, each
+ holdings. Then 21 individual &acro.z3950; targets are created, each
using &zebra;, and all mounted on a single hardware server.
- The live service provides a web gateway allowing &z3950; searching
+ The live service provides a web gateway allowing &acro.z3950; searching
of all of the targets or a selection of them. &zebra;'s small
footprint allows a relatively modest system to comfortably host
the 21 servers.
</section>
<section id="nli">
- <title>NLI-&z3950; - a Natural Language Interface for Libraries</title>
+ <title>NLI-&acro.z3950; - a Natural Language Interface for Libraries</title>
<para>
Fernuniversität Hagen in Germany has developed a natural
language interface for access to library databases.
In order to evaluate this interface for recall and precision, they
chose &zebra; as the basis for retrieval effectiveness. The &zebra;
server contains a copy of the GIRT database, consisting of more
- than 76000 records in &sgml; format (bibliographic records from
- social science), which are mapped to &marc; for presentation.
+ than 76000 records in &acro.sgml; format (bibliographic records from
+ social science), which are mapped to &acro.marc; for presentation.
</para>
<para>
(GIRT is the German Indexing and Retrieval Testdatabase. It is a
<?xml version="1.0" encoding="iso-8859-1" standalone="no" ?>
-<!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook &xml; V4.2//EN"
+<!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook &acro.xml; V4.2//EN"
"http://www.oasis-open.org/docbook/xml/4.2/docbookx.dtd">
-<!-- $Id: marc_indexing.xml,v 1.5 2007-02-02 11:10:08 marc Exp $ -->
+<!-- $Id: marc_indexing.xml,v 1.6 2007-05-24 13:44:09 adam Exp $ -->
<book id="marc_indexing">
<bookinfo>
- <title>Indexing of &marc; records by &zebra;</title>
+ <title>Indexing of &acro.marc; records by &zebra;</title>
<abstract>
- <simpara>&zebra; is suitable for distribution of &marc; records via &z3950;. We
- have a several possibilities to describe the indexing process of &marc; records.
+ <simpara>&zebra; is suitable for distribution of &acro.marc; records via &acro.z3950;. We
+    have several possibilities to describe the indexing process of &acro.marc; records.
This document shows these possibilities.
</simpara>
</abstract>
</bookinfo>
<chapter id="simple">
- <title>Simple indexing of &marc; records</title>
+ <title>Simple indexing of &acro.marc; records</title>
<para>Simple indexing is not described yet.</para>
</chapter>
<chapter id="extended">
- <title>Extended indexing of &marc; records</title>
+ <title>Extended indexing of &acro.marc; records</title>
-<para>Extended indexing of &marc; records will help you if you need index a
+<para>Extended indexing of &acro.marc; records will help you if you need to index a
combination of subfields, index only a part of the whole field,
-or use during indexing process embedded fields of &marc; record.
+or use embedded fields of a &acro.marc; record during the indexing process.
</para>
-<para>Extended indexing of &marc; records additionally allows:
+<para>Extended indexing of &acro.marc; records additionally allows:
<itemizedlist>
<listitem>
-<para>to index data in LEADER of &marc; record</para>
+<para>to index data in the LEADER of a &acro.marc; record</para>
</listitem>
<listitem>
</listitem>
<listitem>
-<para>to index linked fields for UNI&marc; based formats</para>
+<para>to index linked fields for UNI&acro.marc; based formats</para>
</listitem>
</itemizedlist>
</para>
<note><para>Compared with the simple indexing process, extended indexing
-may increase (about 2-3 times) the time of indexing process for &marc;
+may increase the indexing time (by about 2-3 times) for &acro.marc;
records.</para></note>
<sect1 id="formula">
<title>The index-formula</title>
<para>At the beginning, we have to define the term <emphasis>index-formula</emphasis>
-for &marc; records. This term helps to understand the notation of extended indexing of MARC records
+for &acro.marc; records. This term helps to understand the notation of extended indexing of MARC records
by &zebra;. Our definition is based on the document <ulink url="http://www.rba.ru/rusmarc/soft/Z39-50.htm">"The
-table of conformity for &z3950; use attributes and R&usmarc; fields"</ulink>.
+table of conformity for &acro.z3950; use attributes and R&acro.usmarc; fields"</ulink>.
The document is available only in Russian.</para>
<para>The <emphasis>index-formula</emphasis> is the combination of subfields presented in the following way:</para>
71-00$a, $g, $h ($c){.$b ($c)} , (1)
</screen>
-<para>We know that &zebra; supports a &bib1; attribute - right truncation.
+<para>We know that &zebra; supports a &acro.bib1; attribute - right truncation.
In this case, the <emphasis>index-formula</emphasis> (1) consists of
forms, defined in the same way as (1)</para>
71-00$a
</screen>
-<note><para>The original &marc; record may be without some elements, which included in <emphasis>index-formula</emphasis>.</para>
+<note><para>The original &acro.marc; record may lack some of the elements included in the <emphasis>index-formula</emphasis>.</para>
</note>
<para>This notation includes the following operands:
<varlistentry>
<term>-</term>
- <listitem><para>The position may contain any value, defined by &marc; format.
+  <listitem><para>The position may contain any value defined by the &acro.marc; format.
For example, <emphasis>index-formula</emphasis></para>
<screen>
</varlistentry>
</variablelist>
-<note><para>All another operands are the same as accepted in &marc; world.</para>
+<note><para>All other operands are the same as those accepted in the &acro.marc; world.</para>
</note>
</para>
</sect1>
(<literal>.abs</literal> file). It means that names beginning with
<literal>"mc-"</literal> are interpreted by &zebra; as
<emphasis>index-formula</emphasis>. The database index is created and
-linked with <emphasis>access point</emphasis> (&bib1; use attribute)
+linked with <emphasis>access point</emphasis> (&acro.bib1; use attribute)
according to this formula.</para>
<para>For example, <emphasis>index-formula</emphasis></para>
<varlistentry>
<term>.</term>
-<listitem><para>The position may contain any value, defined by &marc; format. For example,
+<listitem><para>The position may contain any value defined by the &acro.marc; format. For example,
<emphasis>index-formula</emphasis></para>
<screen>
</para>
<note>
-<para>All another operands are the same as accepted in &marc; world.</para>
+<para>All other operands are the same as those accepted in the &acro.marc; world.</para>
</note>
<sect2>
elm mc-008[0-5] Date/time-added-to-db !
</screen>
-<para>or for R&usmarc; (this data included in 100th field)</para>
+<para>or for R&acro.usmarc; (this data is included in the 100th field)</para>
<screen>
elm mc-100___$a[0-7]_ Date/time-added-to-db !
<para>using indicators while indexing</para>
-<para>For R&usmarc; <emphasis>index-formula</emphasis>
+<para>For R&acro.usmarc; <emphasis>index-formula</emphasis>
<literal>70-#1$a, $g</literal> matches</para>
<screen>
<listitem>
-<para>indexing embedded (linked) fields for UNI&marc; based formats</para>
+<para>indexing embedded (linked) fields for UNI&acro.marc; based formats</para>
-<para>For R&usmarc; <emphasis>index-formula</emphasis>
+<para>For R&acro.usmarc; <emphasis>index-formula</emphasis>
<literal>4--#-$170-#1$a, $g ($c)</literal> matches</para>
<screen>
<chapter id="querymodel">
- <!-- $Id: querymodel.xml,v 1.31 2007-03-21 19:36:47 adam Exp $ -->
+ <!-- $Id: querymodel.xml,v 1.32 2007-05-24 13:44:09 adam Exp $ -->
<title>Query Model</title>
<section id="querymodel-overview">
<para>
&zebra; was born as a networking Information Retrieval engine adhering
to the international standards
- <ulink url="&url.z39.50;">&z3950;</ulink> and
- <ulink url="&url.sru;">&sru;</ulink>,
+ <ulink url="&url.z39.50;">&acro.z3950;</ulink> and
+ <ulink url="&url.sru;">&acro.sru;</ulink>,
and implements the
- type-1 Reverse Polish Notation (&rpn;) query
+ type-1 Reverse Polish Notation (&acro.rpn;) query
model defined there.
Unfortunately, this model has only defined a binary
encoded representation, which is used as transport packaging in
- the &z3950; protocol layer. This representation is not human
+ the &acro.z3950; protocol layer. This representation is not human
readable, nor does it define any convenient way to specify queries.
</para>
<para>
- Since the type-1 (&rpn;)
+ Since the type-1 (&acro.rpn;)
query structure has no direct, useful string
representation, every client application needs to provide some
form of mapping from a local query notation or representation to it.
<section id="querymodel-query-languages-pqf">
- <title>Prefix Query Format (&pqf;)</title>
+ <title>Prefix Query Format (&acro.pqf;)</title>
<para>
Index Data has defined a textual representation in the
<ulink url="&url.yaz.pqf;">Prefix Query Format</ulink>, short
- <emphasis>&pqf;</emphasis>, which maps
+ <emphasis>&acro.pqf;</emphasis>, which maps
one-to-one to binary encoded
- <emphasis>type-1 &rpn;</emphasis> queries.
- &pqf; has been adopted by other
- parties developing &z3950; software, and is often referred to as
+ <emphasis>type-1 &acro.rpn;</emphasis> queries.
+ &acro.pqf; has been adopted by other
+ parties developing &acro.z3950; software, and is often referred to as
<emphasis>Prefix Query Notation</emphasis>, or in short
- &pqn;. See
+ &acro.pqn;. See
<xref linkend="querymodel-rpn"/> for further explanations and
descriptions of &zebra;'s capabilities.
</para>
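<para>
As a quick illustration (a sketch only; the terms are arbitrary), a &acro.pqf; query combining a title search (use attribute 4) and an author search (use attribute 1003) with a boolean AND could look like:
<screen>
@and @attr 1=4 water @attr 1=1003 smith
</screen>
</para>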
</section>
<section id="querymodel-query-languages-cql">
- <title>Common Query Language (&cql;)</title>
+ <title>Common Query Language (&acro.cql;)</title>
<para>
- The query model of the type-1 &rpn;,
- expressed in &pqf;/&pqn; is natively supported.
- On the other hand, the default &sru;
+ The query model of the type-1 &acro.rpn;,
+ expressed in &acro.pqf;/&acro.pqn; is natively supported.
+ On the other hand, the default &acro.sru;
web services <emphasis>Common Query Language</emphasis>
- <ulink url="&url.cql;">&cql;</ulink> is not natively supported.
+ <ulink url="&url.cql;">&acro.cql;</ulink> is not natively supported.
</para>
<para>
- &zebra; can be configured to understand and map &cql; to &pqf;. See
+ &zebra; can be configured to understand and map &acro.cql; to &acro.pqf;. See
<xref linkend="querymodel-cql-to-pqf"/>.
</para>
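    <para>
     As a rough sketch - the exact mapping depends entirely on the
     &acro.cql;-to-&acro.pqf; configuration in use, and the index names
     shown are only examples - a &acro.cql; query and a possible
     &acro.pqf; counterpart might be:
     <screen>
      CQL: title = mozart and creator = amadeus
      PQF: @and @attr 1=title mozart @attr 1=creator amadeus
     </screen>
    </para>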
</section>
<title>Operation types</title>
<para>
&zebra; supports all of the three different
- &z3950;/&sru; operations defined in the
+ &acro.z3950;/&acro.sru; operations defined in the
standards: explain, search,
and scan. A short description of the
functionality and purpose of each is quite in order here.
<section id="querymodel-operation-type-explain">
<title>Explain Operation</title>
<para>
- The <emphasis>syntax</emphasis> of &z3950;/&sru; queries is
+ The <emphasis>syntax</emphasis> of &acro.z3950;/&acro.sru; queries is
well known to any client, but the specific
<emphasis>semantics</emphasis> - taking into account a
particular server's functionalities and abilities - must be
of the general query model are supported.
</para>
<para>
- The &z3950; embeds the explain operation
+ The &acro.z3950; embeds the explain operation
by performing a
search in the magic
<literal>IR-Explain-1</literal> database;
see <xref linkend="querymodel-exp1"/>.
</para>
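    <para>
     A minimal sketch of such an explain search with yaz-client - the
     access point <literal>ExplainCategory</literal> and the term
     <literal>databaseinfo</literal> are given for illustration only and
     depend on the explain attribute set configuration described in
     <xref linkend="querymodel-exp1"/>:
     <screen>
      Z> base IR-Explain-1
      Z> find @attr exp1 1=ExplainCategory databaseinfo
     </screen>
    </para>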
<para>
- In &sru;, explain is an entirely separate
- operation, which returns an ZeeRex &xml; record according to the
+ In &acro.sru;, explain is an entirely separate
+ operation, which returns a ZeeRex &acro.xml; record according to the
structure defined by the protocol.
</para>
<para>
simple free text searches to nested complex boolean queries,
targeting specific indexes, and possibly enhanced with many
query semantic specifications. Search interactions are the heart
- and soul of &z3950;/&sru; servers.
+ and soul of &acro.z3950;/&acro.sru; servers.
</para>
</section>
<section id="querymodel-rpn">
- <title>&rpn; queries and semantics</title>
+ <title>&acro.rpn; queries and semantics</title>
<para>
- The <ulink url="&url.yaz.pqf;">&pqf; grammar</ulink>
+ The <ulink url="&url.yaz.pqf;">&acro.pqf; grammar</ulink>
is documented in the &yaz; manual, and shall not be
- repeated here. This textual &pqf; representation
+ repeated here. This textual &acro.pqf; representation
is not transmitted to &zebra; during search, but it is in the
- client mapped to the equivalent &z3950; binary
+ client mapped to the equivalent &acro.z3950; binary
query parse tree.
</para>
<section id="querymodel-rpn-tree">
- <title>&rpn; tree structure</title>
+ <title>&acro.rpn; tree structure</title>
<para>
- The &rpn; parse tree - or the equivalent textual representation in &pqf; -
+ The &acro.rpn; parse tree - or the equivalent textual representation in &acro.pqf; -
may start with one specification of the
<emphasis>attribute set</emphasis> used. Following is a query
tree, which
- consists of <emphasis>atomic query parts (&apt;)</emphasis> or
+ consists of <emphasis>atomic query parts (&acro.apt;)</emphasis> or
<emphasis>named result sets</emphasis>, possibly
paired by <emphasis>boolean binary operators</emphasis>, and
finally <emphasis>recursively combined </emphasis> into
<thead>
<row>
<entry>Attribute set</entry>
- <entry>&pqf; notation (Short hand)</entry>
+ <entry>&acro.pqf; notation (Short hand)</entry>
<entry>Status</entry>
<entry>Notes</entry>
</row>
<entry>predefined</entry>
</row>
<row>
- <entry>&bib1;</entry>
+ <entry>&acro.bib1;</entry>
<entry><literal>bib-1</literal></entry>
- <entry>Standard &pqf; query language attribute set which defines the
- semantics of &z3950; searching. In addition, all of the
+ <entry>Standard &acro.pqf; query language attribute set which defines the
+ semantics of &acro.z3950; searching. In addition, all of the
non-use attributes (types 2-12) define the hard-wired
&zebra; internal query
processing.</entry>
<row>
<entry>GILS</entry>
<entry><literal>gils</literal></entry>
- <entry>Extension to the &bib1; attribute set.</entry>
+ <entry>Extension to the &acro.bib1; attribute set.</entry>
<entry>predefined</entry>
</row>
<!--
<row>
- <entry>&idxpath;</entry>
+ <entry>&acro.idxpath;</entry>
<entry><literal>idxpath</literal></entry>
- <entry>Hardwired &xpath; like attribute set, only available for
- indexing with the &grs1; record model</entry>
+ <entry>Hardwired &acro.xpath; like attribute set, only available for
+ indexing with the &acro.grs1; record model</entry>
<entry>deprecated</entry>
</row>
-->
<note>
<para>
The &zebra; internal query processing is modeled after
- the &bib1; attribute set, and the non-use
+ the &acro.bib1; attribute set, and the non-use
attributes type 2-6 are hard-wired in. It is therefore essential
to be familiar with <xref linkend="querymodel-bib1-nonuse"/>.
</para>
<emphasis>retrieval</emphasis>, taking proximity into account:
The hit set is a subset of the corresponding
AND query
- (see the <ulink url="&url.yaz.pqf;">&pqf; grammar</ulink> for
+ (see the <ulink url="&url.yaz.pqf;">&acro.pqf; grammar</ulink> for
details on the proximity operator):
<screen>
Z> find @prox 0 3 0 2 k 2 information retrieval
<section id="querymodel-atomic-queries">
- <title>Atomic queries (&apt;)</title>
+ <title>Atomic queries (&acro.apt;)</title>
<para>
Atomic queries are the query parts which work on one access point
only. These consist of <emphasis>an attribute list</emphasis>
followed by a <emphasis>single term</emphasis> or a
<emphasis>quoted term list</emphasis>, and are often called
- <emphasis>Attributes-Plus-Terms (&apt;)</emphasis> queries.
+ <emphasis>Attributes-Plus-Terms (&acro.apt;)</emphasis> queries.
</para>
<para>
- Atomic (&apt;) queries are always leaf nodes in the &pqf; query tree.
+ Atomic (&acro.apt;) queries are always leaf nodes in the &acro.pqf; query tree.
Unsupplied non-use attribute types 2-12 are either inherited from
higher nodes in the query tree, or are set to &zebra;'s default values.
See <xref linkend="querymodel-bib1"/> for details.
</para>
<table id="querymodel-atomic-queries-table" frame="top">
- <title>Atomic queries (&apt;)</title>
+ <title>Atomic queries (&acro.apt;)</title>
<tgroup cols="3">
<thead>
<row>
<para>
The <emphasis>scan</emphasis> operation is only supported with
- atomic &apt; queries, as it is bound to one access point at a
+ atomic &acro.apt; queries, as it is bound to one access point at a
time. Boolean query trees are not allowed during
<emphasis>scan</emphasis>.
</para>
<para>
Named result sets are supported in &zebra;, and result sets can be
used as operands without limitations. It follows that named
- result sets are leaf nodes in the &pqf; query tree, exactly as
- atomic &apt; queries are.
+ result sets are leaf nodes in the &acro.pqf; query tree, exactly as
+ atomic &acro.apt; queries are.
</para>
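    <para>
     For example - assuming a <literal>Title</literal> index exists in the
     local configuration, and that the client has created a named result
     set <literal>1</literal> with a previous search - a result set can be
     combined with a new atomic query:
     <screen>
      Z> find @attr 1=Title beethoven
      Z> find @and @set 1 @attr 1=Title sonata
     </screen>
     Here <literal>@set 1</literal> references the earlier result set as a
     leaf node of the new query.
    </para>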
<para>
After the execution of a search, the result set is available at
<note>
<para>
- Named result sets are only supported by the &z3950; protocol.
- The &sru; web service is stateless, and therefore the notion of
+ Named result sets are only supported by the &acro.z3950; protocol.
+ The &acro.sru; web service is stateless, and therefore the notion of
named result sets does not exist when accessing a &zebra; server by
- the &sru; protocol.
+ the &acro.sru; protocol.
</para>
</note>
</section>
</para>
<para>
Finding all documents which have the term list "information
- retrieval" in an &zebra; index, using it's internal full string
+ retrieval" in an &zebra; index, using its internal full string
name. Scanning the same index.
<screen>
Z> find @attr 1=sometext "information retrieval"
</para>
<para>
Searching or scanning
- the bib-1 use attribute 54 using it's string name:
+ the bib-1 use attribute 54 using its string name:
<screen>
Z> find @attr 1=Code-language eng
Z> scan @attr 1=Code-language ""
<para>
It is possible to search
in any silly string index - if it's defined in your
- indexation rules and can be parsed by the &pqf; parser.
+ indexation rules and can be parsed by the &acro.pqf; parser.
This is definitely not the recommended use of
this facility, as it might confuse your users with some very
unexpected results.
<para>
See also <xref linkend="querymodel-pqf-apt-mapping"/> for details, and
<xref linkend="zebrasrv-sru"/>
- for the &sru; &pqf; query extension using string names as a fast
+ for the &acro.sru; &acro.pqf; query extension using string names as a fast
debugging facility.
</para>
</section>
<section id="querymodel-use-xpath">
<title>&zebra;'s special access point of type 'XPath'
- for &grs1; filters</title>
+ for &acro.grs1; filters</title>
<para>
As we have seen above, it is possible (albeit seldom a great
idea) to emulate
be defined at indexation time, no new undefined
XPath queries can be entered at search time, and second, it might
confuse users very much that an XPath-alike index name in fact
- gets populated from a possible entirely different &xml; element
+ gets populated from a possibly entirely different &acro.xml; element
than it pretends to access.
</para>
<para>
- When using the &grs1; Record Model
+ When using the &acro.grs1; Record Model
(see <xref linkend="grs"/>), we have the
possibility to embed <emphasis>live</emphasis>
XPath expressions
- in the &pqf; queries, which are here called
+ in the &acro.pqf; queries, which are here called
<emphasis>use (type 1)</emphasis> <emphasis>xpath</emphasis>
attributes. You must enable the
<literal>xpath enable</literal> directive in your
<para>
Only a <emphasis>very</emphasis> restricted subset of the
<ulink url="http://www.w3.org/TR/xpath">XPath 1.0</ulink>
- standard is supported as the &grs1; record model is simpler than
- a full &xml; &dom; structure. See the following examples for
+ standard is supported as the &acro.grs1; record model is simpler than
+ a full &acro.xml; &acro.dom; structure. See the following examples for
possibilities.
</para>
</note>
<para>
Finding all documents which have the term "content"
- inside a text node found in a specific &xml; &dom;
+ inside a text node found in a specific &acro.xml; &acro.dom;
<emphasis>subtree</emphasis>, whose starting element is
addressed by XPath.
<screen>
<para>
Filtering the addressing XPath by a predicate working on exact
string values in
- attributes (in the &xml; sense) can be done: return all those docs which
+ attributes (in the &acro.xml; sense) can be done: return all those docs which
have the term "english" contained in one of all text sub nodes of
the subtree defined by the XPath
<literal>/record/title[@lang='en']</literal>. And similar
</screen>
</para>
<para>
- Escaping &pqf; keywords and other non-parseable XPath constructs
- with <literal>'{ }'</literal> to prevent client-side &pqf; parsing
+ Escaping &acro.pqf; keywords and other non-parseable XPath constructs
+ with <literal>'{ }'</literal> to prevent client-side &acro.pqf; parsing
syntax errors:
<screen>
Z> find @attr {1=/root/first[@attr='danish']} content
<section id="querymodel-exp1">
<title>Explain Attribute Set</title>
<para>
- The &z3950; standard defines the
+ The &acro.z3950; standard defines the
<ulink url="&url.z39.50.explain;">Explain</ulink> attribute set
Exp-1, which is used to discover information
about a server's search semantics and functional capabilities
</para>
<para>
In addition, the non-Use
- &bib1; attributes, that is, the types
+ &acro.bib1; attributes, that is, the types
<emphasis>Relation</emphasis>, <emphasis>Position</emphasis>,
<emphasis>Structure</emphasis>, <emphasis>Truncation</emphasis>,
and <emphasis>Completeness</emphasis> are imported from
- the &bib1; attribute set, and may be used
+ the &acro.bib1; attribute set, and may be used
within any explain query.
</para>
</para>
<para>
See <filename>tab/explain.att</filename> and the
- <ulink url="&url.z39.50;">&z3950;</ulink> standard
+ <ulink url="&url.z39.50;">&acro.z3950;</ulink> standard
for more information.
</para>
</section>
<title>Explain searches with yaz-client</title>
<para>
Classic Explain only defines retrieval of Explain information
- via ASN.1. Practically no &z3950; clients supports this. Fortunately
+ via ASN.1. Practically no &acro.z3950; clients support this. Fortunately
they don't have to - &zebra; allows retrieval of this information
in other formats:
- <literal>&sutrs;</literal>, <literal>&xml;</literal>,
- <literal>&grs1;</literal> and <literal>ASN.1</literal> Explain.
+ <literal>&acro.sutrs;</literal>, <literal>&acro.xml;</literal>,
+ <literal>&acro.grs1;</literal> and <literal>ASN.1</literal> Explain.
</para>
<para>
<literal>Default</literal>.
This query is very useful to study the internal &zebra; indexes.
If records have been indexed using the <literal>alvis</literal>
- &xslt; filter, the string representation names of the known indexes can be
+ &acro.xslt; filter, the string representation names of the known indexes can be
found.
<screen>
Z> base IR-Explain-1
</section>
<section id="querymodel-bib1">
- <title>&bib1; Attribute Set</title>
+ <title>&acro.bib1; Attribute Set</title>
<para>
Most of the information contained in this section is an excerpt of
- the ATTRIBUTE SET &bib1; (&z3950;-1995) SEMANTICS
- found at <ulink url="&url.z39.50.attset.bib1.1995;">. The &bib1;
+ the ATTRIBUTE SET &acro.bib1; (&acro.z3950;-1995) SEMANTICS
+ found at <ulink url="&url.z39.50.attset.bib1.1995;">The &acro.bib1;
Attribute Set Semantics</ulink> from 1995, and also in an updated
- <ulink url="&url.z39.50.attset.bib1;">&bib1;
+ <ulink url="&url.z39.50.attset.bib1;">&acro.bib1;
Attribute Set</ulink>
version from 2003. Index Data is not the copyright holder of this
information, except for the configuration details, the listing of
<filename>tab/gils.att</filename>.
</para>
<para>
- For example, some few &bib1; use
+ For example, a few &acro.bib1; use
attributes from the <filename>tab/bib1.att</filename> are:
<screen>
att 1 Personal-name
<emphasis>AlwaysMatches (103)</emphasis> is a
great way to discover how many documents have been indexed in a
given field. The search term is ignored, but needed for correct
- &pqf; syntax. An empty search term may be supplied.
+ &acro.pqf; syntax. An empty search term may be supplied.
<screen>
Z> find @attr 1=Title @attr 2=103 ""
Z> find @attr 1=Title @attr 2=103 @attr 4=1 ""
is supported, and maps to the boolean <literal>AND</literal>
combination of words supplied. The word list is useful when
google-like bag-of-words queries need to be translated from a GUI
- query language to &pqf;. For example, the following queries
+ query language to &acro.pqf;. For example, the following queries
are equivalent:
<screen>
Z> find @attr 1=Title @attr 4=6 "mozart amadeus"
</para>
<note>
<para>
- The exact mapping between &pqf; queries and &zebra; internal indexes
+ The exact mapping between &acro.pqf; queries and &zebra; internal indexes
and index types is explained in
<xref linkend="querymodel-pqf-apt-mapping"/>.
</para>
</para>
<para>
The <literal>Complete subfield (2)</literal> is a relic
- from the happy <literal>&marc;</literal>
+ from the happy <literal>&acro.marc;</literal>
binary format days. &zebra; does not support it, but maps silently
to <literal>Complete field (3)</literal>.
</para>
<note>
<para>
- The exact mapping between &pqf; queries and &zebra; internal indexes
+ The exact mapping between &acro.pqf; queries and &zebra; internal indexes
and index types is explained in
<xref linkend="querymodel-pqf-apt-mapping"/>.
</para>
<section id="querymodel-zebra">
- <title>Extended &zebra; &rpn; Features</title>
+ <title>Extended &zebra; &acro.rpn; Features</title>
<para>
The &zebra; internal query engine has been extended to specific needs
not covered by the <literal>bib-1</literal> attribute set query
<section id="querymodel-zebra-attr-search">
<title>&zebra; specific Search Extensions to all Attribute Sets</title>
<para>
- &zebra; extends the &bib1; attribute types, and these extensions are
+ &zebra; extends the &acro.bib1; attribute types, and these extensions are
recognized regardless of attribute
set used in a <literal>search</literal> operation query.
</para>
The possible values after attribute <literal>type 7</literal> are
<literal>1</literal> ascending and
<literal>2</literal> descending.
- The attributes+term (&apt;) node is separate from the
+ The attributes+term (&acro.apt;) node is separate from the
rest and must be <literal>@or</literal>'ed.
- The term associated with &apt; is the sorting level in integers,
+ The term associated with &acro.apt; is the sorting level in integers,
where <literal>0</literal> means primary sort,
<literal>1</literal> means secondary sort, and so forth.
See also <xref linkend="administration-ranking"/>.
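<para>
 A hedged sketch of such a sort request - the <literal>Title</literal>
 index name is just an example - combines the search proper with an
 <literal>@or</literal>'ed sorting &acro.apt;:
 <screen>
  Z> find @or @attr 1=Title frankenstein @attr 7=2 @attr 1=Title 0
 </screen>
 Type 7 with value 2 requests descending order, and the term
 <literal>0</literal> marks this as the primary sort level.
</para>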
a scan-like facility. Requires a client that can do named result
sets since the search generates two result sets. The value for
attribute 8 is the name of a result set (string). The terms in
- the named term set are returned as &sutrs; records.
+ the named term set are returned as &acro.sutrs; records.
</para>
<para>
For example, searching for u in title, right truncated, and
<title>&zebra; Extension Rank Weight Attribute (type 9)</title>
<para>
Rank weight is a way to pass a value to a ranking algorithm - so
- that one &apt; has one value - while another as a different one.
+ that one &acro.apt; has one value - while another has a different one.
See also <xref linkend="administration-ranking"/>.
</para>
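     <para>
      An illustrative sketch only - the <literal>Title</literal> index and
      the weights are invented - giving two terms different rank weights
      in a relevance-ranked (relation 102) search:
      <screen>
       Z> find @attr 2=102 @or @attr 9=30 @attr 1=Title mozart @attr 9=10 @attr 1=Title haydn
      </screen>
     </para>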
<para>
&zebra; supports the searchResult-1 facility.
If the Term Reference Attribute (type 10) is
given, that specifies a subqueryId value returned as part of the
- search result. It is a way for a client to name an &apt; part of a
+ search result. It is a way for a client to name an &acro.apt; part of a
query.
</para>
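     <para>
      A minimal sketch, with a made-up label <literal>t1</literal>:
      <screen>
       Z> find @attr 10=t1 @attr 1=Title mozart
      </screen>
      The label is then returned, together with the hit count of this
      &acro.apt;, in the searchResult-1 part of the search response.
     </para>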
<!--
<title>Local Approximative Limit Attribute (type 11)</title>
<para>
&zebra; computes - unless otherwise configured -
- the exact hit count for every &apt;
+ the exact hit count for every &acro.apt;
(leaf) in the query tree. These hit counts are returned as part of
- the searchResult-1 facility in the binary encoded &z3950; search
+ the searchResult-1 facility in the binary encoded &acro.z3950; search
response packages.
</para>
<para>
- By setting an estimation limit size of the resultset of the &apt;
+ By setting an estimation limit size of the resultset of the &acro.apt;
leaves, &zebra; stops processing the result set when the limit
length is reached.
Hit counts under this limit are still precise, but hit counts over it
</para>
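     <para>
      A hedged example of such a per-&acro.apt; estimation limit (the
      index name and the limit value are illustrative):
      <screen>
       Z> find @attr 11=1000 @attr 1=Title the
      </screen>
      Hit counts above 1000 for this leaf may then be estimates rather
      than exact counts.
     </para>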
<para>
The attribute (12) can occur anywhere in the query tree.
- Unlike regular attributes it does not relate to the leaf (&apt;)
+ Unlike regular attributes it does not relate to the leaf (&acro.apt;)
- but to the whole query.
</para>
<warning>
</section>
<section id="querymodel-idxpath">
- <title>&zebra; special &idxpath; Attribute Set for &grs1; indexing</title>
+ <title>&zebra; special &acro.idxpath; Attribute Set for &acro.grs1; indexing</title>
<para>
The attribute-set <literal>idxpath</literal> consists of a single
Use (type 1) attribute. All non-use attributes behave as normal.
</para>
<para>
This feature is enabled when defining the
- <literal>xpath enable</literal> option in the &grs1; filter
+ <literal>xpath enable</literal> option in the &acro.grs1; filter
<filename>*.abs</filename> configuration files. If one wants to use
the special <literal>idxpath</literal> numeric attribute set, the
main &zebra; configuration file <filename>zebra.cfg</filename>
</warning>
<section id="querymodel-idxpath-use">
- <title>&idxpath; Use Attributes (type = 1)</title>
+ <title>&acro.idxpath; Use Attributes (type = 1)</title>
<para>
- This attribute set allows one to search &grs1; filter indexed
- records by &xpath; like structured index names.
+ This attribute set allows one to search &acro.grs1; filter indexed
+ records by &acro.xpath;-like structured index names.
</para>
<warning>
</warning>
<table id="querymodel-idxpath-use-table" frame="top">
- <title>&zebra; specific &idxpath; Use Attributes (type 1)</title>
+ <title>&zebra; specific &acro.idxpath; Use Attributes (type 1)</title>
<tgroup cols="4">
<thead>
<row>
- <entry>&idxpath;</entry>
+ <entry>&acro.idxpath;</entry>
<entry>Value</entry>
<entry>String Index</entry>
<entry>Notes</entry>
</thead>
<tbody>
<row>
- <entry>&xpath; Begin</entry>
+ <entry>&acro.xpath; Begin</entry>
<entry>1</entry>
<entry>_XPATH_BEGIN</entry>
<entry>deprecated</entry>
</row>
<row>
- <entry>&xpath; End</entry>
+ <entry>&acro.xpath; End</entry>
<entry>2</entry>
<entry>_XPATH_END</entry>
<entry>deprecated</entry>
</row>
<row>
- <entry>&xpath; CData</entry>
+ <entry>&acro.xpath; CData</entry>
<entry>1016</entry>
<entry>_XPATH_CDATA</entry>
<entry>deprecated</entry>
</row>
<row>
- <entry>&xpath; Attribute Name</entry>
+ <entry>&acro.xpath; Attribute Name</entry>
<entry>3</entry>
<entry>_XPATH_ATTR_NAME</entry>
<entry>deprecated</entry>
</row>
<row>
- <entry>&xpath; Attribute CData</entry>
+ <entry>&acro.xpath; Attribute CData</entry>
<entry>1015</entry>
<entry>_XPATH_ATTR_CDATA</entry>
<entry>deprecated</entry>
</screen>
</para>
<para>
- Search for all documents where specific nested &xpath;
+ Search for all documents where a specific nested &acro.xpath; path
<literal>/c1/c2/../cn</literal> exists. Notice the very
counter-intuitive <emphasis>reverse</emphasis> notation!
<screen>
</screen>
</para>
<para>
- Search for all documents with have an &xml; element node
- including an &xml; attribute named <emphasis>creator</emphasis>
+ Search for all documents which have an &acro.xml; element node
+ including an &acro.xml; attribute named <emphasis>creator</emphasis>
<screen>
Z> find @attrset idxpath @attr 1=3 @attr 4=3 creator
Z> find @attr 1=_XPATH_ATTR_NAME @attr 4=3 creator
<section id="querymodel-pqf-apt-mapping">
- <title>Mapping from &pqf; atomic &apt; queries to &zebra; internal
+ <title>Mapping from &acro.pqf; atomic &acro.apt; queries to &zebra; internal
register indexes</title>
<para>
- The rules for &pqf; &apt; mapping are rather tricky to grasp in the
+ The rules for &acro.pqf; &acro.apt; mapping are rather tricky to grasp in the
first place. We deal first with the rules for deciding which
internal register or string index to use, according to the use
attribute or access point specified in the query. Thereafter we
</para>
<section id="querymodel-pqf-apt-mapping-accesspoint">
- <title>Mapping of &pqf; &apt; access points</title>
+ <title>Mapping of &acro.pqf; &acro.apt; access points</title>
<para>
&zebra; understands four fundamentally different types of access
points, of which only the
<emphasis>numeric use attribute</emphasis> type access points
- are defined by the <ulink url="&url.z39.50;">&z3950;</ulink>
+ are defined by the <ulink url="&url.z39.50;">&acro.z3950;</ulink>
standard.
All other access point types are &zebra; specific, and non-portable.
</para>
<entry>hardwired internal string index name</entry>
</row>
<row>
- <entry>&xpath; special index</entry>
+ <entry>&acro.xpath; special index</entry>
<entry>XPath</entry>
<entry>/.*</entry>
- <entry>special xpath search for &grs1; indexed records</entry>
+ <entry>special xpath search for &acro.grs1; indexed records</entry>
</row>
</tbody>
</tgroup>
<emphasis>Numeric use attributes</emphasis> are mapped
to the &zebra; internal
string index according to the attribute set definition in use.
- The default attribute set is <literal>&bib1;</literal>, and may be
- omitted in the &pqf; query.
+ The default attribute set is <literal>&acro.bib1;</literal>, and may be
+ omitted in the &acro.pqf; query.
</para>
<para>
According to normalization and numeric
use attribute mapping, it follows that the following
- &pqf; queries are considered equivalent (assuming the default
+ &acro.pqf; queries are considered equivalent (assuming the default
configuration has not been altered):
<screen>
Z> find @attr 1=Body-of-text serenade
Z> find @attr 1=BodyOfText serenade
Z> find @attr 1=bO-d-Y-of-tE-x-t serenade
Z> find @attr 1=1010 serenade
- Z> find @attrset &bib1; @attr 1=1010 serenade
+ Z> find @attrset &acro.bib1; @attr 1=1010 serenade
Z> find @attrset bib1 @attr 1=1010 serenade
Z> find @attrset Bib1 @attr 1=1010 serenade
Z> find @attrset b-I-b-1 @attr 1=1010 serenade
fields as specified in the <literal>.abs</literal> file which
describes the profile of the records which have been loaded.
If no use attribute is provided, a default of
- &bib1; Use Any (1016) is assumed.
+ &acro.bib1; Use Any (1016) is assumed.
The predefined use attribute sets
can be reconfigured by tweaking the configuration files
<filename>tab/*.att</filename>, and
ignored. The above mentioned name normalization applies.
String index names are defined in the
used indexing filter configuration files, for example in the
- <literal>&grs1;</literal>
+ <literal>&acro.grs1;</literal>
<filename>*.abs</filename> configuration files, or in the
- <literal>alvis</literal> filter &xslt; indexing stylesheets.
+ <literal>alvis</literal> filter &acro.xslt; indexing stylesheets.
</para>
<para>
</para>
<para>
- Finally, <literal>&xpath;</literal> access points are only
- available using the <literal>&grs1;</literal> filter for indexing.
+ Finally, <literal>&acro.xpath;</literal> access points are only
+ available using the <literal>&acro.grs1;</literal> filter for indexing.
These access point names must start with the character
<literal>'/'</literal>, they are <emphasis>not
normalized</emphasis>, but passed unaltered to the &zebra; internal
- &xpath; engine. See <xref linkend="querymodel-use-xpath"/>.
+ &acro.xpath; engine. See <xref linkend="querymodel-use-xpath"/>.
</para>
<section id="querymodel-pqf-apt-mapping-structuretype">
- <title>Mapping of &pqf; &apt; structure and completeness to
+ <title>Mapping of &acro.pqf; &acro.apt; structure and completeness to
register type</title>
<para>
- Internally &zebra; has in it's default configuration several
+ Internally &zebra; has in its default configuration several
different types of registers or indexes, whose tokenization and
character normalization rules differ. This reflects the fact that
searching fundamentally different tokens like dates, numbers,
against the contents of the phrase (long word) register, if one
exists for the given <emphasis>Use</emphasis> attribute.
A phrase register is created for those fields in the
- &grs1; <filename>*.abs</filename> file that contains a
+ &acro.grs1; <filename>*.abs</filename> file that contains a
<literal>p</literal>-specifier.
<screen>
Z> scan @attr 1=Title @attr 4=1 @attr 6=3 beethoven
contains multiple words, the term will only match if all of the words
are found immediately adjacent, and in the given order.
The word search is performed on those fields that are indexed as
- type <literal>w</literal> in the &grs1; <filename>*.abs</filename> file.
+ type <literal>w</literal> in the &acro.grs1; <filename>*.abs</filename> file.
<screen>
Z> scan @attr 1=Title @attr 4=1 @attr 6=1 beethoven
...
natural-language, relevance-ranked query.
This search type uses the word register, i.e. those fields
that are indexed as type <literal>w</literal> in the
- &grs1; <filename>*.abs</filename> file.
+ &acro.grs1; <filename>*.abs</filename> file.
</para>
<para>
If the <emphasis>Structure</emphasis> attribute is
<emphasis>Numeric String</emphasis> the term is treated as an integer.
The search is performed on those fields that are indexed
- as type <literal>n</literal> in the &grs1;
+ as type <literal>n</literal> in the &acro.grs1;
<filename>*.abs</filename> file.
</para>
<section id="querymodel-cql-to-pqf">
- <title>Server Side &cql; to &pqf; Query Translation</title>
+ <title>Server Side &acro.cql; to &acro.pqf; Query Translation</title>
<para>
Using the
<literal><cql2rpn>l2rpn.txt</cql2rpn></literal>
&yaz; Frontend Virtual
Hosts option, one can configure
- the &yaz; Frontend &cql;-to-&pqf;
+ the &yaz; Frontend &acro.cql;-to-&acro.pqf;
converter, specifying the interpretation of various
- <ulink url="&url.cql;">&cql;</ulink>
+ <ulink url="&url.cql;">&acro.cql;</ulink>
indexes, relations, etc. in terms of Type-1 query attributes.
<!-- The yaz-client config file -->
</para>
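    <para>
     As a rough sketch only - the index names and attribute mappings below
     are illustrative, not a prescription - such a conversion specification
     is a list of properties mapping &acro.cql; indexes and relations to
     Type-1 attributes:
     <screen>
      index.cql.serverChoice = 1=1016
      index.dc.title         = 1=4
      index.dc.creator       = 1=1003
     </screen>
    </para>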
<para>
- For example, using server-side &cql;-to-&pqf; conversion, one might
+ For example, using server-side &acro.cql;-to-&acro.pqf; conversion, one might
query a zebra server like this:
<screen>
<![CDATA[
]]>
</screen>
and - if properly configured - even static relevance ranking can
- be performed using &cql; query syntax:
+ be performed using &acro.cql; query syntax:
<screen>
<![CDATA[
Z> find text = /relevant (plant and soil)
<para>
By the way, the same configuration can be used to
- search using client-side &cql;-to-&pqf; conversion:
+ search using client-side &acro.cql;-to-&acro.pqf; conversion:
(the only difference is <literal>querytype cql2rpn</literal>
instead of
<literal>querytype cql</literal>, and the call specifying a local
<para>
Exhaustive information can be found in the
- Section <ulink url="&url.yaz.cql2pqf;">&cql; to &rpn; conversion"</ulink>
+ Section <ulink url="&url.yaz.cql2pqf;">&acro.cql; to &acro.rpn; conversion"</ulink>
in the &yaz; manual.
</para>
<!--
<chapter id="quick-start">
- <!-- $Id: quickstart.xml,v 1.13 2007-02-02 11:10:08 marc Exp $ -->
+ <!-- $Id: quickstart.xml,v 1.14 2007-05-24 13:44:09 adam Exp $ -->
<title>Quick Start </title>
<para>
named <literal>Default</literal>.
The database contains records structured according to
the GILS profile, and the server will
- return records in &usmarc;, &grs1;, or &sutrs; format depending
+ return records in &acro.usmarc;, &acro.grs1;, or &acro.sutrs; format depending
on what the client asks for.
</para>
<para>
- To test the server, you can use any &z3950; client.
+ To test the server, you can use any &acro.z3950; client.
For instance, you can use the demo command-line client that comes
with &yaz;:
</para>
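   <para>
    A minimal sketch - the host and port are whatever your server was
    started with, and <literal>localhost:9999</literal> and the search
    term are only examples against the sample GILS data:
    <screen>
     $ yaz-client localhost:9999
     Z> find surficial
     Z> show 1
    </screen>
   </para>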
</para>
<para>
- The default retrieval syntax for the client is &usmarc;, and the
+ The default retrieval syntax for the client is &acro.usmarc;, and the
default element set is <literal>F</literal> (``full record''). To
try other formats and element sets for the same record, try:
</para>
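   <para>
    For instance - these are ordinary yaz-client commands, and the element
    set <literal>B</literal> and the formats shown are just examples:
    <screen>
     Z> format sutrs
     Z> show 1
     Z> format xml
     Z> elements B
     Z> show 1
    </screen>
   </para>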
<note>
<para>You may notice that more fields are returned when your
- client requests &sutrs;, &grs1; or &xml; records.
+ client requests &acro.sutrs;, &acro.grs1; or &acro.xml; records.
This is normal - not all of the GILS data elements have mappings in
- the &usmarc; record format.
+ the &acro.usmarc; record format.
</para>
</note>
<para>
<chapter id="record-model-alvisxslt">
- <!-- $Id: recordmodel-alvisxslt.xml,v 1.18 2007-03-07 13:05:20 marc Exp $ -->
- <title>ALVIS &xml; Record Model and Filter Module</title>
+ <!-- $Id: recordmodel-alvisxslt.xml,v 1.19 2007-05-24 13:44:09 adam Exp $ -->
+ <title>ALVIS &acro.xml; Record Model and Filter Module</title>
<warning>
<para>
The functionality of this record model has been improved and
- replaced by the DOM &xml; record model, see
- <xref linkend="record-model-domxml"/>. The Alvis &xml; record
+ replaced by the DOM &acro.xml; record model, see
+ <xref linkend="record-model-domxml"/>. The Alvis &acro.xml; record
model is considered obsolete, and will eventually be removed
from future releases of the &zebra; software.
</para>
<para>
The record model described in this chapter applies to the fundamental,
- structured &xml;
+ structured &acro.xml;
record type <literal>alvis</literal>, introduced in
<xref linkend="componentmodulesalvis"/>.
</para>
<section id="record-model-alvisxslt-filter">
<title>ALVIS Record Filter</title>
<para>
- The experimental, loadable Alvis &xml;/&xslt; filter module
+ The experimental, loadable Alvis &acro.xml;/&acro.xslt; filter module
<literal>mod-alvis.so</literal> is packaged in the GNU/Debian package
<literal>libidzebra1.4-mod-alvis</literal>.
It is invoked by the <filename>zebra.cfg</filename> configuration statement
</screen>
In this example the Alvis filter is configured to work on all data files
with suffix <filename>*.xml</filename>, where the
- Alvis &xslt; filter configuration file is found in the
+ Alvis &acro.xslt; filter configuration file is found in the
path <filename>db/filter_alvis_conf.xml</filename>.
</para>
- <para>The Alvis &xslt; filter configuration file must be
- valid &xml;. It might look like this (This example is
- used for indexing and display of &oai; harvested records):
+ <para>The Alvis &acro.xslt; filter configuration file must be
+ valid &acro.xml;. It might look like this (this example is
+ used for indexing and display of &acro.oai; harvested records):
<screen>
<?xml version="1.0" encoding="UTF-8"?>
<schemaInfo>
<schema name="index" identifier="http://indexdata.dk/zebra/xslt/1"
stylesheet="xsl/oai2index.xsl" />
<schema name="dc" stylesheet="xsl/oai2dc.xsl" />
- <!-- use split level 2 when indexing whole &oai; Record lists -->
+ <!-- use split level 2 when indexing whole &acro.oai; Record lists -->
<split level="2"/>
</schemaInfo>
</screen>
names defined in the <literal>name</literal> attributes must be
unique; these are the literal <literal>schema</literal> or
<literal>element set</literal> names used in
- <ulink url="http://www.loc.gov/standards/sru/srw/">&srw;</ulink>,
- <ulink url="&url.sru;">&sru;</ulink> and
- &z3950; protocol queries.
+ <ulink url="http://www.loc.gov/standards/sru/srw/">&acro.srw;</ulink>,
+ <ulink url="&url.sru;">&acro.sru;</ulink> and
+ &acro.z3950; protocol queries.
The paths in the <literal>stylesheet</literal> attributes
are relative to &zebra;'s working directory, or absolute to the file
system root.
</para>
<para>
The <literal><split level="2"/></literal> decides where the
- &xml; Reader shall split the
+ &acro.xml; Reader shall split the
collections of records into individual records, which then are
- loaded into &dom;, and have the indexing &xslt; stylesheet applied.
+ loaded into &acro.dom;, and have the indexing &acro.xslt; stylesheet applied.
</para>
<para>
- There must be exactly one indexing &xslt; stylesheet, which is
+ There must be exactly one indexing &acro.xslt; stylesheet, which is
defined by the magic attribute
<literal>identifier="http://indexdata.dk/zebra/xslt/1"</literal>.
</para>
<section id="record-model-alvisxslt-internal">
<title>ALVIS Internal Record Representation</title>
- <para>When indexing, an &xml; Reader is invoked to split the input
- files into suitable record &xml; pieces. Each record piece is then
- transformed to an &xml; &dom; structure, which is essentially the
- record model. Only &xslt; transformations can be applied during
+ <para>When indexing, an &acro.xml; Reader is invoked to split the input
+ files into suitable record &acro.xml; pieces. Each record piece is then
+ transformed to an &acro.xml; &acro.dom; structure, which is essentially the
+ record model. Only &acro.xslt; transformations can be applied during
index, search and retrieval. Consequently, output formats are
- restricted to whatever &xslt; can deliver from the record &xml;
- structure, be it other &xml; formats, HTML, or plain text. In case
- you have <literal>libxslt1</literal> running with E&xslt; support,
+ restricted to whatever &acro.xslt; can deliver from the record &acro.xml;
+ structure, be it other &acro.xml; formats, HTML, or plain text. In case
+ you have <literal>libxslt1</literal> running with E&acro.xslt; support,
you can use this functionality inside the Alvis
- filter configuration &xslt; stylesheets.
+ filter configuration &acro.xslt; stylesheets.
</para>
</section>
<section id="record-model-alvisxslt-canonical">
<title>ALVIS Canonical Indexing Format</title>
- <para>The output of the indexing &xslt; stylesheets must contain
+ <para>The output of the indexing &acro.xslt; stylesheets must contain
certain elements in the magic
<literal>xmlns:z="http://indexdata.dk/zebra/xslt/1"</literal>
- namespace. The output of the &xslt; indexing transformation is then
- parsed using &dom; methods, and the contained instructions are
+ namespace. The output of the &acro.xslt; indexing transformation is then
+ parsed using &acro.dom; methods, and the contained instructions are
performed on the <emphasis>magic elements and their
subtrees</emphasis>.
</para>
</z:record>
</screen>
</para>
- <para>This means the following: From the original &xml; file
- <literal>one-record.xml</literal> (or from the &xml; record &dom; of the
+ <para>This means the following: From the original &acro.xml; file
+ <literal>one-record.xml</literal> (or from the &acro.xml; record &acro.dom; of the
same form coming from a split input file), the indexing
- stylesheet produces an indexing &xml; record, which is defined by
+ stylesheet produces an indexing &acro.xml; record, which is defined by
the <literal>record</literal> element in the magic namespace
<literal>xmlns:z="http://indexdata.dk/zebra/xslt/1"</literal>.
&zebra; uses the content of
the same character normalization map <literal>w</literal>.
</para>
<para>
- Finally, this example configuration can be queried using &pqf;
- queries, either transported by &z3950;, (here using a yaz-client)
+ Finally, this example configuration can be queried using &acro.pqf;
+ queries, either transported by &acro.z3950;, (here using a yaz-client)
<screen>
<![CDATA[
Z> open localhost:9999
or the proprietary
extensions <literal>x-pquery</literal> and
<literal>x-pScanClause</literal> to
- &sru;, and &srw;
+ &acro.sru;, and &acro.srw;
<screen>
<![CDATA[
http://localhost:9999/?version=1.1&operation=searchRetrieve&x-pquery=%40attr+1%3Ddc_creator+%40attr+4%3D6+%22the
http://localhost:9999/?version=1.1&operation=scan&x-pScanClause=@attr+1=dc_date+@attr+4=2+a
]]>
</screen>
- See <xref linkend="zebrasrv-sru"/> for more information on &sru;/&srw;
+ See <xref linkend="zebrasrv-sru"/> for more information on &acro.sru;/&acro.srw;
configuration, and <xref linkend="gfs-config"/> or the &yaz;
- <ulink url="&url.yaz.cql;">&cql; section</ulink>
+ <ulink url="&url.yaz.cql;">&acro.cql; section</ulink>
for the details or the &yaz; frontend server.
</para>
<para>
Notice that there are no <filename>*.abs</filename>,
- <filename>*.est</filename>, <filename>*.map</filename>, or other &grs1;
+ <filename>*.est</filename>, <filename>*.map</filename>, or other &acro.grs1;
filter configuration files involved in this process, and that the
literal index names are used during search and retrieval.
</para>
<para>
As mentioned above, there can be only one indexing
stylesheet, and configuration of the indexing process is a synonym
- of writing an &xslt; stylesheet which produces &xml; output containing the
+ of writing an &acro.xslt; stylesheet which produces &acro.xml; output containing the
magic elements discussed in
<xref linkend="record-model-alvisxslt-internal"/>.
Obviously, there are millions of different ways to accomplish this
<para>
Stylesheets can be written in the <emphasis>pull</emphasis> or
the <emphasis>push</emphasis> style: <emphasis>pull</emphasis>
- means that the output &xml; structure is taken as starting point of
- the internal structure of the &xslt; stylesheet, and portions of
- the input &xml; are <emphasis>pulled</emphasis> out and inserted
- into the right spots of the output &xml; structure. On the other
- side, <emphasis>push</emphasis> &xslt; stylesheets are recursively
+ means that the output &acro.xml; structure is taken as starting point of
+ the internal structure of the &acro.xslt; stylesheet, and portions of
+ the input &acro.xml; are <emphasis>pulled</emphasis> out and inserted
+ into the right spots of the output &acro.xml; structure. On the other
+ hand, <emphasis>push</emphasis> &acro.xslt; stylesheets are recursively
calling their template definitions, a process which is commanded
- by the input &xml; structure, and are triggered to produce some output &xml;
+ by the input &acro.xml; structure, and are triggered to produce some output &acro.xml;
whenever some special conditions in the input stylesheets are
met. The <emphasis>pull</emphasis> type is well-suited for input
- &xml; with strong and well-defined structure and semantics, like the
- following &oai; indexing example, whereas the
+ &acro.xml; with strong and well-defined structure and semantics, like the
+ following &acro.oai; indexing example, whereas the
<emphasis>push</emphasis> type might be the only possible way to
- sort out deeply recursive input &xml; formats.
+ sort out deeply recursive input &acro.xml; formats.
</para>
<para>
A <emphasis>pull</emphasis> stylesheet example used to index
- &oai; harvested records could use some of the following template
+ &acro.oai; harvested records could use some of the following template
definitions:
<screen>
<![CDATA[
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:z="http://indexdata.dk/zebra/xslt/1"
- xmlns:oai="http://www.openarchives.org/&oai;/2.0/"
- xmlns:oai_dc="http://www.openarchives.org/&oai;/2.0/oai_dc/"
+ xmlns:oai="http://www.openarchives.org/&acro.oai;/2.0/"
+ xmlns:oai_dc="http://www.openarchives.org/&acro.oai;/2.0/oai_dc/"
xmlns:dc="http://purl.org/dc/elements/1.1/"
version="1.0">
<!-- match on oai xml record root -->
<xsl:template match="/">
<z:record z:id="{normalize-space(oai:record/oai:header/oai:identifier)}">
- <!-- you might want to use z:rank="{some &xslt; function here}" -->
+ <!-- you might want to use z:rank="{some &acro.xslt; function here}" -->
<xsl:apply-templates/>
</z:record>
</xsl:template>
- <!-- &oai; indexing templates -->
+ <!-- &acro.oai; indexing templates -->
<xsl:template match="oai:record/oai:header/oai:identifier">
<z:index name="oai_identifier" type="0">
<xsl:value-of select="."/>
<para>
Notice also
that the names and types of the indexes can be defined in the
- indexing &xslt; stylesheet <emphasis>dynamically according to
- content in the original &xml; records</emphasis>, which has
+ indexing &acro.xslt; stylesheet <emphasis>dynamically according to
+ content in the original &acro.xml; records</emphasis>, which has
opportunities for great power and wizardry as well as grand
disaster.
</para>
<para>
The following excerpt of a <emphasis>push</emphasis> stylesheet
<emphasis>might</emphasis>
- be a good idea according to your strict control of the &xml;
+ be a good idea depending on how strictly you control the &acro.xml;
input format (due to rigorous checking against well-defined and
- tight RelaxNG or &xml; Schema's, for example):
+ tight RelaxNG or &acro.xml; Schemas, for example):
<screen>
<![CDATA[
<xsl:template name="element-name-indexes">
]]>
</screen>
This template creates indexes which have the name of the working
- node of any input &xml; file, and assigns a '1' to the index.
+ node of any input &acro.xml; file, and assigns a '1' to the index.
The example query
<literal>find @attr 1=xyz 1</literal>
finds all files which contain at least one
- <literal>xyz</literal> &xml; element. In case you can not control
+ <literal>xyz</literal> &acro.xml; element. In case you cannot control
which element names the input files contain, you might ask for
disaster and bad karma using this technique.
</para>
<title>ALVIS Exchange Formats</title>
<para>
An exchange format can be anything which can be the outcome of an
- &xslt; transformation, as far as the stylesheet is registered in
- the main Alvis &xslt; filter configuration file, see
+ &acro.xslt; transformation, as long as the stylesheet is registered in
+ the main Alvis &acro.xslt; filter configuration file, see
<xref linkend="record-model-alvisxslt-filter"/>.
- In principle anything that can be expressed in &xml;, HTML, and
+ In principle anything that can be expressed in &acro.xml;, HTML, and
TEXT can be the output of a <literal>schema</literal> or
<literal>element set</literal> directive during search, as long as
the information comes from the
- <emphasis>original input record &xml; &dom; tree</emphasis>
- (and not the transformed and <emphasis>indexed</emphasis> &xml;!!).
+ <emphasis>original input record &acro.xml; &acro.dom; tree</emphasis>
+ (and not the transformed and <emphasis>indexed</emphasis> &acro.xml;!!).
</para>
<para>
In addition, internal administrative information from the &zebra;
</section>
<section id="record-model-alvisxslt-example">
- <title>ALVIS Filter &oai; Indexing Example</title>
+ <title>ALVIS Filter &acro.oai; Indexing Example</title>
<para>
The source code tarball contains a working Alvis filter example in
the directory <filename>examples/alvis-oai/</filename>, which
should get you started.
</para>
<para>
- More example data can be harvested from any &oai; compliant server,
- see details at the &oai;
+ More example data can be harvested from any &acro.oai; compliant server;
+ see details at the &acro.oai;
<ulink url="http://www.openarchives.org/">
http://www.openarchives.org/</ulink> web site, and the community
links at
<chapter id="record-model-domxml">
- <!-- $Id: recordmodel-domxml.xml,v 1.13 2007-03-21 19:37:00 adam Exp $ -->
- <title>&dom; &xml; Record Model and Filter Module</title>
+ <!-- $Id: recordmodel-domxml.xml,v 1.14 2007-05-24 13:44:09 adam Exp $ -->
+ <title>&acro.dom; &acro.xml; Record Model and Filter Module</title>
<para>
The record model described in this chapter applies to the fundamental,
- structured &xml;
- record type <literal>&dom;</literal>, introduced in
- <xref linkend="componentmodulesdom"/>. The &dom; &xml; record model
- is experimental, and it's inner workings might change in future
+ structured &acro.xml;
+ record type <literal>&acro.dom;</literal>, introduced in
+ <xref linkend="componentmodulesdom"/>. The &acro.dom; &acro.xml; record model
+ is experimental, and its inner workings might change in future
releases of the &zebra; Information Server.
</para>
<section id="record-model-domxml-filter">
- <title>&dom; Record Filter Architecture</title>
+ <title>&acro.dom; Record Filter Architecture</title>
<para>
- The &dom; &xml; filter uses a standard &dom; &xml; structure as
+ The &acro.dom; &acro.xml; filter uses a standard &acro.dom; &acro.xml; structure as
internal data model, and can therefore parse, index, and display
- any &xml; document type. It is well suited to work on
- standardized &xml;-based formats such as Dublin Core, MODS, METS,
+ any &acro.xml; document type. It is well suited to work on
+ standardized &acro.xml;-based formats such as Dublin Core, MODS, METS,
MARCXML, OAI-PMH, RSS, and performs equally well on any other
- non-standard &xml; format.
+ non-standard &acro.xml; format.
</para>
<para>
- A parser for binary &marc; records based on the ISO2709 library
+ A parser for binary &acro.marc; records based on the ISO2709 library
standard is provided; it transforms these to the internal
- &marcxml; &dom; representation. Other binary document parsers
+ &acro.marcxml; &acro.dom; representation. Other binary document parsers
are planned to follow.
</para>
<para>
- The &dom; filter architecture consists of four
+ The &acro.dom; filter architecture consists of four
different pipelines, each being a chain of arbitrarily many successive
- &xslt; transformations of the internal &dom; &xml;
+ &acro.xslt; transformations of the internal &acro.dom; &acro.xml;
representations of documents.
</para>
<figure id="record-model-domxml-architecture-fig">
- <title>&dom; &xml; filter architecture</title>
+ <title>&acro.dom; &acro.xml; filter architecture</title>
<mediaobject>
<imageobject>
<imagedata fileref="domfilter.pdf" format="PDF" scale="50"/>
<textobject>
<!-- Fall back if none of the images can be used -->
<phrase>
- [Here there should be a diagram showing the &dom; &xml;
+ [Here there should be a diagram showing the &acro.dom; &acro.xml;
filter architecture, but it seems that your
tool chain has not been able to include the diagram in this
document.]
<table id="record-model-domxml-architecture-table" frame="top">
- <title>&dom; &xml; filter pipelines overview</title>
+ <title>&acro.dom; &acro.xml; filter pipelines overview</title>
<tgroup cols="5">
<thead>
<row>
<entry><literal>input</literal></entry>
<entry>first</entry>
<entry>input parsing and initial
- transformations to common &xml; format</entry>
- <entry>Input raw &xml; record buffers, &xml; streams and
- binary &marc; buffers</entry>
- <entry>Common &xml; &dom;</entry>
+ transformations to common &acro.xml; format</entry>
+ <entry>Input raw &acro.xml; record buffers, &acro.xml; streams and
+ binary &acro.marc; buffers</entry>
+ <entry>Common &acro.xml; &acro.dom;</entry>
</row>
<row>
<entry><literal>extract</literal></entry>
<entry>second</entry>
<entry>indexing term extraction
transformations</entry>
- <entry>Common &xml; &dom;</entry>
- <entry>Indexing &xml; &dom;</entry>
+ <entry>Common &acro.xml; &acro.dom;</entry>
+ <entry>Indexing &acro.xml; &acro.dom;</entry>
</row>
<row>
<entry><literal>store</literal></entry>
<entry>second</entry>
<entry> transformations before internal document
storage</entry>
- <entry>Common &xml; &dom;</entry>
- <entry>Storage &xml; &dom;</entry>
+ <entry>Common &acro.xml; &acro.dom;</entry>
+ <entry>Storage &acro.xml; &acro.dom;</entry>
</row>
<row>
<entry><literal>retrieve</literal></entry>
<entry>multiple document retrieve transformations from
storage to different output
formats are possible</entry>
- <entry>Storage &xml; &dom;</entry>
- <entry>Output &xml; syntax in requested formats</entry>
+ <entry>Storage &acro.xml; &acro.dom;</entry>
+ <entry>Output &acro.xml; syntax in requested formats</entry>
</row>
</tbody>
</tgroup>
</table>
<para>
- The &dom; &xml; filter pipelines use &xslt; (and if supported on
- your platform, even &exslt;), it brings thus full &xpath;
+ The &acro.dom; &acro.xml; filter pipelines use &acro.xslt; (and if supported on
+ your platform, even &acro.exslt;), and thus brings full &acro.xpath;
support to the indexing, storage and display rules of not only
- &xml; documents, but also binary &marc; records.
+ &acro.xml; documents, but also binary &acro.marc; records.
</para>
</section>
<section id="record-model-domxml-pipeline">
- <title>&dom; &xml; filter pipeline configuration</title>
+ <title>&acro.dom; &acro.xml; filter pipeline configuration</title>
<para>
- The experimental, loadable &dom; &xml;/&xslt; filter module
+ The experimental, loadable &acro.dom; &acro.xml;/&acro.xslt; filter module
<literal>mod-dom.so</literal>
is invoked by the <filename>zebra.cfg</filename> configuration statement
<screen>
recordtype.xml: dom.db/filter_dom_conf.xml
</screen>
- In this example the &dom; &xml; filter is configured to work
+ In this example the &acro.dom; &acro.xml; filter is configured to work
on all data files with suffix
<filename>*.xml</filename>, where the configuration file is found in the
path <filename>db/filter_dom_conf.xml</filename>.
</para>
- <para>The &dom; &xslt; filter configuration file must be
- valid &xml;. It might look like this:
+ <para>The &acro.dom; &acro.xslt; filter configuration file must be
+ valid &acro.xml;. It might look like this:
<screen>
<![CDATA[
<?xml version="1.0" encoding="UTF8"?>
</screen>
</para>
<para>
- The root &xml; element <literal><dom></literal> and all other &dom;
- &xml; filter elements are residing in the namespace
+ The root &acro.xml; element <literal><dom></literal> and all other &acro.dom;
+ &acro.xml; filter elements are residing in the namespace
<literal>xmlns="http://indexdata.dk/zebra-2.0"</literal>.
</para>
<para>
<para>
All pipeline definition elements may contain zero or more
<literal><![CDATA[<xslt stylesheet="path/file.xsl"/>]]></literal>
- &xslt; transformation instructions, which are performed
+ &acro.xslt; transformation instructions, which are performed
sequentially from top to bottom.
The paths in the <literal>stylesheet</literal> attributes
are relative to &zebra;'s working directory, or absolute to the file
<title>Input pipeline</title>
<para>
The <literal><input></literal> pipeline definition element
- may contain either one &xml; Reader definition
+ may contain either one &acro.xml; Reader definition
<literal><![CDATA[<xmlreader level="1"/>]]></literal>, used to split
- an &xml; collection input stream into individual &xml; &dom;
+ an &acro.xml; collection input stream into individual &acro.xml; &acro.dom;
documents at the prescribed element level,
- or one &marc; binary
+ or one &acro.marc; binary
parsing instruction
<literal><![CDATA[<marc inputcharset="marc-8"/>]]></literal>, which defines
- a conversion to &marcxml; format &dom; trees. The allowed values
+ a conversion to &acro.marcxml; format &acro.dom; trees. The allowed values
of the <literal>inputcharset</literal> attribute depend on your
local <productname>iconv</productname> set-up.
</para>
<para>
- Both input parsers deliver individual &dom; &xml; documents to the
+ Both input parsers deliver individual &acro.dom; &acro.xml; documents to the
following chain of zero or more
<literal><![CDATA[<xslt stylesheet="path/file.xsl"/>]]></literal>
- &xslt; transformations. At the end of this pipeline, the documents
+ &acro.xslt; transformations. At the end of this pipeline, the documents
are in the common format, used to feed both the
<literal><extract></literal> and
<literal><store></literal> pipelines.
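    <para>
     A hedged sketch of such an input pipeline inside the filter
     configuration file - the stylesheet path is illustrative only, and
     the elements shown are those introduced above:
     <screen>
      <![CDATA[
      <input>
        <xmlreader level="1"/>
        <xslt stylesheet="xsl/to-common.xsl"/>
      </input>
      ]]>
     </screen>
    </para>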
<title>Extract pipeline</title>
<para>
The <literal><extract></literal> pipeline takes documents
- from any common &dom; &xml; format to the &zebra; specific
- indexing &dom; &xml; format.
+ from any common &acro.dom; &acro.xml; format to the &zebra; specific
+ indexing &acro.dom; &acro.xml; format.
It may consist of zero or more
<literal><![CDATA[<xslt stylesheet="path/file.xsl"/>]]></literal>
- &xslt; transformations, and the outcome is handled to the
+ &acro.xslt; transformations, and the outcome is handed to the
&zebra; core to drive the process of building the inverted
indexes. See
<xref linkend="record-model-domxml-canonical-index"/> for
<section id="record-model-domxml-pipeline-store">
<title>Store pipeline</title>
The <literal><store></literal> pipeline takes documents
- from any common &dom; &xml; format to the &zebra; specific
- storage &dom; &xml; format.
+ from any common &acro.dom; &acro.xml; format to the &zebra; specific
+ storage &acro.dom; &acro.xml; format.
It may consist of zero or more
<literal><![CDATA[<xslt stylesheet="path/file.xsl"/>]]></literal>
- &xslt; transformations, and the outcome is handled to the
+ &acro.xslt; transformations, and the outcome is handed to the
&zebra; core for deposition into the internal storage system.
</section>
<literal><retrieve></literal> pipeline definitions, each
of them again consisting of zero or more
<literal><![CDATA[<xslt stylesheet="path/file.xsl"/>]]></literal>
- &xslt; transformations. These are used for document
- presentation after search, and take the internal storage &dom;
- &xml; to the requested output formats during record present
+ &acro.xslt; transformations. These are used for document
+ presentation after search, and take the internal storage &acro.dom;
+ &acro.xml; to the requested output formats during record present
requests.
</para>
<para>
are distinguished by their unique <literal>name</literal>
attributes; these are the literal <literal>schema</literal> or
<literal>element set</literal> names used in
- <ulink url="http://www.loc.gov/standards/sru/srw/">&srw;</ulink>,
- <ulink url="&url.sru;">&sru;</ulink> and
- &z3950; protocol queries.
+ <ulink url="http://www.loc.gov/standards/sru/srw/">&acro.srw;</ulink>,
+ <ulink url="&url.sru;">&acro.sru;</ulink> and
+ &acro.z3950; protocol queries.
</para>
</section>
<title>Canonical Indexing Format</title>
<para>
- &dom; &xml; indexing comes in two flavors: pure
- processing-instruction governed plain &xml; documents, and - very
- similar to the Alvis filter indexing format - &xml; documents
- containing &xml; <literal><record></literal> and
+ &acro.dom; &acro.xml; indexing comes in two flavors: pure
+ processing-instruction governed plain &acro.xml; documents, and - very
+ similar to the Alvis filter indexing format - &acro.xml; documents
+ containing &acro.xml; <literal><record></literal> and
<literal><index></literal> instructions from the magic
namespace <literal>xmlns:z="http://indexdata.dk/zebra-2.0"</literal>.
</para>
<title>Processing-instruction governed indexing format</title>
<para>The output of the processing instruction driven
- indexing &xslt; stylesheets must contain
+ indexing &acro.xslt; stylesheets must contain
processing instructions named
<literal>zebra-2.0</literal>.
- The output of the &xslt; indexing transformation is then
- parsed using &dom; methods, and the contained instructions are
+ The output of the &acro.xslt; indexing transformation is then
+ parsed using &acro.dom; methods, and the contained instructions are
performed on the <emphasis>elements and their
subtrees directly following the processing instructions</emphasis>.
</para>
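    <para>
     One possible shape of such output - only the processing instruction
     name <literal>zebra-2.0</literal> is prescribed above; the exact
     record id, index name and type shown here are assumptions made for
     the sake of the example:
     <screen>
      <![CDATA[
      <?zebra-2.0 record id=78?>
      <record>
         <?zebra-2.0 index title:w?>
         <title>Some title</title>
      </record>
      ]]>
     </screen>
    </para>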
<section id="record-model-domxml-canonical-index-element">
<title>Magic element governed indexing format</title>
- <para>The output of the indexing &xslt; stylesheets must contain
+ <para>The output of the indexing &acro.xslt; stylesheets must contain
certain elements in the magic
<literal>xmlns:z="http://indexdata.dk/zebra-2.0"</literal>
- namespace. The output of the &xslt; indexing transformation is then
- parsed using &dom; methods, and the contained instructions are
+ namespace. The output of the &acro.xslt; indexing transformation is then
+ parsed using &acro.dom; methods, and the contained instructions are
performed on the <emphasis>magic elements and their
subtrees</emphasis>.
</para>
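      <para>
       A minimal sketch of such output might look like this; the index names
       and content are illustrative, taken from the examples later in this
       chapter.
       <screen>
        <![CDATA[
        <z:record xmlns:z="http://indexdata.dk/zebra-2.0" z:id="11224466">
           <z:index name="title:w title:p any:w">How to program a computer</z:index>
        </z:record>
        ]]>
       </screen>
      </para>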
</listitem>
<listitem>
<para>
- &dom; input documents which are not resulting in both one
+ &acro.dom; input documents which are not resulting in both one
unique valid
<literal>record</literal> instruction and one or more valid
<literal>index</literal> instructions can not be searched and
</para>
<para>The examples work as follows:
- From the original &xml; file
- <literal>marc-one.xml</literal> (or from the &xml; record &dom; of the
+ From the original &acro.xml; file
+ <literal>marc-one.xml</literal> (or from the &acro.xml; record &acro.dom; of the
same form coming from an <literal><input></literal>
pipeline),
the indexing
pipeline <literal><extract></literal>
- produces an indexing &xml; record, which is defined by
+ produces an indexing &acro.xml; record, which is defined by
the <literal>record</literal> instruction
&zebra; uses the content of
<literal>z:id="11224466"</literal>
inserted in the named indexes.
</para>
<para>
- Finally, this example configuration can be queried using &pqf;
- queries, either transported by &z3950;, (here using a yaz-client)
+ Finally, this example configuration can be queried using &acro.pqf;
+       queries, either transported by &acro.z3950; (here using a yaz-client)
<screen>
<![CDATA[
Z> open localhost:9999
or the proprietary
extensions <literal>x-pquery</literal> and
<literal>x-pScanClause</literal> to
- &sru;, and &srw;
+       &acro.sru; and &acro.srw;
<screen>
<![CDATA[
http://localhost:9999/?version=1.1&operation=searchRetrieve&x-pquery=@attr 1=title program
http://localhost:9999/?version=1.1&operation=scan&x-pScanClause=@attr 1=title ""
]]>
</screen>
- See <xref linkend="zebrasrv-sru"/> for more information on &sru;/&srw;
+ See <xref linkend="zebrasrv-sru"/> for more information on &acro.sru;/&acro.srw;
configuration, and <xref linkend="gfs-config"/> or the &yaz;
- <ulink url="&url.yaz.cql;">&cql; section</ulink>
+ <ulink url="&url.yaz.cql;">&acro.cql; section</ulink>
       for the details of the &yaz; frontend server.
</para>
<para>
Notice that there are no <filename>*.abs</filename>,
- <filename>*.est</filename>, <filename>*.map</filename>, or other &grs1;
+ <filename>*.est</filename>, <filename>*.map</filename>, or other &acro.grs1;
       filter configuration files involved in this process, and that the
literal index names are used during search and retrieval.
</para>
<para>
In case that we want to support the usual
- <literal>bib-1</literal> &z3950; numeric access points, it is a
+ <literal>bib-1</literal> &acro.z3950; numeric access points, it is a
good idea to choose string index names defined in the default
configuration file <filename>tab/bib1.att</filename>, see
<xref linkend="attset-files"/>
<section id="record-model-domxml-conf">
- <title>&dom; Record Model Configuration</title>
+ <title>&acro.dom; Record Model Configuration</title>
<section id="record-model-domxml-index">
- <title>&dom; Indexing Configuration</title>
+ <title>&acro.dom; Indexing Configuration</title>
<para>
As mentioned above, there can be only one indexing pipeline,
and configuration of the indexing process is a synonym
- of writing an &xslt; stylesheet which produces &xml; output containing the
+ of writing an &acro.xslt; stylesheet which produces &acro.xml; output containing the
magic processing instructions or elements discussed in
<xref linkend="record-model-domxml-canonical-index"/>.
       Obviously, there are millions of different ways to accomplish this
<para>
Stylesheets can be written in the <emphasis>pull</emphasis> or
the <emphasis>push</emphasis> style: <emphasis>pull</emphasis>
- means that the output &xml; structure is taken as starting point of
- the internal structure of the &xslt; stylesheet, and portions of
- the input &xml; are <emphasis>pulled</emphasis> out and inserted
- into the right spots of the output &xml; structure.
+ means that the output &acro.xml; structure is taken as starting point of
+ the internal structure of the &acro.xslt; stylesheet, and portions of
+ the input &acro.xml; are <emphasis>pulled</emphasis> out and inserted
+ into the right spots of the output &acro.xml; structure.
On the other
- side, <emphasis>push</emphasis> &xslt; stylesheets are recursively
+ side, <emphasis>push</emphasis> &acro.xslt; stylesheets are recursively
calling their template definitions, a process which is commanded
- by the input &xml; structure, and is triggered to produce
- some output &xml;
+ by the input &acro.xml; structure, and is triggered to produce
+ some output &acro.xml;
whenever some special conditions in the input stylesheets are
met. The <emphasis>pull</emphasis> type is well-suited for input
- &xml; with strong and well-defined structure and semantics, like the
- following &oai; indexing example, whereas the
+ &acro.xml; with strong and well-defined structure and semantics, like the
+ following &acro.oai; indexing example, whereas the
<emphasis>push</emphasis> type might be the only possible way to
- sort out deeply recursive input &xml; formats.
+ sort out deeply recursive input &acro.xml; formats.
</para>
<para>
A <emphasis>pull</emphasis> stylesheet example used to index
- &oai; harvested records could use some of the following template
+ &acro.oai; harvested records could use some of the following template
definitions:
<screen>
<![CDATA[
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:z="http://indexdata.dk/zebra-2.0"
- xmlns:oai="http://www.openarchives.org/&oai;/2.0/"
- xmlns:oai_dc="http://www.openarchives.org/&oai;/2.0/oai_dc/"
+ xmlns:oai="http://www.openarchives.org/&acro.oai;/2.0/"
+ xmlns:oai_dc="http://www.openarchives.org/&acro.oai;/2.0/oai_dc/"
xmlns:dc="http://purl.org/dc/elements/1.1/"
version="1.0">
<section id="record-model-domxml-index-marc">
- <title>&dom; Indexing &marcxml;</title>
+ <title>&acro.dom; Indexing &acro.marcxml;</title>
<para>
- The &dom; filter allows indexing of both binary &marc; records
- and &marcxml; records, depending on it's configuration.
- A typical &marcxml; record might look like this:
+ The &acro.dom; filter allows indexing of both binary &acro.marc; records
+ and &acro.marcxml; records, depending on its configuration.
+ A typical &acro.marcxml; record might look like this:
<screen>
<![CDATA[
<record xmlns="http://www.loc.gov/MARC21/slim">
</para>
<para>
- It is easily possible to make string manipulation in the &dom;
+           It is easy to do string manipulation in the &acro.dom;
filter. For example, if you want to drop some leading articles
in the indexing of sort fields, you might want to pick out the
- &marcxml; indicator attributes to chop of leading substrings. If
- the above &xml; example would have an indicator
+           &acro.marcxml; indicator attributes to chop off leading substrings. If
+           the above &acro.xml; example had an indicator
<literal>ind2="8"</literal> in the title field
<literal>245</literal>, i.e.
<screen>
</xsl:template>
]]>
</screen>
- The output of the above &marcxml; and &xslt; excerpt would then be:
+ The output of the above &acro.marcxml; and &acro.xslt; excerpt would then be:
<screen>
<![CDATA[
<z:index name="title:w title:p any:w">How to program a computer</z:index>
<section id="record-model-domxml-index-wizzard">
- <title>&dom; Indexing Wizardry</title>
+ <title>&acro.dom; Indexing Wizardry</title>
<para>
The names and types of the indexes can be defined in the
- indexing &xslt; stylesheet <emphasis>dynamically according to
- content in the original &xml; records</emphasis>, which has
+ indexing &acro.xslt; stylesheet <emphasis>dynamically according to
+ content in the original &acro.xml; records</emphasis>, which has
          opportunities for great power and wizardry as well as grand
disaster.
</para>
<para>
The following excerpt of a <emphasis>push</emphasis> stylesheet
<emphasis>might</emphasis>
- be a good idea according to your strict control of the &xml;
+           be a good idea, depending on how strictly you control the &acro.xml;
input format (due to rigorous checking against well-defined and
- tight RelaxNG or &xml; Schema's, for example):
+           tight RelaxNG or &acro.xml; Schemas, for example):
<screen>
<![CDATA[
<xsl:template name="element-name-indexes">
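               <!-- Illustrative sketch of the template body, reconstructed
                    from the description below: create an index named after
                    the current element and assign the value '1' to it. -->
               <z:index name="{local-name()}:w">
                  <xsl:text>1</xsl:text>
               </z:index>
            </xsl:template>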
]]>
</screen>
This template creates indexes which have the name of the working
- node of any input &xml; file, and assigns a '1' to the index.
+ node of any input &acro.xml; file, and assigns a '1' to the index.
The example query
<literal>find @attr 1=xyz 1</literal>
finds all files which contain at least one
- <literal>xyz</literal> &xml; element. In case you can not control
+           <literal>xyz</literal> &acro.xml; element. If you cannot control
which element names the input files contain, you might ask for
disaster and bad karma using this technique.
</para>
]]>
</screen>
          Don't be tempted to play overly clever tricks with the power of
- &xslt;, the above example will create zillions of
+           &acro.xslt;: the above example will create zillions of
indexes with unpredictable names, resulting in severe &zebra;
          index pollution.
</para>
</section>
<section id="record-model-domxml-debug">
- <title>Debuggig &dom; Filter Configurations</title>
+      <title>Debugging &acro.dom; Filter Configurations</title>
<para>
- It can be very hard to debug a &dom; filter setup due to the many
- sucessive &marc; syntax translations, &xml; stream splitting and
- &xslt; transformations involved. As an aid, you have always the
+ It can be very hard to debug a &acro.dom; filter setup due to the many
+       successive &acro.marc; syntax translations, &acro.xml; stream splitting and
+       &acro.xslt; transformations involved. As an aid, you always have the
power of the <literal>-s</literal> command line switch to the
        <literal>zebraidx</literal> indexing command at hand:
<screen>
<!--
<section id="record-model-domxml-elementset">
- <title>&dom; Exchange Formats</title>
+ <title>&acro.dom; Exchange Formats</title>
<para>
An exchange format can be anything which can be the outcome of an
- &xslt; transformation, as far as the stylesheet is registered in
- the main &dom; &xslt; filter configuration file, see
+       &acro.xslt; transformation, as long as the stylesheet is registered in
+ the main &acro.dom; &acro.xslt; filter configuration file, see
<xref linkend="record-model-domxml-filter"/>.
- In principle anything that can be expressed in &xml;, HTML, and
+ In principle anything that can be expressed in &acro.xml;, HTML, and
TEXT can be the output of a <literal>schema</literal> or
<literal>element set</literal> directive during search, as long as
the information comes from the
- <emphasis>original input record &xml; &dom; tree</emphasis>
- (and not the transformed and <emphasis>indexed</emphasis> &xml;!!).
+ <emphasis>original input record &acro.xml; &acro.dom; tree</emphasis>
+ (and not the transformed and <emphasis>indexed</emphasis> &acro.xml;!!).
</para>
<para>
In addition, internal administrative information from the &zebra;
<!--
<section id="record-model-domxml-example">
- <title>&dom; Filter &oai; Indexing Example</title>
+ <title>&acro.dom; Filter &acro.oai; Indexing Example</title>
<para>
- The source code tarball contains a working &dom; filter example in
+ The source code tarball contains a working &acro.dom; filter example in
the directory <filename>examples/dom-oai/</filename>, which
should get you started.
</para>
<para>
- More example data can be harvested from any &oai; compliant server,
- see details at the &oai;
+ More example data can be harvested from any &acro.oai; compliant server,
+ see details at the &acro.oai;
<ulink url="http://www.openarchives.org/">
http://www.openarchives.org/</ulink> web site, and the community
links at
<chapter id="grs">
- <!-- $Id: recordmodel-grs.xml,v 1.8 2007-02-20 14:28:31 marc Exp $ -->
- <title>&grs1; Record Model and Filter Modules</title>
+ <!-- $Id: recordmodel-grs.xml,v 1.9 2007-05-24 13:44:09 adam Exp $ -->
+ <title>&acro.grs1; Record Model and Filter Modules</title>
<note>
<para>
The functionality of this record model has been improved and
- replaced by the DOM &xml; record model. See
+ replaced by the DOM &acro.xml; record model. See
<xref linkend="record-model-domxml"/>.
</para>
</note>
<section id="grs-filters">
- <title>&grs1; Record Filters</title>
+ <title>&acro.grs1; Record Filters</title>
<para>
Many basic subtypes of the <emphasis>grs</emphasis> type are
currently available:
<para>
This is the canonical input format
       described in <xref linkend="grs-canonical-format"/>. It uses a
- simple &sgml;-like syntax.
+ simple &acro.sgml;-like syntax.
</para>
</listitem>
</varlistentry>
<listitem>
<para>
This allows &zebra; to read
- records in the ISO2709 (&marc;) encoding standard.
+ records in the ISO2709 (&acro.marc;) encoding standard.
Last parameter <replaceable>type</replaceable> names the
<literal>.abs</literal> file (see below)
- which describes the specific &marc; structure of the input record as
+ which describes the specific &acro.marc; structure of the input record as
well as the indexing rules.
</para>
      <para>The <literal>grs.marc</literal> filter uses an internal representation
- which is not &xml; conformant. In particular &marc; tags are
- presented as elements with the same name. And &xml; elements
+       which is not &acro.xml; conformant. In particular, &acro.marc; tags are
+       presented as elements of the same name, and &acro.xml; element names
may not start with digits. Therefore this filter is only
- suitable for systems returning &grs1; and &marc; records. For &xml;
+ suitable for systems returning &acro.grs1; and &acro.marc; records. For &acro.xml;
       use the <literal>grs.marcxml</literal> filter instead (see below).
</para>
<para>
This allows &zebra; to read ISO2709 encoded records.
Last parameter <replaceable>type</replaceable> names the
<literal>.abs</literal> file (see below)
- which describes the specific &marc; structure of the input record as
+ which describes the specific &acro.marc; structure of the input record as
well as the indexing rules.
</para>
<para>
The internal representation for <literal>grs.marcxml</literal>
- is the same as for <ulink url="&url.marcxml;">&marcxml;</ulink>.
+ is the same as for <ulink url="&url.marcxml;">&acro.marcxml;</ulink>.
       It is slightly more complicated to work with than
- <literal>grs.marc</literal> but &xml; conformant.
+       <literal>grs.marc</literal>, but it is &acro.xml; conformant.
</para>
<para>
The loadable <literal>grs.marcxml</literal> filter module
<term><literal>grs.xml</literal></term>
<listitem>
<para>
- This filter reads &xml; records and uses
+ This filter reads &acro.xml; records and uses
<ulink url="http://expat.sourceforge.net/">Expat</ulink> to
parse them and convert them into ID&zebra;'s internal
<literal>grs</literal> record model.
- Only one record per file is supported, due to the fact &xml; does
+       Only one record per file is supported, due to the fact that &acro.xml; does
not allow two documents to "follow" each other (there is no way
to know when a document is finished).
This filter is only available if &zebra; is compiled with EXPAT support.
</para>
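         <para>
          To use this filter, set the record type in
          <literal>zebra.cfg</literal> in the usual way. A minimal sketch,
          with an illustrative group name, might be:
          <screen>
           public.recordType: grs.xml
          </screen>
         </para>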
<section id="grs-canonical-format">
- <title>&grs1; Canonical Input Format</title>
+ <title>&acro.grs1; Canonical Input Format</title>
<para>
Although input data can take any form, it is sometimes useful to
describe the record processing capabilities of the system in terms of
a single, canonical input format that gives access to the full
spectrum of structure and flexibility in the system. In &zebra;, this
- canonical format is an "&sgml;-like" syntax.
+ canonical format is an "&acro.sgml;-like" syntax.
</para>
<para>
contains only a single element (strictly speaking, that makes it an
illegal GILS record, since the GILS profile includes several mandatory
elements - &zebra; does not validate the contents of a record against
- the &z3950; profile, however - it merely attempts to match up elements
+ the &acro.z3950; profile, however - it merely attempts to match up elements
of a local representation with the given schema):
</para>
textual data elements which might appear in different languages, and
images which may appear in different formats or layouts.
The variant system in &zebra; is essentially a representation of
- the variant mechanism of &z3950;-1995.
+ the variant mechanism of &acro.z3950;-1995.
</para>
<para>
<para>
The title element above comes in two variants. Both have the IANA body
type "text/plain", but one is in English, and the other in
- Danish. The client, using the element selection mechanism of &z3950;,
+ Danish. The client, using the element selection mechanism of &acro.z3950;,
can retrieve information about the available variant forms of data
elements, or it can select specific variants based on the requirements
of the end-user.
</section>
<section id="grs-regx-tcl">
- <title>&grs1; REGX And TCL Input Filters</title>
+ <title>&acro.grs1; REGX And TCL Input Filters</title>
<para>
In order to handle general input formats, &zebra; allows the
</section>
<section id="grs-internal-representation">
- <title>&grs1; Internal Record Representation</title>
+ <title>&acro.grs1; Internal Record Representation</title>
<para>
When records are manipulated by the system, they're represented in a
<para>
In practice, each variant node is associated with a triple of class,
- type, value, corresponding to the variant mechanism of &z3950;.
+ type, value, corresponding to the variant mechanism of &acro.z3950;.
</para>
</section>
</section>
<section id="grs-conf">
- <title>&grs1; Record Model Configuration</title>
+ <title>&acro.grs1; Record Model Configuration</title>
<para>
The following sections describe the configuration files that govern
<listitem>
<para>
- The object identifier of the &z3950; schema associated
+ The object identifier of the &acro.z3950; schema associated
with the ARS, so that it can be referred to by the client.
</para>
</listitem>
ask for a subset of the data elements contained in a record. Element
set names, in the retrieval module, are mapped to <emphasis>element
specifications</emphasis>, which contain information equivalent to the
- <emphasis>Espec-1</emphasis> syntax of &z3950;.
+ <emphasis>Espec-1</emphasis> syntax of &acro.z3950;.
</para>
</listitem>
<listitem>
<para>
Possibly, a set of rules describing the mapping of elements to a
- &marc; representation.
+ &acro.marc; representation.
</para>
</listitem>
<listitem>
<para>
A list of element descriptions (this is the actual ARS of the
- schema, in &z3950; terms), which lists the ways in which the various
+ schema, in &acro.z3950; terms), which lists the ways in which the various
tags can be used and organized hierarchically.
</para>
</listitem>
<para>
The number of different file types may appear daunting at first, but
- each type corresponds fairly clearly to a single aspect of the &z3950;
+ each type corresponds fairly clearly to a single aspect of the &acro.z3950;
retrieval facilities. Further, the average database administrator,
who is simply reusing an existing profile for which tables already
exist, shouldn't have to worry too much about the contents of these tables.
<title>The Abstract Syntax (.abs) Files</title>
<para>
- The name of this file type is slightly misleading in &z3950; terms,
+ The name of this file type is slightly misleading in &acro.z3950; terms,
since, apart from the actual abstract syntax of the profile, it also
includes most of the other definitions that go into a database
profile.
</para>
<para>
- When a record in the canonical, &sgml;-like format is read from a file
+ When a record in the canonical, &acro.sgml;-like format is read from a file
or from the database, the first tag of the file should reference the
profile that governs the layout of the record. If the first tag of the
record is, say, <literal><gils></literal>, the system will look
<para>
(o) Points to a file containing parameters
for representing the record contents in the ISO2709 syntax.
- Read the description of the &marc; representation facility below.
+ Read the description of the &acro.marc; representation facility below.
</para>
</listitem>
</varlistentry>
<para>
(o,r) Adds an element to the abstract record syntax of the schema.
The <replaceable>path</replaceable> follows the
- syntax which is suggested by the &z3950; document - that is, a sequence
+ syntax which is suggested by the &acro.z3950; document - that is, a sequence
of tags separated by slashes (/). Each tag is given as a
       comma-separated pair of tag type and -value surrounded by parentheses.
The <replaceable>name</replaceable> is the name of the element, and
<term>melm <replaceable>field$subfield attributes</replaceable></term>
<listitem>
<para>
- This directive is specifically for &marc;-formatted records,
- ingested either in the form of &marcxml; documents, or in the
+ This directive is specifically for &acro.marc;-formatted records,
+ ingested either in the form of &acro.marcxml; documents, or in the
ISO2709/Z39.2 format using the grs.marcxml input filter. You can
specify indexing rules for any subfield, or you can leave off the
<replaceable>$subfield</replaceable> part and specify default rules
<listitem>
<para>
This directive specifies character encoding for external records.
- For records such as &xml; that specifies encoding within the
+        For records, such as &acro.xml;, that specify the encoding within the
         file via a header, this directive is ignored.
         If neither this directive is given nor an encoding is set
within external records, ISO-8859-1 encoding is assumed.
An automatically generated identifier for the record,
unique within this database. It is represented by the
<literal><localControlNumber></literal> element in
- &xml; and the <literal>(1,14)</literal> tag in &grs1;.
+ &acro.xml; and the <literal>(1,14)</literal> tag in &acro.grs1;.
</para></listitem>
</varlistentry>
<varlistentry>
set. For instance, many new attribute sets are defined as extensions
to the <replaceable>bib-1</replaceable> set.
This is an important feature of the retrieval
- system of &z3950;, as it ensures the highest possible level of
+ system of &acro.z3950;, as it ensures the highest possible level of
interoperability, as those access points of your database which are
derived from the external set (say, bib-1) can be used even by clients
who are unaware of the new set.
<para>
This file type defines the tagset of the profile, possibly by
referencing other tag sets (most tag sets, for instance, will include
- tagsetG and tagsetM from the &z3950; specification. The file may
+      tagsetG and tagsetM from the &acro.z3950; specification). The file may
contain the following directives.
</para>
The element set specification files describe a selection of a subset
of the elements of a database record. The element selection mechanism
is equivalent to the one supplied by the <emphasis>Espec-1</emphasis>
- syntax of the &z3950; specification.
+ syntax of the &acro.z3950; specification.
In fact, the internal representation of an element set
specification is identical to the <emphasis>Espec-1</emphasis> structure,
and we'll refer you to the description of that structure for most of
a schema that differs from the native schema of the record. For
instance, a client might only know how to process WAIS records, while
the database record is represented in a more specific schema, such as
- GILS. In this module, a mapping of data to one of the &marc; formats is
+ GILS. In this module, a mapping of data to one of the &acro.marc; formats is
also thought of as a schema mapping (mapping the elements of the
- record into fields consistent with the given &marc; specification, prior
+ record into fields consistent with the given &acro.marc; specification, prior
to actually converting the data to the ISO2709). This use of the
- object identifier for &usmarc; as a schema identifier represents an
+ object identifier for &acro.usmarc; as a schema identifier represents an
overloading of the OID which might not be entirely proper. However,
it represents the dual role of schema and record syntax which
- is assumed by the &marc; family in &z3950;.
+ is assumed by the &acro.marc; family in &acro.z3950;.
</para>
<!--
</section>
<section id="grs-mar-files">
- <title>The &marc; (ISO2709) Representation (.mar) Files</title>
+ <title>The &acro.marc; (ISO2709) Representation (.mar) Files</title>
<para>
This file provides rules for representing a record in the ISO2709
<!--
NOTE: FIXME! This will be described better. We're in the process of
- re-evaluating and most likely changing the way that &marc; records are
+ re-evaluating and most likely changing the way that &acro.marc; records are
handled by the system.</emphasis>
-->
</section>
<section id="grs-exchange-formats">
- <title>&grs1; Exchange Formats</title>
+ <title>&acro.grs1; Exchange Formats</title>
<para>
Converting records from the internal structure to an exchange format
<itemizedlist>
<listitem>
<para>
- &grs1;. The internal representation is based on &grs1;/&xml;, so the
+ &acro.grs1;. The internal representation is based on &acro.grs1;/&acro.xml;, so the
conversion here is straightforward. The system will create
applied variant and supported variant lists as required, if a record
contains variant information.
<listitem>
<para>
- &xml;. The internal representation is based on &grs1;/&xml; so
- the mapping is trivial. Note that &xml; schemas, preprocessing
+ &acro.xml;. The internal representation is based on &acro.grs1;/&acro.xml; so
+ the mapping is trivial. Note that &acro.xml; schemas, preprocessing
instructions and comments are not part of the internal representation
- and therefore will never be part of a generated &xml; record.
+ and therefore will never be part of a generated &acro.xml; record.
       Future versions of &zebra; will support that.
</para>
</listitem>
<listitem>
<para>
- &sutrs;. Again, the mapping is fairly straightforward. Indentation
+ &acro.sutrs;. Again, the mapping is fairly straightforward. Indentation
is used to show the hierarchical structure of the record. All
- "&grs1;" type records support both the &grs1; and &sutrs;
+ "&acro.grs1;" type records support both the &acro.grs1; and &acro.sutrs;
representations.
- <!-- FIXME - What is &sutrs; - should be expanded here -->
+ <!-- FIXME - What is &acro.sutrs; - should be expanded here -->
</para>
</listitem>
<listitem>
<para>
- ISO2709-based formats (&usmarc;, etc.). Only records with a
+ ISO2709-based formats (&acro.usmarc;, etc.). Only records with a
two-level structure (corresponding to fields and subfields) can be
directly mapped to ISO2709. For records with a different structuring
- (eg., GILS), the representation in a structure like &usmarc; involves a
+     (e.g., GILS), the representation in a structure like &acro.usmarc; involves a
schema-mapping (see <xref linkend="schema-mapping"/>), to an
- "implied" &usmarc; schema (implied,
+ "implied" &acro.usmarc; schema (implied,
because there is no formal schema which specifies the use of the
- &usmarc; fields outside of ISO2709). The resultant, two-level record is
+ &acro.usmarc; fields outside of ISO2709). The resultant, two-level record is
then mapped directly from the internal representation to ISO2709. See
the GILS schema definition files for a detailed example of this
approach.
</section>
<section id="grs-extended-marc-indexing">
- <title>Extended indexing of &marc; records</title>
+ <title>Extended indexing of &acro.marc; records</title>
- <para>Extended indexing of &marc; records will help you if you need index a
+    <para>Extended indexing of &acro.marc; records will help you if you need to index a
combination of subfields, or index only a part of the whole field,
- or use during indexing process embedded fields of &marc; record.
+     or use embedded fields of the &acro.marc; record during the indexing process.
</para>
- <para>Extended indexing of &marc; records additionally allows:
+ <para>Extended indexing of &acro.marc; records additionally allows:
<itemizedlist>
<listitem>
- <para>to index data in LEADER of &marc; record</para>
+       <para>to index data in the LEADER of a &acro.marc; record</para>
</listitem>
<listitem>
</listitem>
<listitem>
- <para>to index linked fields for UNI&marc; based formats</para>
+       <para>to index linked fields for UNI&acro.marc;-based formats</para>
</listitem>
</itemizedlist>
</para>
    <note><para>Compared with the simple indexing process, extended indexing
- may increase (about 2-3 times) the time of indexing process for &marc;
+     may increase indexing time (by a factor of about 2-3) for &acro.marc;
records.</para></note>
<section id="formula">
<title>The index-formula</title>
     <para>First, we have to define the term
- <emphasis>index-formula</emphasis> for &marc; records. This term helps
- to understand the notation of extended indexing of &marc; records by &zebra;.
+ <emphasis>index-formula</emphasis> for &acro.marc; records. This term helps
+ to understand the notation of extended indexing of &acro.marc; records by &zebra;.
Our definition is based on the document
<ulink url="http://www.rba.ru/rusmarc/soft/Z39-50.htm">"The table
- of conformity for &z3950; use attributes and R&usmarc; fields"</ulink>.
+ of conformity for &acro.z3950; use attributes and R&acro.usmarc; fields"</ulink>.
      The document is available only in Russian.</para>
<para>
</screen>
<para>
- We know that &zebra; supports a &bib1; attribute - right truncation.
+ We know that &zebra; supports a &acro.bib1; attribute - right truncation.
      In this case, the <emphasis>index-formula</emphasis> (1) consists of
      forms defined in the same way as (1)</para>
</screen>
<note>
- <para>The original &marc; record may be without some elements, which included in <emphasis>index-formula</emphasis>.
+    <para>The original &acro.marc; record may lack some of the elements included in the <emphasis>index-formula</emphasis>.
</para>
</note>
<varlistentry>
<term>-</term>
       <listitem><para>The position may contain any value defined by
- &marc; format.
+        the &acro.marc; format.
For example, <emphasis>index-formula</emphasis></para>
<screen>
<note>
<para>
- All another operands are the same as accepted in &marc; world.
+       All other operands are the same as those accepted in the &acro.marc; world.
</para>
</note>
</para>
(<literal>.abs</literal> file). It means that names beginning with
<literal>"mc-"</literal> are interpreted by &zebra; as
<emphasis>index-formula</emphasis>. The database index is created and
- linked with <emphasis>access point</emphasis> (&bib1; use attribute)
+     linked with an <emphasis>access point</emphasis> (&acro.bib1; use attribute)
according to this formula.</para>
<para>For example, <emphasis>index-formula</emphasis></para>
<varlistentry>
<term>.</term>
       <listitem><para>The position may contain any value defined by
- &marc; format. For example,
+        the &acro.marc; format. For example,
<emphasis>index-formula</emphasis></para>
<screen>
</para>
<note>
- <para>All another operands are the same as accepted in &marc; world.</para>
+     <para>All other operands are the same as those accepted in the &acro.marc; world.</para>
</note>
<section id="grs-examples">
elm mc-008[0-5] Date/time-added-to-db !
</screen>
- <para>or for R&usmarc; (this data included in 100th field)</para>
+    <para>or for R&acro.usmarc; (this data is included in the 100th field)</para>
<screen>
elm mc-100___$a[0-7]_ Date/time-added-to-db !
<para>using indicators while indexing</para>
- <para>For R&usmarc; <emphasis>index-formula</emphasis>
+ <para>For R&acro.usmarc; <emphasis>index-formula</emphasis>
<literal>70-#1$a, $g</literal> matches</para>
<screen>
<listitem>
- <para>indexing embedded (linked) fields for UNI&marc; based
+      <para>indexing embedded (linked) fields for UNI&acro.marc;-based
formats</para>
- <para>For R&usmarc; <emphasis>index-formula</emphasis>
+ <para>For R&acro.usmarc; <emphasis>index-formula</emphasis>
<literal>4--#-$170-#1$a, $g ($c)</literal> matches</para>
<screen><![CDATA[
<!ENTITY test SYSTEM "test.xml">
]>
-<!-- $Id: zebra.xml,v 1.19 2007-05-22 11:12:53 adam Exp $ -->
+<!-- $Id: zebra.xml,v 1.20 2007-05-24 13:44:09 adam Exp $ -->
<book id="zebra">
<bookinfo>
<title>&zebra; - User's Guide and Reference</title>
<abstract>
<simpara>
&zebra; is a free, fast, friendly information management system. It
- can index records in &xml;, &sgml;, &marc;, e-mail archives and many
+ can index records in &acro.xml;, &acro.sgml;, &acro.marc;, e-mail archives and many
other formats, and quickly find them using a combination of
boolean searching and relevance ranking. Search-and-retrieve
- applications can be written using &api;s in a wide variety of
+ applications can be written using &acro.api;s in a wide variety of
languages, communicating with the &zebra; server using
industry-standard information-retrieval protocols or web services.
</simpara>
<!ENTITY % idcommon SYSTEM "common/common.ent">
%idcommon;
]>
-<!-- $Id: zebraidx.xml,v 1.13 2007-05-22 11:12:53 adam Exp $ -->
+<!-- $Id: zebraidx.xml,v 1.14 2007-05-24 13:44:09 adam Exp $ -->
<refentry id="zebraidx">
<refentryinfo>
<productname>zebra</productname>
<listitem>
<para>
The records located should be associated with the database name
- <replaceable>database</replaceable> for access through the &z3950; server.
+ <replaceable>database</replaceable> for access through the &acro.z3950; server.
</para>
</listitem>
</varlistentry>
<!--
- $Id: zebrasrv-options.xml,v 1.7 2007-02-02 11:10:08 marc Exp $
+ $Id: zebrasrv-options.xml,v 1.8 2007-05-24 13:44:09 adam Exp $
Options for generic frontend server and yaz-ztest.
Included in both manual and man page for yaz-ztest
Note - these files have been altered for zebrasrv, and are not in
<varlistentry><term><literal>-z</literal></term>
<listitem><para>
- Use the &z3950; protocol (default). This option and <literal>-s</literal>
+ Use the &acro.z3950; protocol (default). This option and <literal>-s</literal>
complement each other.
You can use both multiple times on the same command
line, between listener-specifications (see below). This way, you
<varlistentry><term><literal>-f </literal>
<replaceable>vconfig</replaceable></term>
- <listitem><para>This specifies an &xml; file that describes
+ <listitem><para>This specifies an &acro.xml; file that describes
one or more &yaz; frontend virtual servers. See section VIRTUAL
HOSTS for details.
</para></listitem></varlistentry>
<screen>
hostname | IP-number [: portnumber]
</screen>
- The port number defaults to 210 (standard &z3950; port) for
+ The port number defaults to 210 (standard &acro.z3950; port) for
privileged users (root), and 9999 for normal users.
The special hostname "@" is mapped to
the address INADDR_ANY, which causes the server to listen on any local
<para>
The default behavior for <literal>zebrasrv</literal> - if started
    as a non-privileged user - is to establish
- a single TCP/IP listener, for the &z3950; protocol, on port 9999.
+ a single TCP/IP listener, for the &acro.z3950; protocol, on port 9999.
<screen>
zebrasrv @
zebrasrv tcp:some.server.name.org:1234
<para>
To start the server listening on the registered port for
- &z3950;, or on a filesystem socket,
+ &acro.z3950;, or on a filesystem socket,
and to drop root privileges once the ports are bound, execute
the server like this from a root shell:
<screen>
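     # Illustrative sketch only: listen on the standard Z39.50 port and on a
     # local filesystem socket, then switch to an unprivileged user once the
     # ports are bound (the user name and socket path are examples).
     zebrasrv -u daemon tcp:@:210 unix:/var/tmp/zebrasrv.sock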
<!--
- $Id: zebrasrv-virtual.xml,v 1.9 2007-02-02 11:10:08 marc Exp $
+ $Id: zebrasrv-virtual.xml,v 1.10 2007-05-24 13:44:09 adam Exp $
Description of the virtual host mechanism in &yaz; GFS
Included in both manual and man page for yaz-ztest
-->
</para>
<para>
A backend can be configured to execute in a particular working
- directory. Or the &yaz; frontend may perform <ulink url="&url.cql;">&cql;</ulink> to &rpn; conversion, thus
- allowing traditional &z3950; backends to be offered as a
-<ulink url="&url.sru;">&sru;</ulink> service.
- &sru; Explain information for a particular backend may also be specified.
+ directory. Or the &yaz; frontend may perform <ulink url="&url.cql;">&acro.cql;</ulink> to &acro.rpn; conversion, thus
+ allowing traditional &acro.z3950; backends to be offered as a
+<ulink url="&url.sru;">&acro.sru;</ulink> service.
+ &acro.sru; Explain information for a particular backend may also be specified.
</para>
<para>
For the HTTP protocol, the virtual host is specified in the Host header.
- For the &z3950; protocol, the virtual host is specified as in the
+ For the &acro.z3950; protocol, the virtual host is specified as in the
Initialize Request in the OtherInfo, OID 1.2.840.10003.10.1000.81.1.
</para>
<note>
<para>
- Not all &z3950; clients allows the VHOST information to be set.
+     Not all &acro.z3950; clients allow the VHOST information to be set.
For those the selection of the backend must rely on the
TCP/IP information alone (port and address).
</para>
</note>
<para>
- The &yaz; frontend server uses &xml; to describe the backend
+ The &yaz; frontend server uses &acro.xml; to describe the backend
configurations. Command-line option <literal>-f</literal>
- specifies filename of the &xml; configuration.
+    specifies the filename of the &acro.xml; configuration.
</para>
<para>
The configuration uses the root element <literal>yazgfs</literal>.
<varlistentry><term>element <literal>cql2rpn</literal> (optional)</term>
<listitem>
<para>
- Specifies a filename that includes <ulink url="&url.cql;">&cql;</ulink> to &rpn; conversion for this
- backend server. See <ulink url="&url.cql;">&cql;</ulink> section in &yaz; manual.
- If given, the backend server will only "see" a Type-1/&rpn; query.
+ Specifies a filename that includes <ulink url="&url.cql;">&acro.cql;</ulink> to &acro.rpn; conversion for this
+ backend server. See <ulink url="&url.cql;">&acro.cql;</ulink> section in &yaz; manual.
+ If given, the backend server will only "see" a Type-1/&acro.rpn; query.
</para>
</listitem>
</varlistentry>
<varlistentry><term>element <literal>explain</literal> (optional)</term>
<listitem>
<para>
- Specifies <ulink url="&url.sru;">&sru;</ulink> ZeeRex content for this
+ Specifies <ulink url="&url.sru;">&acro.sru;</ulink> ZeeRex content for this
server - copied verbatim to the client.
As things are now, some of the Explain content seems redundant
because host information, etc. is also stored elsewhere.
</para>
<para>
- The &xml; below configures a server that accepts connections from
+ The &acro.xml; below configures a server that accepts connections from
two ports, TCP/IP port 9900 and a local UNIX file socket.
We name the TCP/IP server <literal>public</literal> and the
other server <literal>internal</literal>.
</para>
<para>
   For <literal>"server2"</literal>,
-<ulink url="&url.cql;">&cql;</ulink> to &rpn; conversion
+<ulink url="&url.cql;">&acro.cql;</ulink> to &acro.rpn; conversion
is supported and explain information has been added (a short one here
to keep the example small).
</para>
<!ENTITY % idcommon SYSTEM "common/common.ent">
%idcommon;
]>
- <!-- $Id: zebrasrv.xml,v 1.5 2007-05-22 11:12:53 adam Exp $ -->
+ <!-- $Id: zebrasrv.xml,v 1.6 2007-05-24 13:44:09 adam Exp $ -->
<refentry id="zebrasrv">
<refentryinfo>
<productname>zebra</productname>
<refsect1><title>DESCRIPTION</title>
<para>Zebra is a high-performance, general-purpose structured text indexing
and retrieval engine. It reads structured records in a variety of input
- formats (eg. email, &xml;, &marc;) and allows access to them through exact
+   formats (e.g. email, &acro.xml;, &acro.marc;) and allows access to them through exact
boolean search expressions and relevance-ranked free-text queries.
</para>
<para>
- <command>zebrasrv</command> is the &z3950; and &sru; frontend
+ <command>zebrasrv</command> is the &acro.z3950; and &acro.sru; frontend
server for the <command>Zebra</command> search engine and indexer.
</para>
<para>
</refsect1>
<refsect1 id="protocol-support">
- <title>&z3950; Protocol Support and Behavior</title>
+ <title>&acro.z3950; Protocol Support and Behavior</title>
<refsect2 id="zebrasrv-initialization">
- <title>&z3950; Initialization</title>
+ <title>&acro.z3950; Initialization</title>
<para>
During initialization, the server will negotiate to version 3 of the
- &z3950; protocol, and the option bits for Search, Present, Scan,
+ &acro.z3950; protocol, and the option bits for Search, Present, Scan,
NamedResultSets, and concurrentOperations will be set, if requested by
the client. The maximum PDU size is negotiated down to a maximum of
1 MB by default.
</refsect2>
<refsect2 id="zebrasrv-search">
- <title>&z3950; Search</title>
+ <title>&acro.z3950; Search</title>
<para>
    The supported query types are 1 and 101. All operators are currently
</refsect2>
<refsect2 id="zebrasrv-present">
- <title>&z3950; Present</title>
+ <title>&acro.z3950; Present</title>
<para>
The present facility is supported in a standard fashion. The requested
record syntax is matched against the ones supported by the profile of
- each record retrieved. If no record syntax is given, &sutrs; is the
+ each record retrieved. If no record syntax is given, &acro.sutrs; is the
default. The requested element set name, again, is matched against any
provided by the relevant record profiles.
</para>
</refsect2>
<refsect2 id="zebrasrv-scan">
- <title>&z3950; Scan</title>
+ <title>&acro.z3950; Scan</title>
<para>
The attribute combinations provided with the termListAndStartPoint are
processed in the same way as operands in a query (see above).
</para>
</refsect2>
<refsect2 id="zebrasrv-sort">
- <title>&z3950; Sort</title>
+ <title>&acro.z3950; Sort</title>
<para>
- &z3950; specifies three different types of sort criteria.
+ &acro.z3950; specifies three different types of sort criteria.
Of these Zebra supports the attribute specification type in which
case the use attribute specifies the "Sort register".
Sort registers are created for those fields that are of type "sort" in
</para>
<para>
- &z3950; allows the client to specify sorting on one or more input
+ &acro.z3950; allows the client to specify sorting on one or more input
result sets and one output result set.
Zebra supports sorting on one result set only which may or may not
be the same as the output result set.
</para>
</refsect2>
<refsect2 id="zebrasrv-close">
- <title>&z3950; Close</title>
+ <title>&acro.z3950; Close</title>
<para>
If a Close PDU is received, the server will respond with a Close PDU
with reason=FINISHED, no matter which protocol version was negotiated
</refsect2>
<refsect2 id="zebrasrv-explain">
- <title>&z3950; Explain</title>
+ <title>&acro.z3950; Explain</title>
<para>
Zebra maintains a "classic"
- <ulink url="&url.z39.50.explain;">&z3950; Explain</ulink> database
+ <ulink url="&url.z39.50.explain;">&acro.z3950; Explain</ulink> database
on the side.
This database is called <literal>IR-Explain-1</literal> and can be
searched using the attribute set <literal>exp-1</literal>.
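    For instance, a yaz-client session might inspect it like this (a
    sketch; the query shown is only illustrative):
    <screen><![CDATA[
    Z> open localhost:9999
    Z> base IR-Explain-1
    Z> find @attr exp1 1=1 databaseinfo
    ]]></screen>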
</refsect2>
</refsect1>
<refsect1 id="zebrasrv-sru">
- <title>The &sru; Server</title>
+ <title>The &acro.sru; Server</title>
<para>
- In addition to &z3950;, Zebra supports the more recent and
- web-friendly IR protocol <ulink url="&url.sru;">&sru;</ulink>.
- &sru; can be carried over &soap; or a &rest;-like protocol
- that uses HTTP &get; or &post; to request search responses. The request
+ In addition to &acro.z3950;, Zebra supports the more recent and
+ web-friendly IR protocol <ulink url="&url.sru;">&acro.sru;</ulink>.
+ &acro.sru; can be carried over &acro.soap; or a &acro.rest;-like protocol
+ that uses HTTP &acro.get; or &acro.post; to request search responses. The request
itself is made of parameters such as
<literal>query</literal>,
<literal>startRecord</literal>,
<literal>maximumRecords</literal>
and
<literal>recordSchema</literal>;
- the response is an &xml; document containing hit-count, result-set
- records, diagnostics, etc. &sru; can be thought of as a re-casting
- of &z3950; semantics in web-friendly terms; or as a standardisation
+ the response is an &acro.xml; document containing hit-count, result-set
+ records, diagnostics, etc. &acro.sru; can be thought of as a re-casting
+ of &acro.z3950; semantics in web-friendly terms; or as a standardisation
of the ad-hoc query parameters used by search engines such as Google
and AltaVista; or as a superset of A9's OpenSearch (which it
predates).
</para>
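  <para>
   As a sketch of the request shape, a searchRetrieve request might look
   like this (the database name and query are illustrative, and whether the
   query can actually be executed depends on the &acro.cql; configuration
   described below):
   <screen><![CDATA[
   http://localhost:9999/Default?version=1.1&operation=searchRetrieve
     &query=mineral&startRecord=1&maximumRecords=1
   ]]></screen>
  </para>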
<para>
- Zebra supports &z3950;, &sru; &get;, SRU &post;, SRU &soap; (&srw;)
+   Zebra supports &acro.z3950;, &acro.sru; &acro.get;, &acro.sru; &acro.post;, and &acro.sru; &acro.soap; (&acro.srw;)
- on the same port, recognising what protocol is used by each incoming
   request and handling it accordingly. This is achieved through
the use of Deep Magic; civilians are warned not to stand too close.
</para>
<refsect2 id="zebrasrv-sru-run">
- <title>Running zebrasrv as an &sru; Server</title>
+ <title>Running zebrasrv as an &acro.sru; Server</title>
<para>
Because Zebra supports all protocols on one port, it would
- seem to follow that the &sru; server is run in the same way as
- the &z3950; server, as described above. This is true, but only in
+ seem to follow that the &acro.sru; server is run in the same way as
+ the &acro.z3950; server, as described above. This is true, but only in
an uninterestingly vacuous way: a Zebra server run in this manner
- will indeed recognise and accept &sru; requests; but since it
- doesn't know how to handle the &cql; queries that these protocols
+ will indeed recognise and accept &acro.sru; requests; but since it
+ doesn't know how to handle the &acro.cql; queries that these protocols
use, all it can do is send failure responses.
</para>
<note>
<para>
- It is possible to cheat, by having &sru; search Zebra with
- a &pqf; query instead of &cql;, using the
+ It is possible to cheat, by having &acro.sru; search Zebra with
+ a &acro.pqf; query instead of &acro.cql;, using the
<literal>x-pquery</literal>
parameter instead of
<literal>query</literal>.
This is a
<emphasis role="strong">non-standard extension</emphasis>
- of &cql;, and a
+ of &acro.cql;, and a
<emphasis role="strong">very naughty</emphasis>
- thing to do, but it does give you a way to see Zebra serving &sru;
+ thing to do, but it does give you a way to see Zebra serving &acro.sru;
``right out of the box''. If you start your favourite Zebra
server in the usual way, on port 9999, then you can send your web
browser to:
&maximumRecords=1
</screen>
<para>
- This will display the &xml;-formatted &sru; response that includes the
+ This will display the &acro.xml;-formatted &acro.sru; response that includes the
first record in the result-set found by the query
- <literal>mineral</literal>. (For clarity, the &sru; URL is shown
+ <literal>mineral</literal>. (For clarity, the &acro.sru; URL is shown
     here broken across lines, but the lines should be joined together
     to make a single-line URL for the browser to submit.)
</para>
</note>
<para>
- In order to turn on Zebra's support for &cql; queries, it's necessary
+ In order to turn on Zebra's support for &acro.cql; queries, it's necessary
to have the &yaz; generic front-end (which Zebra uses) translate them
- into the &z3950; Type-1 query format that is used internally. And
+ into the &acro.z3950; Type-1 query format that is used internally. And
to do this, the generic front-end's own configuration file must be
used. See <xref linkend="gfs-config"/>;
- the salient point for &sru; support is that
+ the salient point for &acro.sru; support is that
<command>zebrasrv</command>
must be started with the
<literal>-f frontendConfigFile</literal>
<literal>-c zebraConfigFile</literal>
option,
and that the front-end configuration file must include both a
- reference to the Zebra configuration file and the &cql;-to-&pqf;
+ reference to the Zebra configuration file and the &acro.cql;-to-&acro.pqf;
translator configuration file.
</para>
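  <para>
   A minimal sketch of such a front-end configuration file might look like
   this; the file names are illustrative:
   <screen><![CDATA[
   <yazgfs>
     <server>
       <config>zebra.cfg</config>
       <cql2rpn>pqf.properties</cql2rpn>
     </server>
   </yazgfs>
   ]]></screen>
  </para>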
<para>
<literal>-c</literal>
command-line argument, and the
<literal><cql2rpn></literal>
- element contains the name of the &cql; properties file specifying how
- various &cql; indexes, relations, etc. are translated into Type-1
+ element contains the name of the &acro.cql; properties file specifying how
+ various &acro.cql; indexes, relations, etc. are translated into Type-1
queries.
</para>
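  <para>
   As a sketch, such a properties file might map a couple of &acro.cql;
   indexes to &acro.bib1; use attributes like this (the index names and
   attribute values are illustrative):
   <screen><![CDATA[
   index.cql.serverChoice = 1=1016
   index.dc.title         = 1=4
   index.dc.creator       = 1=1003
   ]]></screen>
  </para>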
<para>
A zebra server running with such a configuration can then be
- queried using proper, conformant &sru; URLs with &cql; queries:
+ queried using proper, conformant &acro.sru; URLs with &acro.cql; queries:
</para>
<screen>
http://localhost:9999/Default?version=1.1
</refsect2>
</refsect1>
<refsect1 id="zebrasrv-sru-support">
- <title>&sru; Protocol Support and Behavior</title>
+ <title>&acro.sru; Protocol Support and Behavior</title>
<para>
- Zebra running as an &sru; server supports SRU version 1.1, including
- &cql; version 1.1. In particular, it provides support for the
+ Zebra running as an &acro.sru; server supports SRU version 1.1, including
+ &acro.cql; version 1.1. In particular, it provides support for the
following elements of the protocol.
</para>
<refsect2 id="zebrasrvr-search-and-retrieval">
- <title>&sru; Search and Retrieval</title>
+ <title>&acro.sru; Search and Retrieval</title>
<para>
Zebra supports the
- <ulink url="&url.sru.searchretrieve;">&sru; searchRetrieve</ulink>
+ <ulink url="&url.sru.searchretrieve;">&acro.sru; searchRetrieve</ulink>
operation.
</para>
<para>
- One of the great strengths of &sru; is that it mandates a standard
- query language, &cql;, and that all conforming implementations can
+ One of the great strengths of &acro.sru; is that it mandates a standard
+ query language, &acro.cql;, and that all conforming implementations can
therefore be trusted to correctly interpret the same queries. It
is with some shame, then, that we admit that Zebra also supports
an additional query language, our own Prefix Query Format
- (<ulink url="&url.yaz.pqf;">&pqf;</ulink>).
- A &pqf; query is submitted by using the extension parameter
+ (<ulink url="&url.yaz.pqf;">&acro.pqf;</ulink>).
+ A &acro.pqf; query is submitted by using the extension parameter
<literal>x-pquery</literal>,
in which case the
<literal>query</literal>
- parameter must be omitted, which makes the request not valid &sru;.
+ parameter must be omitted, which makes the request not valid &acro.sru;.
Please feel free to use this facility within your own
- applications; but be aware that it is not only non-standard &sru;
+ applications; but be aware that it is not only non-standard &acro.sru;
but not even syntactically valid, since it omits the mandatory
<literal>query</literal> parameter.
</para>
</refsect2>
<refsect2 id="zebrasrv-sru-scan">
- <title>&sru; Scan</title>
+ <title>&acro.sru; Scan</title>
<para>
- Zebra supports <ulink url="&url.sru.scan;">&sru; scan</ulink>
+ Zebra supports <ulink url="&url.sru.scan;">&acro.sru; scan</ulink>
operation.
- Scanning using &cql; syntax is the default, where the
+ Scanning using &acro.cql; syntax is the default, where the
standard <literal>scanClause</literal> parameter is used.
</para>
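   <para>
    For example (a sketch; the index name depends on the &acro.cql;
    configuration in use):
    <screen><![CDATA[
    http://localhost:9999/Default?version=1.1&operation=scan
      &scanClause=dc.title=mineral
    ]]></screen>
   </para>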
<para>
In addition, a
- mutant form of &sru; scan is supported, using
+ mutant form of &acro.sru; scan is supported, using
the non-standard <literal>x-pScanClause</literal> parameter in
place of the standard <literal>scanClause</literal> to scan on a
- &pqf; query clause.
+ &acro.pqf; query clause.
</para>
</refsect2>
<refsect2 id="zebrasrv-sru-explain">
- <title>&sru; Explain</title>
+ <title>&acro.sru; Explain</title>
<para>
- Zebra supports <ulink url="&url.sru.explain;">&sru; explain</ulink>.
+ Zebra supports <ulink url="&url.sru.explain;">&acro.sru; explain</ulink>.
</para>
<para>
The ZeeRex record explaining a database may be requested either
- with a fully fledged &sru; request (with
+ with a fully fledged &acro.sru; request (with
<literal>operation</literal>=<literal>explain</literal>
and version-number specified)
- or with a simple HTTP &get; at the server's basename.
+ or with a simple HTTP &acro.get; at the server's basename.
The ZeeRex record returned in response is the one embedded
in the &yaz; Frontend Server configuration file that is described in the
<xref linkend="gfs-config"/>.
</para>
<para>
Unfortunately, the data found in the
- &cql;-to-&pqf; text file must be added by hand-craft into the explain
+     &acro.cql;-to-&acro.pqf; text file must be added by hand into the explain
section of the &yaz; Frontend Server configuration file to be able
to provide a suitable explain record.
     Too bad, but this is all extremely
     new alpha stuff, and a lot of work has yet to be done ...
</para>
<para>
- There is no linkeage whatsoever between the &z3950; explain model
- and the &sru; explain response (well, at least not implemented
+     There is no linkage whatsoever between the &acro.z3950; explain model
+ and the &acro.sru; explain response (well, at least not implemented
     in Zebra, that is ..). Zebra does not provide a means of using
- &z3950; to obtain the ZeeRex record.
+ &acro.z3950; to obtain the ZeeRex record.
</para>
</refsect2>
<refsect2 id="zebrasrv-non-sru-ops">
- <title>Other &sru; operations</title>
+ <title>Other &acro.sru; operations</title>
<para>
- In the &z3950; protocol, Initialization, Present, Sort and Close
- are separate operations. In &sru;, however, these operations do not
+ In the &acro.z3950; protocol, Initialization, Present, Sort and Close
+ are separate operations. In &acro.sru;, however, these operations do not
exist.
</para>
<itemizedlist>
<listitem>
<para>
- &sru; has no explicit initialization handshake phase, but
+ &acro.sru; has no explicit initialization handshake phase, but
commences immediately with searching, scanning and explain
operations.
</para>
</listitem>
<listitem>
<para>
- Neither does &sru; have a close operation, since the protocol is
+ Neither does &acro.sru; have a close operation, since the protocol is
stateless and each request is self-contained. (It is true that
- multiple &sru; request/response pairs may be implemented as
+ multiple &acro.sru; request/response pairs may be implemented as
multiple HTTP request/response pairs over a single persistent
TCP/IP connection; but the closure of that connection is not a
protocol-level operation.)
</listitem>
<listitem>
<para>
- Retrieval in &sru; is part of the
+ Retrieval in &acro.sru; is part of the
<literal>searchRetrieve</literal> operation, in which a search
is submitted and the response includes a subset of the records
- in the result set. There is no direct analogue of &z3950;'s
+ in the result set. There is no direct analogue of &acro.z3950;'s
Present operation which requests records from an established
- result set. In &sru;, this is achieved by sending a subsequent
+ result set. In &acro.sru;, this is achieved by sending a subsequent
<literal>searchRetrieve</literal> request with the query
<literal>cql.resultSetId=</literal><emphasis>id</emphasis> where
<emphasis>id</emphasis> is the identifier of the previously
</listitem>
<listitem>
<para>
- Sorting in &cql; is done within the
+ Sorting in &acro.cql; is done within the
<literal>searchRetrieve</literal> operation - in v1.1, by an
explicit <literal>sort</literal> parameter, but the forthcoming
v1.2 or v2.0 will most likely use an extension of the query
- language, <ulink url="&url.cql.sorting;">&cql; sorting</ulink>.
+ language, <ulink url="&url.cql.sorting;">&acro.cql; sorting</ulink>.
</para>
</listitem>
</itemizedlist>
<para>
- It can be seen, then, that while Zebra operating as an &sru; server
+ It can be seen, then, that while Zebra operating as an &acro.sru; server
does not provide the same set of operations as when operating as a
- &z3950; server, it does provide equivalent functionality.
+ &acro.z3950; server, it does provide equivalent functionality.
</para>
</refsect2>
</refsect1>
<refsect1 id="zebrasrv-sru-examples">
- <title>&sru; Examples</title>
+ <title>&acro.sru; Examples</title>
<para>
Surf into <literal>http://localhost:9999</literal>
to get an explain response, or use
]]></screen>
</para>
<para>
- Even search using &pqf; queries using the <emphasis>extended naughty
+     You can even search using &acro.pqf; queries, via the <emphasis>extended naughty
parameter</emphasis> <literal>x-pquery</literal>
<screen><![CDATA[
http://localhost:9999/?version=1.1&operation=searchRetrieve