1 <!DOCTYPE refentry PUBLIC "-//OASIS//DTD DocBook V4.1//EN"
2 "http://www.oasis-open.org/docbook/xml/4.1/docbookx.dtd"
4 <!ENTITY % local SYSTEM "local.ent">
6 <!ENTITY % entities SYSTEM "entities.ent">
8 <!ENTITY % idcommon SYSTEM "common/common.ent">
11 <!-- $Id: zebrasrv.xml,v 1.6 2007-05-24 13:44:09 adam Exp $ -->
12 <refentry id="zebrasrv">
14 <productname>zebra</productname>
15 <productnumber>&version;</productnumber>
19 <refentrytitle>zebrasrv</refentrytitle>
20 <manvolnum>8</manvolnum>
24 <refname>zebrasrv</refname>
25 <refpurpose>Zebra Server</refpurpose>
31 <refsect1><title>DESCRIPTION</title>
<para>Zebra is a high-performance, general-purpose structured text indexing
and retrieval engine. It reads structured records in a variety of input
formats (e.g. email, &acro.xml;, &acro.marc;) and allows access to them through exact
boolean search expressions and relevance-ranked free-text queries.
38 <command>zebrasrv</command> is the &acro.z3950; and &acro.sru; frontend
39 server for the <command>Zebra</command> search engine and indexer.
On Unix you can run the <command>zebrasrv</command>
server from the command line and put it
in the background. It may also operate under the inet daemon.
45 On WIN32 you can run the server as a console application or
50 <title>OPTIONS</title>
The options for <command>zebrasrv</command> are the same
as those for &yaz;'s <command>yaz-ztest</command>.
Option <literal>-c</literal> specifies a Zebra configuration
file; if omitted, <filename>zebra.cfg</filename> is read.
62 <refsect1 id="protocol-support">
63 <title>&acro.z3950; Protocol Support and Behavior</title>
65 <refsect2 id="zebrasrv-initialization">
66 <title>&acro.z3950; Initialization</title>
69 During initialization, the server will negotiate to version 3 of the
70 &acro.z3950; protocol, and the option bits for Search, Present, Scan,
71 NamedResultSets, and concurrentOperations will be set, if requested by
72 the client. The maximum PDU size is negotiated down to a maximum of
78 <refsect2 id="zebrasrv-search">
79 <title>&acro.z3950; Search</title>
The supported query types are 1 and 101. All operators are currently
supported, with the restriction that only proximity units of type "word"
are supported for the proximity operator.
85 Queries can be arbitrarily complex.
86 Named result sets are supported, and result sets can be used as operands
88 Searches may span multiple databases.
92 The server has full support for piggy-backed retrieval (see
93 also the following section).
98 <refsect2 id="zebrasrv-present">
99 <title>&acro.z3950; Present</title>
101 The present facility is supported in a standard fashion. The requested
102 record syntax is matched against the ones supported by the profile of
103 each record retrieved. If no record syntax is given, &acro.sutrs; is the
104 default. The requested element set name, again, is matched against any
105 provided by the relevant record profiles.
108 <refsect2 id="zebrasrv-scan">
109 <title>&acro.z3950; Scan</title>
111 The attribute combinations provided with the termListAndStartPoint are
112 processed in the same way as operands in a query (see above).
113 Currently, only the term and the globalOccurrences are returned with
114 the termInfo structure.
117 <refsect2 id="zebrasrv-sort">
118 <title>&acro.z3950; Sort</title>
&acro.z3950; specifies three different types of sort criteria.
Of these, Zebra supports the attribute specification type, in which
case the use attribute specifies the "Sort register".
Sort registers are created for those fields that are of type "sort" in
the <filename>default.idx</filename> file.
The corresponding character mapping file in <filename>default.idx</filename> specifies the
ordinal of each character used in the actual sort.
131 &acro.z3950; allows the client to specify sorting on one or more input
132 result sets and one output result set.
Zebra supports sorting on one result set only, which may or may not
be the same as the output result set.
137 <refsect2 id="zebrasrv-close">
138 <title>&acro.z3950; Close</title>
140 If a Close PDU is received, the server will respond with a Close PDU
141 with reason=FINISHED, no matter which protocol version was negotiated
142 during initialization. If the protocol version is 3 or more, the
143 server will generate a Close PDU under certain circumstances,
144 including a session timeout (60 minutes by default), and certain kinds of
145 protocol errors. Once a Close PDU has been sent, the protocol
146 association is considered broken, and the transport connection will be
147 closed immediately upon receipt of further data, or following a short
152 <refsect2 id="zebrasrv-explain">
153 <title>&acro.z3950; Explain</title>
155 Zebra maintains a "classic"
156 <ulink url="&url.z39.50.explain;">&acro.z3950; Explain</ulink> database
158 This database is called <literal>IR-Explain-1</literal> and can be
159 searched using the attribute set <literal>exp-1</literal>.
162 The records in the explain database are of type
163 <literal>grs.sgml</literal>.
164 The root element for the Explain grs.sgml records is
165 <literal>explain</literal>, thus
166 <filename>explain.abs</filename> is used for indexing.
Zebra <emphasis>must</emphasis> be able to locate
<filename>explain.abs</filename> in order to index the Explain
records properly. Zebra will still run without it, but the Explain
information will not be searchable.
178 <refsect1 id="zebrasrv-sru">
179 <title>The &acro.sru; Server</title>
181 In addition to &acro.z3950;, Zebra supports the more recent and
182 web-friendly IR protocol <ulink url="&url.sru;">&acro.sru;</ulink>.
183 &acro.sru; can be carried over &acro.soap; or a &acro.rest;-like protocol
184 that uses HTTP &acro.get; or &acro.post; to request search responses. The request
185 itself is made of parameters such as
186 <literal>query</literal>,
187 <literal>startRecord</literal>,
188 <literal>maximumRecords</literal>
190 <literal>recordSchema</literal>;
191 the response is an &acro.xml; document containing hit-count, result-set
192 records, diagnostics, etc. &acro.sru; can be thought of as a re-casting
193 of &acro.z3950; semantics in web-friendly terms; or as a standardisation
194 of the ad-hoc query parameters used by search engines such as Google
195 and AltaVista; or as a superset of A9's OpenSearch (which it
Zebra supports &acro.z3950;, &acro.sru; &acro.get;, &acro.sru; &acro.post; and &acro.sru; &acro.soap; (&acro.srw;)
on the same port, recognising which protocol is used by each incoming
request and handling it accordingly. This is achieved through
the use of Deep Magic; civilians are warned not to stand too close.
204 <refsect2 id="zebrasrv-sru-run">
205 <title>Running zebrasrv as an &acro.sru; Server</title>
207 Because Zebra supports all protocols on one port, it would
208 seem to follow that the &acro.sru; server is run in the same way as
209 the &acro.z3950; server, as described above. This is true, but only in
210 an uninterestingly vacuous way: a Zebra server run in this manner
211 will indeed recognise and accept &acro.sru; requests; but since it
212 doesn't know how to handle the &acro.cql; queries that these protocols
213 use, all it can do is send failure responses.
It is possible to cheat, by having &acro.sru; search Zebra with
a &acro.pqf; query instead of &acro.cql;, using the
<literal>x-pquery</literal>
parameter instead of
<literal>query</literal>.
This is a
<emphasis role="strong">non-standard extension</emphasis>
of &acro.sru;, and a
<emphasis role="strong">very naughty</emphasis>
thing to do, but it does give you a way to see Zebra serving &acro.sru;
"right out of the box". If you start your favourite Zebra
228 server in the usual way, on port 9999, then you can send your web
232 http://localhost:9999/Default?version=1.1
233 &operation=searchRetrieve
234 &x-pquery=mineral
236 &maximumRecords=1
239 This will display the &acro.xml;-formatted &acro.sru; response that includes the
240 first record in the result-set found by the query
241 <literal>mineral</literal>. (For clarity, the &acro.sru; URL is shown
here broken across lines, but the lines should be joined together
to make a single-line URL for the browser to submit.)
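The joining of parameters into a single-line URL can also be done programmatically. The following is a minimal Python sketch, not part of Zebra or &yaz;; the helper name <literal>build_sru_url</literal> is hypothetical, and the host, port and database name are simply taken from the example above. Only the parameters visible in that example are used.

```python
from urllib.parse import urlencode, quote

# Assumed values: host, port and database name follow the example above.
BASE_URL = "http://localhost:9999/Default"


def build_sru_url(base_url, params):
    """Join request parameters into one single-line URL.

    Hypothetical helper for illustration only. Values are percent-encoded;
    quote_via=quote encodes a space as %20 rather than '+'.
    """
    return base_url + "?" + urlencode(params, quote_via=quote)


url = build_sru_url(BASE_URL, {
    "version": "1.1",
    "operation": "searchRetrieve",
    "x-pquery": "mineral",
    "maximumRecords": "1",
})
print(url)
```

The printed string can be pasted into a browser as is, or passed to any HTTP client.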
247 In order to turn on Zebra's support for &acro.cql; queries, it's necessary
248 to have the &yaz; generic front-end (which Zebra uses) translate them
249 into the &acro.z3950; Type-1 query format that is used internally. And
250 to do this, the generic front-end's own configuration file must be
251 used. See <xref linkend="gfs-config"/>;
252 the salient point for &acro.sru; support is that
253 <command>zebrasrv</command>
254 must be started with the
255 <literal>-f frontendConfigFile</literal>
256 option rather than the
257 <literal>-c zebraConfigFile</literal>
259 and that the front-end configuration file must include both a
260 reference to the Zebra configuration file and the &acro.cql;-to-&acro.pqf;
261 translator configuration file.
264 A minimal front-end configuration file that does this would read as
271 <config>zebra.cfg</config>
272 <cql2rpn>../../tab/pqf.properties</cql2rpn>
<literal>&lt;config&gt;</literal>
279 element contains the name of the Zebra configuration file that was
280 previously specified by the
281 <literal>-c</literal>
282 command-line argument, and the
<literal>&lt;cql2rpn&gt;</literal>
284 element contains the name of the &acro.cql; properties file specifying how
285 various &acro.cql; indexes, relations, etc. are translated into Type-1
A Zebra server running with such a configuration can then be
queried using proper, conformant &acro.sru; URLs with &acro.cql; queries:
293 http://localhost:9999/Default?version=1.1
294 &operation=searchRetrieve
295 &query=title=utah and description=epicent*
297 &maximumRecords=1
301 <refsect1 id="zebrasrv-sru-support">
302 <title>&acro.sru; Protocol Support and Behavior</title>
Zebra running as an &acro.sru; server supports &acro.sru; version 1.1, including
&acro.cql; version 1.1. In particular, it provides support for the
following elements of the protocol.
309 <refsect2 id="zebrasrvr-search-and-retrieval">
310 <title>&acro.sru; Search and Retrieval</title>
313 <ulink url="&url.sru.searchretrieve;">&acro.sru; searchRetrieve</ulink>
317 One of the great strengths of &acro.sru; is that it mandates a standard
318 query language, &acro.cql;, and that all conforming implementations can
319 therefore be trusted to correctly interpret the same queries. It
320 is with some shame, then, that we admit that Zebra also supports
321 an additional query language, our own Prefix Query Format
322 (<ulink url="&url.yaz.pqf;">&acro.pqf;</ulink>).
A &acro.pqf; query is submitted by using the extension parameter
<literal>x-pquery</literal>,
in which case the
<literal>query</literal>
parameter must be omitted.
Please feel free to use this facility within your own
applications; but be aware that such a request is not merely
non-standard &acro.sru;: it is not even syntactically valid, since it
omits the mandatory <literal>query</literal> parameter.
335 <refsect2 id="zebrasrv-sru-scan">
336 <title>&acro.sru; Scan</title>
338 Zebra supports <ulink url="&url.sru.scan;">&acro.sru; scan</ulink>
340 Scanning using &acro.cql; syntax is the default, where the
341 standard <literal>scanClause</literal> parameter is used.
A mutant form of &acro.sru; scan is also supported, using
the non-standard <literal>x-pScanClause</literal> parameter in
place of the standard <literal>scanClause</literal> to scan on a
&acro.pqf; query clause.
352 <refsect2 id="zebrasrv-sru-explain">
353 <title>&acro.sru; Explain</title>
355 Zebra supports <ulink url="&url.sru.explain;">&acro.sru; explain</ulink>.
358 The ZeeRex record explaining a database may be requested either
359 with a fully fledged &acro.sru; request (with
360 <literal>operation</literal>=<literal>explain</literal>
361 and version-number specified)
362 or with a simple HTTP &acro.get; at the server's basename.
363 The ZeeRex record returned in response is the one embedded
364 in the &yaz; Frontend Server configuration file that is described in the
365 <xref linkend="gfs-config"/>.
Unfortunately, the data found in the
&acro.cql;-to-&acro.pqf; text file must be added by hand to the explain
section of the &yaz; Frontend Server configuration file to be able
to provide a suitable explain record.
Too bad, but this is all extremely
new alpha stuff, and a lot of work has yet to be done.
There is no linkage whatsoever between the &acro.z3950; explain model
and the &acro.sru; explain response (at least, none is implemented
in Zebra). Zebra does not provide a means of obtaining the
ZeeRex record using &acro.z3950;.
383 <refsect2 id="zebrasrv-non-sru-ops">
384 <title>Other &acro.sru; operations</title>
386 In the &acro.z3950; protocol, Initialization, Present, Sort and Close
387 are separate operations. In &acro.sru;, however, these operations do not
393 &acro.sru; has no explicit initialization handshake phase, but
394 commences immediately with searching, scanning and explain
400 Neither does &acro.sru; have a close operation, since the protocol is
401 stateless and each request is self-contained. (It is true that
402 multiple &acro.sru; request/response pairs may be implemented as
403 multiple HTTP request/response pairs over a single persistent
404 TCP/IP connection; but the closure of that connection is not a
405 protocol-level operation.)
410 Retrieval in &acro.sru; is part of the
411 <literal>searchRetrieve</literal> operation, in which a search
412 is submitted and the response includes a subset of the records
413 in the result set. There is no direct analogue of &acro.z3950;'s
414 Present operation which requests records from an established
415 result set. In &acro.sru;, this is achieved by sending a subsequent
416 <literal>searchRetrieve</literal> request with the query
417 <literal>cql.resultSetId=</literal><emphasis>id</emphasis> where
418 <emphasis>id</emphasis> is the identifier of the previously
419 generated result-set.
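As a rough illustration of such a follow-up request, the Python sketch below builds a <literal>searchRetrieve</literal> URL that asks for an inclusive range of records from an existing result set. The function name, the base URL and the identifier <literal>rs42</literal> are made up for this example; a real identifier would have to come from an earlier response.

```python
from urllib.parse import urlencode, quote


def result_set_request(base_url, result_set_id, first, last):
    """Build a searchRetrieve URL for records first..last (inclusive)
    of a previously generated result set. Illustrative helper only."""
    if first < 1 or last < first:
        raise ValueError("need 1 <= first <= last")
    params = {
        "version": "1.1",
        "operation": "searchRetrieve",
        # The query names the result set instead of repeating the search.
        "query": "cql.resultSetId=" + result_set_id,
        "startRecord": str(first),
        # maximumRecords is a count, not an end position.
        "maximumRecords": str(last - first + 1),
    }
    return base_url + "?" + urlencode(params, quote_via=quote)


# "rs42" is a made-up identifier used purely for illustration.
follow_up = result_set_request("http://localhost:9999/Default", "rs42", 5, 7)
print(follow_up)
```

Note that the <literal>=</literal> inside the query value is percent-encoded as <literal>%3D</literal>, so that it cannot be confused with the separator between parameter name and value.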
Sorting in &acro.sru; is done within the
<literal>searchRetrieve</literal> operation: in v1.1, by an
explicit <literal>sortKeys</literal> parameter, but the forthcoming
v1.2 or v2.0 will most likely use an extension of the query
language, <ulink url="&url.cql.sorting;">&acro.cql; sorting</ulink>.
433 It can be seen, then, that while Zebra operating as an &acro.sru; server
434 does not provide the same set of operations as when operating as a
435 &acro.z3950; server, it does provide equivalent functionality.
440 <refsect1 id="zebrasrv-sru-examples">
441 <title>&acro.sru; Examples</title>
Point your browser at <literal>http://localhost:9999</literal>
to get an explain response, or use
446 http://localhost:9999/?version=1.1&operation=explain
See the number of hits for a query
452 http://localhost:9999/?version=1.1&operation=searchRetrieve
453 &query=text=(plant%20and%20soil)
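The hit count can also be read from the response by a program. The sketch below assumes the response uses the &acro.sru; 1.1 response namespace <literal>http://www.loc.gov/zing/srw/</literal> and carries the count in a <literal>numberOfRecords</literal> element; the sample document is hand-written for illustration and is not actual Zebra output.

```python
import xml.etree.ElementTree as ET

# Namespace of SRU 1.1 responses (assumed to apply to the server's output).
SRW_NS = "http://www.loc.gov/zing/srw/"

# Hand-written, heavily trimmed sample; not captured from a real server.
SAMPLE_RESPONSE = """<zs:searchRetrieveResponse xmlns:zs="http://www.loc.gov/zing/srw/">
  <zs:version>1.1</zs:version>
  <zs:numberOfRecords>12</zs:numberOfRecords>
</zs:searchRetrieveResponse>"""


def hit_count(response_xml):
    """Return the hit count from a searchRetrieve response document."""
    root = ET.fromstring(response_xml)
    node = root.find("{%s}numberOfRecords" % SRW_NS)
    if node is None or node.text is None:
        raise ValueError("no numberOfRecords element in response")
    return int(node.text)


print(hit_count(SAMPLE_RESPONSE))
```

A response that reports only diagnostics would have no such element, which is why the helper raises an error instead of returning zero.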
Fetch records 5-7 in Dublin Core format
http://localhost:9999/?version=1.1&operation=searchRetrieve
&query=text=(plant%20and%20soil)
&startRecord=5&maximumRecords=3&recordSchema=dc
You can even search with &acro.pqf; queries, using the <emphasis>extended naughty
parameter</emphasis> <literal>x-pquery</literal>
468 http://localhost:9999/?version=1.1&operation=searchRetrieve
469 &x-pquery=@attr%201=text%20@and%20plant%20soil
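The <literal>%20</literal> sequences in this URL are percent-encoded spaces. A small Python sketch of producing the same encoding from a readable &acro.pqf; string follows; the helper name <literal>encode_pqf</literal> is hypothetical, and leaving <literal>@</literal> and <literal>=</literal> unencoded is a choice made here only to match the URL shown above.

```python
from urllib.parse import quote


def encode_pqf(pqf):
    """Percent-encode a PQF query for use as a URL parameter value.

    '@' and '=' are kept literal to mirror the example URL above;
    spaces become %20.
    """
    return quote(pqf, safe="@=")


encoded = encode_pqf("@attr 1=text @and plant soil")
print("x-pquery=" + encoded)
```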
473 Or scan indexes using the <emphasis>extended extremely naughty
474 parameter</emphasis> <literal>x-pScanClause</literal>
476 http://localhost:9999/?version=1.1&operation=scan
477 &x-pScanClause=@attr%201=text%20something
479 <emphasis>Don't do this in production code!</emphasis>
480 But it's a great fast debugging aid.
485 <refsect1 id="gfs-config"><title>&yaz; server virtual hosts</title>
489 <refsect1><title>SEE ALSO</title>
492 <refentrytitle>zebraidx</refentrytitle>
493 <manvolnum>1</manvolnum>
499 <!-- Keep this comment at the end of the file
504 sgml-minimize-attributes:nil
505 sgml-always-quote-attributes:t
508 sgml-parent-document: "zebra.xml"
509 sgml-local-catalogs: nil
510 sgml-namecase-general:t