2 <title>The YAZ Proxy</title>
The YAZ proxy is a transparent SRW/SRU/Z39.50-to-Z39.50 gateway.
That is, it is an SRW/SRU/Z39.50 server which has as its back-end a
Z39.50 client that forwards requests on to another server (known as
the <firstterm>backend target</firstterm>).
10 -- All config directives --
13 -- Mention XSLT conversion
16 The YAZ Proxy is useful for debugging SRW/SRU/Z39.50 software, logging
APDUs, redirecting Z39.50 packets through firewalls, etc.
18 Furthermore, it offers facilities that often
19 boost performance for connectionless Z39.50 clients such
Unlike most other server software, the proxy runs as a single
process with a single thread. Every I/O operation
is non-blocking, so it is very lightweight and extremely fast.
26 It does not store any state information on the hard drive,
27 except any log files you ask for.
30 <section id="proxy-example">
31 <title>Example: Using the Proxy to Log APDUs</title>
33 Suppose you use a commercial Z39.50 client for which you do not
34 have source code, and it's not behaving how you think it should
35 when running against some specific server that you have no control
36 over. One way to diagnose the problem is to find out what packets
37 (APDUs) are being sent and received, but not all client
38 applications have facilities to do APDU logging.
41 No problem. Run the proxy on a friendly machine, get it to log
42 APDUs, and point the errant client at the proxy instead of
43 directly at the server that's causing it problems.
46 Suppose the server is running on <literal>foo.bar.com</literal>,
47 port 18398. Run the proxy on the machine of your choice, say
48 <literal>your.company.com</literal> like this:
51 yazproxy -a - -t tcp:foo.bar.com:18398 tcp:@:9000
54 (The <literal>-a -</literal> option requests APDU logging on
55 standard output, <literal>-t tcp:foo.bar.com:18398</literal>
56 specifies where the backend target is, and
57 <literal>tcp:@:9000</literal> tells the proxy to listen on port
58 9000 and accept connections from any machine.)
61 Now change your client application's configuration so that instead
62 of connecting to <literal>foo.bar.com</literal> port 18398, it
63 connects to <literal>your.company.com</literal> port 9000, and
64 start it up. It will work exactly as usual, but all the packets
65 will be sent via the proxy, which will generate a log like this:
70 referenceId OCTETSTRING(len=4) 69 6E 69 74
71 protocolVersion BITSTRING(len=1)
72 options BITSTRING(len=2)
73 preferredMessageSize 1048576
74 maximumRecordSize 1048576
75 implementationId 'Mike Taylor (id=169)'
76 implementationName 'Net::Z3950.pm (Perl)'
77 implementationVersion '0.31'
81 referenceId OCTETSTRING(len=4) 69 6E 69 74
82 protocolVersion BITSTRING(len=1)
83 options BITSTRING(len=2)
84 preferredMessageSize 1048576
85 maximumRecordSize 1048576
88 implementationName 'GFS/YAZ / Zebra Information Server'
89 implementationVersion 'YAZ 1.9.1 / Zebra 1.3.3'
93 referenceId OCTETSTRING(len=1) 30
96 mediumSetPresentNumber 0
98 resultSetName 'default'
103 smallSetElementSetNames choice
107 mediumSetElementSetNames choice
110 preferredRecordSyntax OID: 1 2 840 10003 5 10
114 attributeSetId OID: 1 2 840 10003 3 1
122 general OCTETSTRING(len=7) 6D 69 6E 65 72 61 6C
132 <section id="proxy-target">
133 <title>Specifying the Backend Target</title>
135 When the proxy receives a Z39.50 Initialize Request from a Z39.50
136 client, it determines the backend target by the following rules:
139 <para>If the <literal>InitializeRequest</literal> PDU from the
141 <link linkend="otherinfo-encoding"><literal>otherInfo</literal></link>
143 <literal>1.2.840.10003.10.1000.81.1</literal>, then the
144 contents of that element specify the target to be used, in the
145 usual YAZ address format (typically
146 <literal>tcp:<parameter>hostname</parameter>:<parameter>port</parameter></literal>)
148 <ulink url="http://www.indexdata.dk/yaz/doc/comstack.addresses.tkl"
149 >the Addresses section of the YAZ manual</ulink>.
153 <para>Otherwise, the Proxy uses the default target, if one was
154 specified on the command-line with the <literal>-t</literal>
155 option. A default target can also be specified in the
160 <para>Otherwise, the proxy closes the connection with
167 <section id="proxy-keepalive">
168 <title>Keep-alive Facility</title>
The keep-alive facility lets the proxy keep its connection to the
backend open even after the client closes its connection to the proxy.
If the same or another client then connects to the proxy and requests the
same backend, it is assigned to this existing backend session. In this
case, the proxy sends an initialize response directly to the client and
the initialize handshake with the backend is omitted.
When a client reconnects, query and record caching work better if the
proxy assigns it to the same backend session as before, because the
result set (if any) can then be re-used. To achieve this, Index Data
defined a session cookie which identifies the backend session.
186 The cookie is defined by the client and is sent as part of the
187 Initialize Request and passed in an
188 <link linkend="otherinfo-encoding"><literal>otherInfo</literal></link>
189 element with OID <literal>1.2.840.10003.10.1000.81.2</literal>.
Clients that do not send a cookie as part of the initialize request
may still see better performance, since the init handshake is saved.
197 <section id="query-cache">
198 <title>Query Caching</title>
200 Simple stateless clients often send identical Z39.50 searches
201 in a relatively short period of time (e.g. in order to produce a
202 results-list page, the next page,
203 a single full-record, etc). And for many targets, it's
204 much more expensive to produce a new result set than to
205 reuse an existing one.
208 The proxy tries to solve that by remembering the last query for each
209 backend target, so that if an identical query is received next, it
210 is turned into Present Requests rather than new Search Requests.
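The decision described above can be pictured with a small Python sketch.
It is not the proxy's actual code: the BackendSession class and the
plan_search function are hypothetical names, and the query is treated as
an opaque byte string that is compared for equality.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class BackendSession:
    """Hypothetical stand-in for one proxy-to-backend session."""
    last_query: Optional[bytes] = None  # encoded query of the last Search Request


def plan_search(session: BackendSession, query: bytes) -> str:
    """Return the action taken for an incoming Search Request."""
    if session.last_query == query:
        # Identical to the previous query: the result set still exists on
        # the backend, so records can be fetched with Present Requests.
        return "present"
    # A different query: forward a real Search Request and remember it.
    session.last_query = query
    return "search"
```

Only the most recent query is remembered, so alternating between two
different queries yields a new Search Request every time.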
A future release will probably allow for
an arbitrary-sized cache for targets supporting named result sets.
You can enable/disable query caching using option <literal>-o</literal>.
223 <section id="record-cache">
224 <title>Record Caching</title>
226 As an option, the proxy may also cache result set records for the
228 The proxy takes into account the Record Syntax and CompSpec.
229 The CompSpec includes simple element set names as well.
230 By default the cache is 200000 bytes per session.
234 <section id="query-validation">
235 <title>Query Validation</title>
237 The Proxy may also be configured to trap particular attributes in
238 Type-1 queries and send Bib-1 diagnostics back to the client without
239 even consulting the backend target. This facility may be useful if
240 a target does not properly issue diagnostics when unsupported attributes
245 <section id="record-validation">
246 <title>Record Syntax Validation</title>
The proxy may be configured to accept, reject or convert records.
When a record syntax is accepted, the proxy passes search/present
requests to the backend target under the assumption that the target can
honor the request (in fact it may not). When a record syntax is rejected
as unsupported, the proxy returns a diagnostic to the
client. Finally, the proxy may convert records.
256 The proxy can convert from MARC to MARCXML and thereby offer an
257 XML version of any MARC record as long as it is ISO2709 encoded.
258 If the proxy is compiled with libXSLT support it can also
263 <section id="other-optimizations">
264 <title>Other Optimizations</title>
266 We've had some plans to support global caching of result set records,
267 but this has not yet been implemented.
271 <section id="proxy-config-file">
272 <title>Proxy Configuration File</title>
274 The Proxy may read a configuration file using option
275 <literal>-c</literal> followed by the filename of a config file.
278 The config file is XML based. The YAZ proxy must be compiled
279 with <ulink url="http://www.xmlsoft.org/">libxml2</ulink> and
280 <ulink url="http://xmlsoft.org/XSLT/">libXSLT</ulink> support in
281 order for the config file facility to be enabled.
<para>To check that a config file is well-formed, yazproxy may
be invoked without specifying a listening port, i.e.
287 yazproxy -c myconfig.xml
289 If this does not produce errors, the file is well-formed.
292 <section id="proxy-config-header">
293 <title>Proxy Configuration Header</title>
295 The proxy config file must have a root element called
296 <literal>proxy</literal>. All information except an optional XML
297 header must be stored within the <literal>proxy</literal> element.
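As an independent pre-flight check, the two requirements above
(well-formed XML, root element proxy) can be tested with Python's
standard XML parser. This is a sketch that does not involve yazproxy
itself; the function name check_proxy_config is hypothetical, and any
XML namespace on the root element is simply ignored.

```python
import xml.etree.ElementTree as ET


def check_proxy_config(xml_text: str) -> str:
    """Return "ok" or a short description of the first problem found."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as exc:
        return f"not well-formed: {exc}"
    # Drop a "{namespace}" prefix, if the parser attached one to the tag.
    local_name = root.tag.rsplit("}", 1)[-1]
    if local_name != "proxy":
        return f"unexpected root element: {local_name}"
    return "ok"
```

This only checks the outer shape of the file; it does not validate the
individual configuration elements.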
300 <?xml version="1.0"?>
302 <!-- content here .. -->
306 <section id="proxy-config-target">
307 <title>Configuration: target</title>
The element <literal>target</literal>, which may be repeated zero
or more times within the parent element <literal>proxy</literal>, contains
information about each backend target.
The <literal>target</literal> element has two attributes:
<literal>name</literal> (required), which holds the logical name of the
backend target, and <literal>default</literal> (optional), which,
when given, specifies that the backend target is the default target,
equivalent to command line option <literal>-t</literal>.
320 <?xml version="1.0"?>
322 <target name="server1" default="1">
323 <!-- description of server1 .. -->
325 <target name="server2">
326 <!-- description of server2 .. -->
332 <section id="proxy-config-url">
<title>Configuration: url</title>
The <literal>url</literal> element, which may occur one or more times,
is a child of the <literal>target</literal> element.
The content of <literal>url</literal> is the Z-URL of the backend.
Multiple <literal>url</literal> elements may be used. In that case, when
a client initiates a session, the proxy chooses the URL with the lowest
342 number of active sessions, thereby distributing the load. It is
343 assumed that each URL represents the same database (data).
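The selection rule can be sketched in a few lines of Python. The
function name pick_backend and the example addresses are hypothetical,
and picking the first-listed URL on a tie is an assumption of this
sketch, not documented proxy behavior.

```python
from typing import Dict


def pick_backend(active_sessions: Dict[str, int]) -> str:
    """Pick the URL with the fewest active sessions (ties: first listed)."""
    return min(active_sessions, key=active_sessions.get)


# Hypothetical backends serving the same database.
sessions = {
    "backend1.example.org:210": 3,
    "backend2.example.org:210": 1,
    "backend3.example.org:210": 1,
}
chosen = pick_backend(sessions)
```

Here chosen is the second backend, since it is the first entry with the
lowest session count.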
346 <section id="proxy-config-keepalive">
347 <title>Configuration: keepalive</title>
348 <para>The <literal>keepalive</literal> element holds information about
349 the keepalive Z39.50 sessions. Keepalive sessions are proxy-to-backend
sessions that are no longer associated with a client session.
<para>The <literal>keepalive</literal> element, which is a child of
the <literal>target</literal> element, holds two elements:
<literal>bandwidth</literal> and <literal>pdu</literal>.
355 The <literal>bandwidth</literal> is the maximum total bytes
356 transferred to/from the target. If a target session exceeds this
357 limit, it is shut down (and no longer kept alive).
358 The <literal>pdu</literal> is the maximum number of requests sent
359 to the target. If a target session exceeds this limit, it is
shut down. The idea of these two limits is to avoid very long
sessions that use resources in a backend (that leaks!).
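The two keepalive limits amount to a simple check on per-session
totals. The sketch below is illustrative only; the KeepaliveLimits class
and the may_keep_alive function are hypothetical names, and the values
1048576 and 400 are example settings.

```python
from dataclasses import dataclass


@dataclass
class KeepaliveLimits:
    bandwidth: int  # maximum total bytes transferred to/from the target
    pdu: int        # maximum number of requests sent to the target


def may_keep_alive(bytes_total: int, requests_total: int,
                   limits: KeepaliveLimits) -> bool:
    """True while the target session exceeds neither limit."""
    return bytes_total <= limits.bandwidth and requests_total <= limits.pdu


limits = KeepaliveLimits(bandwidth=1048576, pdu=400)
```

A session that has moved more than 1 MB, or has carried more than 400
requests, would no longer qualify for keep-alive under these settings.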
The following sets the maximum number of bytes transferred in a
target session to 1 MB and the maximum number of requests to 400.
368 <bandwidth>1048576</bandwidth>
<pdu>400</pdu>
374 <section id="proxy-config-limit">
375 <title>Configuration: limit</title>
The <literal>limit</literal> section specifies bandwidth and request
(PDU) limits for an active session.
The proxy tracks the bytes transferred and the requests made during
the last 60 seconds. The <literal>limit</literal> element may include the
elements <literal>bandwidth</literal>, <literal>pdu</literal>,
and <literal>retrieve</literal>. The <literal>bandwidth</literal>
element limits the number of bytes transferred within the last minute.
The <literal>pdu</literal> element limits the number of requests in the
last minute. The <literal>retrieve</literal> element holds the maximum
number of records to be retrieved in one Present Request.
If a bandwidth or PDU limit is reached, the proxy postpones
requests to the target and waits one or more seconds. The idea of the
limit is to ensure that clients that download hundreds or thousands of
records do not hurt other users.
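One way to picture the per-minute accounting is a sliding window over
timestamped events. The Python sketch below is an illustration under
that assumption; MinuteCounter and should_postpone are hypothetical
names, and the proxy's real bookkeeping may be organized differently.

```python
from collections import deque
from typing import Deque, Tuple

WINDOW = 60.0  # seconds; the text says the last minute is accounted for


class MinuteCounter:
    """Sum of the amounts recorded during the last WINDOW seconds."""

    def __init__(self) -> None:
        self._events: Deque[Tuple[float, int]] = deque()

    def add(self, now: float, amount: int) -> None:
        self._events.append((now, amount))

    def total(self, now: float) -> int:
        # Discard events that have fallen out of the window.
        while self._events and self._events[0][0] <= now - WINDOW:
            self._events.popleft()
        return sum(amount for _, amount in self._events)


def should_postpone(counter: MinuteCounter, now: float, limit: int) -> bool:
    """True when the last-minute total has reached the limit."""
    return counter.total(now) >= limit
```

The same counter can serve for bytes (bandwidth) or for requests (pdu)
by recording either the APDU size or 1 per request.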
The following sets the maximum number of bytes transferred per minute to
512 KB (524288 bytes) and the maximum number of requests per minute to 40.
399 <bandwidth>524288</bandwidth>
<pdu>40</pdu>
Typically the keepalive limits are much higher than
the per-minute limits for an active session.
412 <section id="proxy-config-attribute">
413 <title>Configuration: attribute</title>
The <literal>attribute</literal> element specifies whether to accept
or reject a particular attribute type/value pair.
Well-behaved targets will reject unsupported attributes on their
own. This feature is useful for targets that do not gracefully
handle unsupported attributes.
Attribute elements may be repeated. The proxy inspects the attribute
specifications in the order in which they appear in the configuration
file. When an attribute specification matches an attribute list
in a query, the proxy takes the appropriate action (reject or accept).
If no attribute specification matches the attribute list in a query,
432 The <literal>attribute</literal> element has two required attributes:
433 <literal>type</literal> which is the Attribute Type-1 type, and
434 <literal>value</literal> which is the Attribute Type-1 value.
The special value/type <literal>*</literal> matches any attribute
type/value. A value may also be specified as a comma-separated
list of values, or as a range: low value, dash, high value.
441 If attribute <literal>error</literal> is given, that holds a
442 Bib-1 diagnostic which is sent to the client if the particular
443 type, value is part of a query.
446 If attribute <literal>error</literal> is not given, the attribute
447 type, value is accepted and passed to the backend target.
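The matching rules can be sketched in Python as follows. The names
spec_matches and check_attribute are hypothetical, rules are modelled as
plain tuples rather than XML elements, and treating an unmatched pair as
accepted is an assumption of this sketch.

```python
from typing import List, Optional, Tuple

# (type spec, value spec, Bib-1 error, or None when the pair is accepted)
Rule = Tuple[str, str, Optional[int]]


def spec_matches(spec: str, number: int) -> bool:
    """Match "*", a comma-separated list, and "low-high" ranges."""
    if spec == "*":
        return True
    for part in spec.split(","):
        low, dash, high = part.partition("-")
        if dash:
            if int(low) <= number <= int(high):
                return True
        elif int(part) == number:
            return True
    return False


def check_attribute(rules: List[Rule], attr_type: int,
                    attr_value: int) -> Optional[int]:
    """Return the diagnostic of the first matching rule, or None if accepted."""
    for type_spec, value_spec, error in rules:
        if spec_matches(type_spec, attr_type) and spec_matches(value_spec, attr_value):
            return error
    return None  # assumption: no matching rule means the pair is passed on


rules = [("1", "1,4,1000-1003", None), ("1", "*", 114)]
```

Because rules are tried in order, the specific accepting rule must come
before the catch-all rule that rejects every other use attribute.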
450 A target that supports use attributes 1,4, 1000 through 1003 and
451 no other use attributes, could use the following rules:
<attribute type="1" value="1,4,1000-1003"/>
454 <attribute type="1" value="*" error="114"/>
459 <section id="proxy-config-syntax">
460 <title>Configuration: syntax</title>
The <literal>syntax</literal> element specifies whether to accept or
reject a particular record syntax request from the client.
466 The <literal>syntax</literal> has one required attribute:
467 <literal>type</literal> which is the Preferred Record Syntax.
If attribute <literal>error</literal> is given, it holds the
Bib-1 diagnostic which is sent to the client if the particular
record syntax is part of a Present Request or Search Request.
475 If attribute <literal>error</literal> is not given, the record syntax
476 is accepted and passed to the backend target.
479 If attribute <literal>marcxml</literal> is given, the proxy will
480 perform MARC21 to MARCXML conversion. In this case the
481 <literal>type</literal> should be XML. The proxy will use
482 preferred record syntax USMARC/MARC21 against the backend target.
<para>To accept USMARC and offer MARCXML records, but reject
all other requests, the following configuration could be used:
488 <target name="mytarget">
489 <syntax type="usmarc"/>
490 <syntax type="xml" marcxml="1"/>
491 <syntax type="*" error="238"/>
498 <section id="proxy-config-target-timeout">
499 <title>Configuration: target-timeout</title>
501 The element <literal>target-timeout</literal> is the child of element
<literal>target</literal> and specifies the time in seconds before
503 a target session is shut down.
506 This can also be specified on the command line by using option
507 <literal>-T</literal>. Refer to <xref linkend="proxy-usage"/>.
511 <section id="proxy-config-client-timeout">
512 <title>Configuration: client-timeout</title>
514 The element <literal>client-timeout</literal> is the child of element
<literal>target</literal> and specifies the time in seconds before
516 a client session is shut down.
519 This can also be specified on the command line by using option
520 <literal>-i</literal>. Refer to <xref linkend="proxy-usage"/>.
524 <section id="proxy-config-preinit">
525 <title>Configuration: preinit</title>
The element <literal>preinit</literal> is the child of element
<literal>target</literal> and specifies the number of spare
connections to a target. By default no spare connections are
created by the proxy. If the proxy uses a target exclusively or
heavily, the preinit sessions ensure that target sessions
have been established before a client connects, and thereby greatly
reduce the cost of the connect-init handshake. Never set this to
538 <section id="proxy-config-max-clients">
539 <title>Configuration: max-clients</title>
541 The element <literal>max-clients</literal> is the child of element
542 <literal>proxy</literal> and specifies the total number of
543 allowed connections to targets (all targets). If this limit
544 is reached the proxy will close the least recently used connection.
Note that many Unix systems impose a system limit on the number of
548 open files allowed in a single process, typically in the
549 range 256 (Solaris) to 1024 (Linux).
550 The proxy uses 2 sockets per session + a few files
551 for logging. As a rule of thumb, ensure that 2*max-clients + 5
552 can be opened by the proxy process.
556 Using the <ulink url="http://www.gnu.org/software/bash/bash.html">
557 bash</ulink> shell, you can set the limit with
<literal>ulimit -n</literal> <replaceable>no</replaceable>.
559 Use <literal>ulimit -a</literal> to display limits.
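The rule of thumb can also be checked from a script. The Python sketch
below uses the standard resource module (Unix only) to read the soft
open-files limit of the current process; the function names are
hypothetical, and the limit that matters is the one in effect for the
shell that starts the proxy.

```python
import resource


def descriptors_needed(max_clients: int) -> int:
    """Rule of thumb from the text: 2 sockets per session plus about 5 files."""
    return 2 * max_clients + 5


def fits_current_limit(max_clients: int) -> bool:
    """Compare the estimate with this process's soft open-files limit."""
    soft, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if soft == resource.RLIM_INFINITY:
        return True
    return descriptors_needed(max_clients) <= soft
```

For example, a max-clients value of 500 needs about 1005 descriptors,
which is close to a typical Linux default of 1024.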
564 <section id="proxy-config-log">
565 <title>Configuration: log</title>
567 The element <literal>log</literal> is the child of element
568 <literal>proxy</literal> and specifies what to be logged by the
572 Specify the log file with command-line option <literal>-l</literal>.
575 The text of the <literal>log</literal> element is a sequence of
576 options separated by white space. See the table below:
577 <table frame="top"><title>Logging options</title>
579 <colspec colwidth="1*" colname="option"/>
580 <colspec colwidth="2*" colname="description"/>
583 <entry>Option</entry>
584 <entry>Description</entry>
589 <entry><literal>client-apdu</literal></entry>
591 Log APDUs as reported by YAZ for the
592 communication between the client and the proxy.
593 This facility is equivalent to the APDU logging that
594 happens when using option <literal>-a</literal>, however
595 this tells the proxy to log in the same file as given
596 by <literal>-l</literal>.
600 <entry><literal>server-apdu</literal></entry>
602 Log APDUs as reported by YAZ for the
603 communication between the proxy and the server (backend).
<entry><literal>client-requests</literal></entry>
Log a brief description of requests transferred between
the client and the proxy. The name of the request and the size
of the APDU are logged.
615 <entry><literal>server-requests</literal></entry>
Log a brief description of requests transferred between
the proxy and the server (backend). The name of the request
and the size of the APDU are logged.
To log communication in detail between the proxy and the backend, the
following configuration could be used:
630 <target name="mytarget">
631 <log>server-apdu server-requests</log>
639 <section id="proxy-usage">
640 <title>Proxy Usage</title>
643 <refentry id="yazproxy-ref">
647 <section id="otherinfo-encoding"><title>OtherInformation Encoding</title>
649 The proxy uses the OtherInformation definition to carry
650 information about the target address and cookie.
653 OtherInformation ::= [201] IMPLICIT SEQUENCE OF SEQUENCE{
654 category [1] IMPLICIT InfoCategory OPTIONAL,
656 characterInfo [2] IMPLICIT InternationalString,
657 binaryInfo [3] IMPLICIT OCTET STRING,
658 externallyDefinedInfo [4] IMPLICIT EXTERNAL,
659 oid [5] IMPLICIT OBJECT IDENTIFIER}}
661 InfoCategory ::= SEQUENCE{
662 categoryTypeId [1] IMPLICIT OBJECT IDENTIFIER OPTIONAL,
663 categoryValue [2] IMPLICIT INTEGER}
The <literal>categoryTypeId</literal> is either
OID 1.2.840.10003.10.1000.81.1 or 1.2.840.10003.10.1000.81.2,
for proxy target and proxy cookie respectively. The
integer element <literal>categoryValue</literal> is set to 0.
The target address or cookie value is stored in element
<literal>characterInfo</literal> of the <literal>information</literal>
676 <!-- Keep this comment at the end of the file
681 sgml-minimize-attributes:nil
682 sgml-always-quote-attributes:t
685 sgml-parent-document: "yazproxy.xml"
686 sgml-local-catalogs: nil
687 sgml-namecase-general:t