X-Git-Url: http://sru.miketaylor.org.uk/?a=blobdiff_plain;f=doc%2Fproxy.xml;h=556245410a2c4366e16f7100a49b887549bba276;hb=916452d348a342be0b6bbc054d59bc8897fb2f79;hp=d0250cd467112b5797e35dccd394e118acc0ba62;hpb=2ad40f2e15a4d6927833231b8dc6874b747fed2e;p=yazpp-moved-to-github.git
diff --git a/doc/proxy.xml b/doc/proxy.xml
index d0250cd..5562454 100644
--- a/doc/proxy.xml
+++ b/doc/proxy.xml
@@ -1,10 +1,671 @@
-
-
- YAZ Proxy
-
- About YAZ Proxy.
-
-
+
+ The YAZ Proxy
+
+ The YAZ proxy is a transparent Z39.50-to-Z39.50 gateway. That is,
+ it is a Z39.50 server which has as its back-end a Z39.50 client
+ that forwards requests on to another server (known as the
+ backend target).
+
+
+ The YAZ Proxy is useful for debugging Z39.50 software, logging
+ APDUs, redirecting Z39.50 packets through firewalls, etc.
+ Furthermore, it offers facilities that often
+ boost performance for connectionless Z39.50 clients such
+ as web gateways.
+
+
+ Unlike most other server software, the proxy runs as a single
+ process with a single thread. Every I/O operation
+ is non-blocking, so it is very lightweight and fast.
+ It does not store any state information on the hard drive,
+ except any log files you ask for.
+
+
+
+ Example: Using the Proxy to Log APDUs
+
+ Suppose you use a commercial Z39.50 client for which you do not
+ have source code, and it's not behaving how you think it should
+ when running against some specific server that you have no control
+ over. One way to diagnose the problem is to find out what packets
+ (APDUs) are being sent and received, but not all client
+ applications have facilities to do APDU logging.
+
+
+ No problem. Run the proxy on a friendly machine, get it to log
+ APDUs, and point the errant client at the proxy instead of
+ directly at the server that's causing it problems.
+
+
+ Suppose the server is running on foo.bar.com,
+ port 18398. Run the proxy on the machine of your choice, say
+ your.company.com, like this:
+
+
+ yaz-proxy -a - -t tcp:foo.bar.com:18398 tcp:@:9000
+
+
+ (The -a - option requests APDU logging on
+ standard output, -t tcp:foo.bar.com:18398
+ specifies where the backend target is, and
+ tcp:@:9000 tells the proxy to listen on port
+ 9000 and accept connections from any machine.)
+
+
+ Now change your client application's configuration so that instead
+ of connecting to foo.bar.com port 18398, it
+ connects to your.company.com port 9000, and
+ start it up. It will work exactly as usual, but all the packets
+ will be sent via the proxy, which will generate a log like this:
+
+
+
+
+
+
+ Specifying the Backend Target
+
+ When the proxy accepts a Z39.50 client session, it
+ determines the backend target by the following rules:
+
+
+ If the InitializeRequest PDU from the
+ client includes an
+ otherInfo
+ element with OID
+ 1.2.840.10003.10.1000.81.1, then the
+ contents of that element specify the target to be used, in the
+ usual YAZ address format (typically
+ tcp:hostname:port)
+ as described in
+ the Addresses section of the YAZ manual.
+
+
+
+ Otherwise, the Proxy uses the default target, if one was
+ specified on the command-line with the -t
+ option. A default target can also be specified in the
+ XML Config file.
+
+
+
+ Otherwise, the proxy closes the connection with
+ the client.
+
+
+
+
+
+
+ Keep-alive Facility
+
+ Keep-alive is a facility whereby the proxy keeps the connection to the
+ backend open, even if the client closes its connection to the proxy.
+
+
+ If the same or another client then connects to the proxy and requests the
+ same backend, it is assigned the kept-alive backend session. In this case, the
+ proxy sends an Initialize Response directly to the client and the
+ Initialize handshake with the backend is skipped.
+
+
+ Query and record caching work better if a reconnecting client is
+ assigned to the same backend session as before, because the result set
+ (if any) can then be reused. To achieve this, Index Data defined a session
+ cookie which identifies the backend session.
+
+
+ The cookie is defined by the client, sent as part of the
+ Initialize Request, and carried in an
+ otherInfo
+ element with OID 1.2.840.10003.10.1000.81.2.
+
+
+ Clients that do not send a cookie as part of the Initialize Request
+ may still get better performance, since the Init handshake is saved.
+
+
+
+
+ Query Caching
+
+ Simple stateless clients often send identical Z39.50 searches
+ in a relatively short period of time (e.g. in order to produce a
+ results-list page, the next page,
+ a single full record, etc.). For many targets, it is
+ much more expensive to produce a new result set than to
+ reuse an existing one.
+
+
+ The proxy tries to solve that by remembering the last query for each
+ backend target, so that if an identical query is received next, it
+ is turned into Present Requests rather than new Search Requests.
+
+
+
+ A future release will probably allow for
+ an arbitrarily sized cache for targets supporting named result sets.
+
+
+
+ You can enable/disable query caching using option -o.
+
+
+
+
+ Record Caching
+
+ Optionally, the proxy may also cache the result set records of the
+ last search.
+ The proxy takes the Record Syntax and CompSpec into account, so a
+ cached record is only reused for a request with the same Record Syntax
+ and CompSpec. Simple element set names are covered by the CompSpec as well.
+ By default the cache is 200000 bytes per session.
+
+
+
+
+ Query Validation
+
+ The Proxy may also be configured to trap particular attributes in
+ Type-1 queries and send Bib-1 diagnostics back to the client without
+ even consulting the backend target. This facility may be useful if
+ a target does not properly issue diagnostics when unsupported attributes
+ are sent to it.
+
+
+
+
+ Record Syntax Validation
+
+ The proxy may be configured to accept, reject or convert records.
+ When a record syntax is accepted, the proxy passes search/present requests
+ to the backend target under the assumption that the target can honor the
+ request (which may not actually be the case). When a request is rejected
+ because its record syntax is unsupported, the proxy returns a diagnostic
+ to the client. Finally, the proxy may convert records.
+
+
+ In the current version the only supported conversion is
+ MARC21/USMARC in the MARC-8 character set to MARCXML in UTF-8. Future versions of
+ the proxy may do other record/character set conversions.
+
+
+
+
+ Other Optimizations
+
+ We plan to support global caching of result set records,
+ but this has not yet been implemented.
+
+
+
+
+ Proxy Configuration File
+
+ The proxy can optionally read a configuration file, specified with option
+ -c followed by the filename of the config file.
+
+
+ The config file is in XML format. The YAZ proxy must be compiled
+ with libxml2 and
+ libXSLT support in
+ order for the config file facility to be enabled.
+
+
+ To check that a config file is well-formed, yaz-proxy may
+ be invoked without specifying a listening port, for example:
+
+ yaz-proxy -c myconfig.xml
+
+ If this does not produce errors, the file is well-formed.
+
+
+
+ Proxy Configuration Header
+
+ The proxy config file must have a root element called
+ proxy. All information except an optional XML
+ header must be stored within the proxy element.
+
+
+ <?xml version="1.0"?>
+ <proxy>
+ <!-- content here .. -->
+ </proxy>
+
+
+
+ Configuration: target
+
+ The element target, which may be repeated zero
+ or more times with parent element proxy, contains
+ information about a backend target.
+ The target element has two attributes:
+ name, which holds the logical name of the backend
+ target (required), and default (optional), which,
+ when given, specifies that the backend target is the default target,
+ equivalent to command-line option -t.
+
+
+
+ <?xml version="1.0"?>
+ <proxy>
+ <target name="server1" default="1">
+ <!-- description of server1 .. -->
+ </target>
+ <target name="server2">
+ <!-- description of server2 .. -->
+ </target>
+ </proxy>
+
+
+
+
+ Configuration: url
+
+ The url element, which may be repeated one or more times,
+ should be a child of the target element.
+ The CDATA of url is the Z-URL of the backend.
+
+
+ Multiple url elements may be used. In that case, when
+ a client initiates a session, the proxy chooses the URL with the lowest
+ number of active sessions, thereby distributing the load. It is
+ assumed that each URL represents the same database (data).
+
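+ As a sketch, the following configuration shows a single url and
+ multiple url elements. The target names foo and mirrored and the
+ hosts z1.example.com and z2.example.com are hypothetical. The first
+ target corresponds to command-line option -t tcp:foo.bar.com:18398 from
+ the APDU logging example; the second lists two URLs that are assumed to
+ serve the same database, so the proxy can spread client sessions across them.
+
+ ```xml
+ <?xml version="1.0"?>
+ <proxy>
+   <!-- Default target: corresponds to -t tcp:foo.bar.com:18398 -->
+   <target name="foo" default="1">
+     <url>tcp:foo.bar.com:18398</url>
+   </target>
+   <!-- Hypothetical mirrors of one database: each new client session
+        goes to the URL with the fewest active sessions -->
+   <target name="mirrored">
+     <url>tcp:z1.example.com:210</url>
+     <url>tcp:z2.example.com:210</url>
+   </target>
+ </proxy>
+ ```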
+
+
+ Configuration: keepalive
+ The keepalive element holds information about
+ the keepalive Z39.50 sessions. Keepalive sessions are proxy-to-backend
+ sessions that are no longer associated with a client session.
+
+ The keepalive element, which is a child of
+ the target element, holds two elements:
+ bandwidth and pdu.
+ The bandwidth is the maximum total number of bytes
+ transferred to/from the target. If a target session exceeds this
+ limit, it is shut down (and no longer kept alive).
+ The pdu is the maximum number of requests sent
+ to the target. If a target session exceeds this limit, it is
+ shut down. The purpose of these two limits is to avoid very long
+ sessions that accumulate resources in a backend that leaks.
+
+
+ The following sets the maximum number of bytes transferred in a
+ target session to 1 MB and the maximum number of requests to 400:
+
+ <keepalive>
+ <bandwidth>1048576</bandwidth>
+ <pdu>400</pdu>
+ </keepalive>
+
+
+
+
+ Configuration: limit
+
+ The limit section specifies bandwidth and PDU (request)
+ limits for an active session.
+ The proxy records bandwidth and requests during the last 60 seconds
+ (1 minute). The limit may include the
+ elements bandwidth, pdu,
+ and retrieve. The bandwidth
+ is the maximum number of bytes transferred within the last minute.
+ The pdu is the maximum number of requests in the last
+ minute. The retrieve holds the maximum number of records to
+ be retrieved in one Present Request.
+
+
+ If a bandwidth or pdu limit is reached, the proxy postpones
+ requests to the target, waiting one or more seconds. The purpose of the
+ limit is to ensure that clients that download hundreds or thousands of
+ records do not hurt other users.
+
+
+ The following sets the maximum number of bytes transferred per minute to
+ 512 KB and the maximum number of requests per minute to 40:
+
+ <limit>
+ <bandwidth>524288</bandwidth>
+ <pdu>40</pdu>
+ </limit>
+
+
+
+
+ Typically, the keepalive limits are much higher than
+ the per-minute session limits.
+
+
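+ As an illustration, the sketch below places a limit section next to a
+ keepalive section. It assumes that limit, like keepalive, is a
+ child of target (this section does not state its parent); the target
+ name server1 and the retrieve value of 100 are hypothetical.
+
+ ```xml
+ <target name="server1">
+   <!-- Per-session limits; bandwidth and pdu are measured
+        over the last 60 seconds -->
+   <limit>
+     <bandwidth>524288</bandwidth>
+     <pdu>40</pdu>
+     <!-- At most 100 records in one Present Request (hypothetical) -->
+     <retrieve>100</retrieve>
+   </limit>
+   <!-- Limits for the kept-alive backend session as a whole -->
+   <keepalive>
+     <bandwidth>1048576</bandwidth>
+     <pdu>400</pdu>
+   </keepalive>
+ </target>
+ ```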
+
+
+
+ Configuration: attribute
+
+ The attribute element specifies whether to accept or reject
+ a particular attribute type/value pair.
+ Well-behaving targets will reject unsupported attributes on their
+ own. This feature is useful for targets that do not gracefully
+ handle unsupported attributes.
+
+
+ Attribute elements may be repeated. The proxy inspects the attribute
+ specifications in the order in which they appear in the configuration file.
+ When a given attribute specification matches a given attribute list
+ in a query, the proxy takes the appropriate action (reject or accept).
+
+
+ If no attribute specification matches the attribute list in a query,
+ the query is accepted.
+
+
+ The attribute element has two required attributes:
+ type, which is the Type-1 attribute type, and
+ value, which is the Type-1 attribute value.
+ The special value * matches any attribute
+ type or value. A value may also be specified as a comma-separated
+ list of values, or as a range: low value, dash, high value
+ (for example 1000-1003).
+
+
+ If the attribute error is given, it holds a
+ Bib-1 diagnostic which is sent to the client if the particular
+ type/value pair is part of a query.
+
+
+ If the attribute error is not given, the attribute
+ type/value pair is accepted and passed to the backend target.
+
+
+ A target that supports use attributes 1, 4 and 1000 through 1003, and
+ no other use attributes, could use the following rules (Bib-1 diagnostic 114 is "Unsupported Use attribute"):
+
+ <attribute type="1" value="1,4,1000-1003"/>
+ <attribute type="1" value="*" error="114"/>
+
+
+
+
+
+ Configuration: syntax
+
+ The syntax element specifies whether to accept or reject
+ a particular record syntax requested by the client.
+
+
+ The syntax element has one required attribute,
+ type, which is the Preferred Record Syntax.
+
+
+ If the attribute error is given, it holds a
+ Bib-1 diagnostic which is sent to the client if the particular
+ record syntax is part of a Present or Search Request.
+
+
+ If attribute error is not given, the record syntax
+ is accepted and passed to the backend target.
+
+
+ If attribute marcxml is given, the proxy will
+ perform MARC21 to MARCXML conversion. In this case the
+ type should be XML. The proxy will use
+ preferred record syntax USMARC/MARC21 against the backend target.
+
+ To accept USMARC and offer MARCXML records, but reject
+ all other record syntaxes, the following configuration could be used:
+
+ <proxy>
+ <target name="mytarget">
+ <syntax type="usmarc"/>
+ <syntax type="xml" marcxml="1"/>
+ <syntax type="*" error="238"/>
+ </target>
+ </proxy>
+
+
+
+
+
+ Configuration: target-timeout
+
+ The element target-timeout is a child of element
+ target and specifies the timeout, in seconds, after which
+ a target session is shut down.
+
+
+ This can also be specified on the command line by using option
+ -T. Refer to the Proxy Usage section.
+
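+ For example, a minimal sketch that sets a 30-second target session
+ timeout for a target; the name server1 and the value 30 are hypothetical:
+
+ ```xml
+ <target name="server1">
+   <!-- Target session timeout: 30 seconds (hypothetical value) -->
+   <target-timeout>30</target-timeout>
+ </target>
+ ```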
+
+
+
+ Configuration: client-timeout
+
+ The element client-timeout is a child of element
+ target and specifies the timeout, in seconds, after which
+ a client session is shut down.
+
+
+ This can also be specified on the command line by using option
+ -i. Refer to the Proxy Usage section.
+
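+ For example, a minimal sketch that sets a 60-second client session
+ timeout; the target name server1 and the value 60 are hypothetical:
+
+ ```xml
+ <target name="server1">
+   <!-- Client session timeout: 60 seconds (hypothetical value) -->
+   <client-timeout>60</client-timeout>
+ </target>
+ ```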
+
+
+
+ Configuration: preinit
+
+ The element preinit is a child of element
+ target and specifies the number of spare
+ connections to a target. By default, the proxy creates no spare
+ connections. If the proxy uses a target exclusively or
+ heavily, preinit sessions ensure that target sessions
+ are already established before a client connects, which
+ dramatically reduces the connect/init handshake delay. Never set this to
+ more than 5.
+
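+ A sketch for a heavily used target; the name server1 is hypothetical,
+ and the value 2 is an arbitrary choice within the recommended maximum of 5:
+
+ ```xml
+ <target name="server1">
+   <!-- Keep 2 spare connections to this target (never more than 5) -->
+   <preinit>2</preinit>
+ </target>
+ ```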
+
+
+
+ Configuration: max-clients
+
+ The element max-clients is the child of element
+ proxy and specifies the total number of
+ allowed connections to targets (all targets). If this limit
+ is reached the proxy will close the least recently used connection.
+
+
+ Note that many Unix systems impose a limit on the number of
+ open files allowed in a single process, typically in the
+ range 256 (Solaris) to 1024 (Linux).
+ The proxy uses 2 sockets per session plus a few files
+ for logging. As a rule of thumb, ensure that the proxy process
+ can open 2*max-clients + 5 files.
+
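+ As a worked example of the rule of thumb, the hypothetical setting
+ below allows 50 target connections, so the proxy process should be
+ able to open at least 2*50 + 5 = 105 files:
+
+ ```xml
+ <proxy>
+   <!-- At most 50 connections to targets, across all targets -->
+   <max-clients>50</max-clients>
+   <!-- target elements ... -->
+ </proxy>
+ ```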
+
+
+ Using the
+ bash shell, you can set the limit with
+ ulimit -n no, where no is the new limit.
+ Use ulimit -a to display limits.
+
+
+
+
+
+ Configuration: log
+
+ The element log is a child of element
+ proxy and specifies what the proxy should
+ log.
+
+
+ Specify the log file with command-line option -l.
+
+
+ The text of the log element is a sequence of
+ options separated by white space. See the table below:
+
+ Logging options
+
+
+
+
+
+ Option
+ Description
+
+
+
+
+ client-apdu
+
+ Log APDUs as reported by YAZ for the
+ communication between the client and the proxy.
+ This facility is equivalent to the APDU logging that
+ happens when using option -a, except that
+ the proxy logs to the same file as given
+ by -l.
+
+
+
+ server-apdu
+
+ Log APDUs as reported by YAZ for the
+ communication between the proxy and the server (backend).
+
+
+
+ clients-requests
+
+ Log a brief description of requests transferred between
+ the client and the proxy. The name of the request and the size
+ of the APDU are logged.
+
+
+
+ server-requests
+
+ Log a brief description of requests transferred between
+ the proxy and the server (backend). The name of the request
+ and the size of the APDU are logged.
+
+
+
+
+
+
+
+ To log communication between the proxy and the backend in detail, the
+ following configuration could be used:
+
+ <log>server-apdu server-requests</log>
+
+
+
+
+
+
+
+ Proxy Usage
+
+
+
+ &yaz-proxy-ref;
+
+
+ OtherInformation Encoding
+
+ The proxy uses the OtherInformation definition to carry
+ information about the target address and cookie.
+
+
+ OtherInformation ::= [201] IMPLICIT SEQUENCE OF SEQUENCE{
+ category [1] IMPLICIT InfoCategory OPTIONAL,
+ information CHOICE{
+ characterInfo [2] IMPLICIT InternationalString,
+ binaryInfo [3] IMPLICIT OCTET STRING,
+ externallyDefinedInfo [4] IMPLICIT EXTERNAL,
+ oid [5] IMPLICIT OBJECT IDENTIFIER}}
+--
+ InfoCategory ::= SEQUENCE{
+ categoryTypeId [1] IMPLICIT OBJECT IDENTIFIER OPTIONAL,
+ categoryValue [2] IMPLICIT INTEGER}
+
+
+ The categoryTypeId is
+ OID 1.2.840.10003.10.1000.81.1 for the proxy target and
+ OID 1.2.840.10003.10.1000.81.2 for the proxy cookie. The
+ integer element categoryValue is set to 0.
+ The proxy target or cookie value is stored in the element
+ characterInfo of the information
+ choice.
+
+
+
+