X-Git-Url: http://sru.miketaylor.org.uk/?a=blobdiff_plain;ds=sidebyside;f=doc%2Fbook.xml;h=4eeedf99b8e7dacc5670ac35f11cfc3fc4dc6d89;hb=29efb3fee4b0659a8a50719a2699a1c1720f9b4b;hp=d0e4cb21137104217d69fefbaffd09cf53329463;hpb=da405c16252ce5e47f69fe153e29e3864766da0a;p=metaproxy-moved-to-github.git
diff --git a/doc/book.xml b/doc/book.xml
index d0e4cb2..4eeedf9 100644
--- a/doc/book.xml
+++ b/doc/book.xml
@@ -1,40 +1,196 @@
-
+
 Metaproxy - User's Guide and Reference
 Mike Taylor
-
- Adam Dickmeiss
-
-
- 2006
- Index Data
-
+
+ Adam Dickmeiss
+
+
+ 2006
+ Index Data
+
- Metaproxy - mangler of Z39.50/SRU operations.
+ Metaproxy is a universal router, proxy and encapsulated
+ metasearcher for information retrieval protocols. It accepts,
+ processes, interprets and redirects requests from IR clients using
+ standard protocols such as ANSI/NISO Z39.50 (and in the future SRU
+ and SRW), as well as functioning as a limited
+ HTTP server. Metaproxy is configured by an XML file which
+ specifies how the software should function in terms of routes that
+ the request packets can take through the proxy, each step on a
+ route being an instantiation of a filter. Filters come in many
+ types, one for each operation: accepting Z39.50 packets, logging,
+ query transformation, multiplexing, etc. Further filter-types can
+ be added as loadable modules to extend Metaproxy functionality,
+ using the filter API.
+
+
+ The terms under which Metaproxy will be distributed have yet to be
+ established, but it will not necessarily be open source; so users
+ should not at this stage redistribute the code without explicit
+ written permission from the copyright holders, Index Data ApS.
-
+
Introduction
-
- OverviewMetaproxy
- is ..
+ Metaproxy is a standalone program that acts as a universal router, proxy and
+ encapsulated metasearcher for information retrieval protocols such
+ as Z39.50 and, in the future, SRU and SRW. To clients, it acts as a
+ server of these protocols: it can be searched, records can be retrieved from it,
+ etc. To servers, it acts as a client: it searches in them,
+ retrieves records from them, etc. It satisfies its clients'
+ requests by transforming them, multiplexing them, forwarding them
+ on to zero or more servers, merging the results, transforming
+ them, and delivering them back to the client. In addition, it
+ acts as a simple HTTP server; support for further protocols can be
+ added in a modular fashion, through the creation of new filters.
+
+ Anything goes in!
+ Anything goes out!
+ Cold bananas, fish, pyjamas,
+ Mutton, beef and trout!
+ - attributed to Cole Porter.
+
- ### We should probably consider saying a little more by way of
- introduction.
+ Metaproxy is a more capable alternative to
+ YAZ Proxy:
+ it is more powerful, flexible, configurable and extensible. Among
+ its many advantages over the older, more pedestrian work are
+ support for multiplexing (encapsulated metasearching), routing by
+ database name, authentication and authorisation, and serving local
+ files via HTTP. Equally significant, its modular architecture
+ facilitates the creation of pluggable modules implementing further
+ functionality.
-
-
+
+
+
+
+ The Metaproxy Licence
+
+
+ No decision has yet been made on the terms under which
+ Metaproxy will be distributed.
+
+ It is possible that, unlike
+ other Index Data products, Metaproxy may not be released under a
+ free-software licence such as the GNU GPL. Until a decision is
+ made and a public statement issued, then, and unless it has been
+ delivered to you under other specific terms, please treat Metaproxy as
+ though it were proprietary software.
+ The code should not be redistributed without explicit
+ written permission from the copyright holders, Index Data ApS.
+
+
+
+
+
+
+ The Metaproxy Architecture
+
+ The Metaproxy architecture is based on three concepts:
+ the package,
+ the route
+ and the filter.
+
+
+
+ Packages
+
+
+ A package is a request or response, encoded in some protocol,
+ that is issued by a client, makes its way through Metaproxy, is sent
+ to or received from a server, or is sent back to the client.
+
+
+ The core of a package is the protocol unit - for example, a
+ Z39.50 Init Request or Search Response, or an SRU searchRetrieve
+ URL or Explain Response. In addition to this core, a package
+ also carries some extra information added and used by Metaproxy
+ itself.
+
+
+ In general, packages are doctored as they pass through
+ Metaproxy. For example, when the proxy performs authentication
+ and authorisation on a Z39.50 Init request, it removes the
+ authentication credentials from the package so that they are not
+ passed on to the back-end server; and when search-response
+ packages are obtained from multiple servers, they are merged
+ into a single unified package that makes its way back to the
+ client.
+
+
+
+
+ Routes
+
+
+ Packages make their way through routes, which can be thought of
+ as programs that operate on the package data-type. Each
+ incoming package initially makes its way through a default
+ route, but may be switched to a different route based on various
+ considerations. Routes are made up of sequences of filters (see
+ below).
+
+
+
+
+ Filters
+
+
+ Filters provide the individual instructions within a route, and
+ effect the necessary transformations on packages. A particular
+ configuration of Metaproxy is essentially a set of filters,
+ described by configuration details and arranged in order in one
+ or more routes. There are many kinds of filter (about a dozen
+ at the time of writing, with more appearing all the time), each
+ performing a specific function and configured by different
+ information.
+
+
+ The word ``filter'' is sometimes used rather loosely, in two
+ different ways: it may be used to mean a particular
+ type of filter, as when we speak of ``the
+ auth_simple filter'' or ``the multi filter''; or it may be used
+ to mean a specific instance of a filter
+ within a Metaproxy configuration. For example, a single
+ configuration will often contain multiple instances of the
+ z3950_client filter. In
+ operational terms, each of these is a separate filter. In practice,
+ context always makes it clear which sense of the word ``filter''
+ is being used.
+
+
+ Extensibility of Metaproxy is primarily through the creation of
+ plugins that provide new filters. The filter API is small and
+ conceptually simple, but there are many details to master. See
+ the section below on
+ extensions.
+
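To make the package/route/filter model concrete, here is a minimal C++ sketch. It is not Metaproxy's filter API: the names Package, Filter, StripCredentials, Tag and Route are hypothetical, and in this simplification a route just calls each filter's process() in turn, whereas real filters may behave differently (for example, by deciding themselves whether and how to pass a package on). The toy StripCredentials filter mirrors the authentication example given under Packages.

```cpp
// Hypothetical sketch of the package/route/filter model; these are NOT
// Metaproxy's real classes (which live under the mp:: namespace).
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// A package: the protocol unit plus extra information used by the proxy.
struct Package {
    std::string protocol_unit;                 // e.g. "InitRequest"
    std::map<std::string, std::string> extra;  // proxy-internal metadata
};

// Every filter type implements the same small interface.
class Filter {
public:
    virtual ~Filter() = default;
    virtual void process(Package &p) = 0;
};

// Toy filter: removes credentials so they are not passed to the back end.
class StripCredentials : public Filter {
public:
    void process(Package &p) override { p.extra.erase("credentials"); }
};

// Toy filter: appends its name to a trace, recording that the package passed.
class Tag : public Filter {
public:
    explicit Tag(std::string name) : name_(std::move(name)) {}
    void process(Package &p) override { p.extra["trace"] += name_ + ";"; }
private:
    std::string name_;
};

// A route is an ordered sequence of filter instances.
class Route {
public:
    void append(std::unique_ptr<Filter> f) { filters_.push_back(std::move(f)); }
    // Runs the package through every filter, in order.
    void run(Package &p) const {
        for (const auto &f : filters_) f->process(p);
    }
private:
    std::vector<std::unique_ptr<Filter>> filters_;
};
```

Note that two Tag objects in one route are two separate filter instances of the same filter type, which is exactly the distinction drawn above between the two senses of ``filter''.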
+
+
+
+
+ Since packages are created and handled by the system itself, and
+ routes are conceptually simple, most of the remainder of this
+ document concentrates on filters. A brief overview of the
+ filter types follows, along with some thoughts on possible future
+ directions.
+
+
+
@@ -49,7 +205,7 @@
complex data type, namely the ``package''.
- A package represents a Z39.50 or SRW/U request (whether for Init,
+ A package represents a Z39.50 or SRU/W request (whether for Init,
Search, Scan, etc.) together with information about where it came
from. Packages are created by front-end filters such as
frontend_net (see below), which reads them from
@@ -61,7 +217,7 @@
There are many kinds of filter: some that are defined statically
- as part of Metaproxy, and other that may be provided by third parties
+ as part of Metaproxy, and others that may be provided by third parties
and dynamically loaded. They all conform to the same simple API
of essentially two methods: configure() is
called at startup time, and is passed a DOM tree representing that
@@ -84,6 +240,7 @@
(auth_simple,
log,
multi,
+ query_rewrite,
session_shared,
template,
virt_db).
@@ -92,13 +249,25 @@
- Individual filters
+ Overview of filter types
+
+ We now briefly consider each of the types of filter supported by
+ the core Metaproxy binary. This overview is intended to give a
+ flavour of the available functionality; more detailed information
+ about each type of filter is included below in the Module
+ Reference.
+
The filters are here named by the string that is used as the
type attribute of a
<filter> element in the configuration
file to request them, with the name of the class that implements
- them in parentheses.
+ them in parentheses. (The classname is not needed for normal
+ configuration and use of Metaproxy; it is useful only to
+ developers.)
+
+
+ The filters are here listed in alphabetical order:
@@ -110,10 +279,13 @@
lists username:password
pairs, one per line, colon separated. When a session begins, it
 is rejected unless username and password are supplied, and match
- a pair in the register.
-
-
- ### discuss authorisation phase
+ a pair in the register. The configuration file may also specify
+ the name of another file that is the target register: this lists
+ username:dbname,dbname...
+ sets, one per line, with multiple database names separated by
+ commas. When a search is processed, it is rejected unless the
+ database to be searched is one of those listed as available to
+ the user.
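Both registers described above are simple line-oriented formats. The following C++ sketch shows one way they might be read and checked. The function and type names are hypothetical, this is not the code of mp::filter::AuthSimple, and it assumes Unix line endings and no surrounding whitespace in entries.

```cpp
// Hypothetical helpers for the auth_simple register formats;
// not the real mp::filter::AuthSimple implementation.
#include <cassert>
#include <istream>
#include <map>
#include <set>
#include <sstream>
#include <string>

using UserRegister = std::map<std::string, std::string>;             // user -> password
using TargetRegister = std::map<std::string, std::set<std::string>>;  // user -> databases

// Reads "username:password" lines; lines without a colon are skipped.
UserRegister read_user_register(std::istream &in) {
    UserRegister reg;
    std::string line;
    while (std::getline(in, line)) {
        auto colon = line.find(':');
        if (colon == std::string::npos) continue;
        reg[line.substr(0, colon)] = line.substr(colon + 1);
    }
    return reg;
}

// Reads "username:dbname,dbname,..." lines into per-user database sets.
TargetRegister read_target_register(std::istream &in) {
    TargetRegister reg;
    std::string line;
    while (std::getline(in, line)) {
        auto colon = line.find(':');
        if (colon == std::string::npos) continue;
        std::istringstream dbs(line.substr(colon + 1));
        std::string db;
        while (std::getline(dbs, db, ','))
            if (!db.empty()) reg[line.substr(0, colon)].insert(db);
    }
    return reg;
}

// Session start: succeeds only if the pair matches a register entry.
bool authenticate(const UserRegister &reg, const std::string &user,
                  const std::string &password) {
    auto it = reg.find(user);
    return it != reg.end() && it->second == password;
}

// Search: succeeds only if the database is listed for the user.
bool authorise(const TargetRegister &reg, const std::string &user,
               const std::string &db) {
    auto it = reg.find(user);
    return it != reg.end() && it->second.count(db) > 0;
}
```

For example, with a target register line alice:Default,books, a search by alice in books is allowed, while a search by any other user in books is rejected unless that user's own line lists it.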
@@ -123,7 +295,8 @@
A sink that provides dummy responses in the manner of the
yaz-ztest Z39.50 server. This is useful only
- for testing.
+ for testing. Seriously, you don't need this. Pretend you didn't
+ even read this section.
@@ -131,10 +304,10 @@
frontend_net
(mp::filter::FrontendNet)
- A source that accepts Z39.50 and SRW connections from a port
+ A source that accepts Z39.50 connections from a port
specified in the configuration, reads protocol units, and
- feeds them into the next filter, eventually returning the
- result to the origin.
+ feeds them into the next filter in the route. When the result is
+ received, it is returned to the origin.
@@ -163,8 +336,23 @@
multi
(mp::filter::Multi)
- Performs multicast searching. See the extended discussion of
- multi-database searching below.
+ Performs multicast searching.
+ See
+ the extended discussion
+ of virtual databases and multi-database searching below.
+
+
+
+
+ query_rewrite
+ (mp::filter::QueryRewrite)
+
+ Rewrites Z39.50 Type-1 and Type-101 (``RPN'') queries by a
+ three-step process: the query is transliterated from Z39.50
+ packet structures into an XML representation; that XML
+ representation is transformed by an XSLT stylesheet; and the
+ resulting XML is transliterated back into the Z39.50 packet
+ structure.
@@ -174,8 +362,16 @@
When this is finished, it will implement global sharing of
result sets (i.e. between threads and therefore between
- clients), but it's not yet done.
+ clients), yielding performance improvements especially when
+ incoming requests are from a stateless environment such as a
+ web-server, in which the client process representing a session
+ might be any one of many. However:
+
+
+ This filter is not yet completed.
+
+
@@ -186,7 +382,8 @@
should be called nop or
passthrough?) This exists not to be used, but
to be copied - to become the skeleton of new filters as they are
- written.
+ written. As with backend_test, this is not
+ intended for civilians.
@@ -194,8 +391,14 @@
virt_db
(mp::filter::Virt_db)
- Performs virtual database selection. See the extended discussion
- of virtual databases below.
+ Performs virtual database selection: based on the name of the
+ database in the search request, a server is selected, and its
+ address is added to the request in a VAL_PROXY
+ otherInfo packet. That address is subsequently used by a
+ z3950_client filter.
+ See
+ the extended discussion
+ of virtual databases and multi-database searching below.
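As a rough illustration of the selection step, the following C++ sketch maps virtual database names to server addresses. It is a simplification under stated assumptions: SearchRequest and VirtualDatabases are hypothetical names, the address is stored in a plain field rather than in a VAL_PROXY otherInfo packet, and an unknown database name simply makes selection fail.

```cpp
// Hypothetical sketch of virt_db-style selection (requires C++17).
#include <cassert>
#include <map>
#include <optional>
#include <string>

// Stand-in for a search request; the real filter works on Z39.50 packets
// and attaches the address as a VAL_PROXY otherInfo.
struct SearchRequest {
    std::string database;
    std::optional<std::string> proxy_target;  // filled in by selection
};

// Maps virtual database names to back-end server addresses.
class VirtualDatabases {
public:
    void add(const std::string &vdb, const std::string &target) {
        map_[vdb] = target;
    }
    // Sets the target for a known name; returns false and leaves the
    // request untouched for an unknown name.
    bool select(SearchRequest &req) const {
        auto it = map_.find(req.database);
        if (it == map_.end()) return false;
        req.proxy_target = it->second;
        return true;
    }
private:
    std::map<std::string, std::string> map_;
};
```

For example, after add("books", "z3950.example.org:210/Default") (a made-up address), a request for books is tagged with that address, which a z3950_client filter would then use to reach the server.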
@@ -220,7 +423,8 @@
Some other filters that do not yet exist, but which would be
useful, are briefly described. These may be added in future
- releases.
+ releases (or may be created by third parties, as loadable
+ modules).
@@ -233,19 +437,19 @@
- srw2z3950 (filter)
+ frontend_sru (source)
- Translate SRW requests into Z39.50 requests.
+ Receive SRU (and perhaps SRW) requests.
- srw_client (sink)
+ sru2z3950 (filter)
- SRW searching and retrieval.
-
+ Translate SRU requests into Z39.50 requests.
+
@@ -257,6 +461,14 @@
+ srw_client (sink)
+
+
+ SRW searching and retrieval.
+
+
+
+ opensearch_client (sink)
@@ -270,6 +482,32 @@
+
+ Virtual databases and multi-database searching
+
+
+
+ Introductory notes
+
+ Two of Metaproxy's filters are concerned with multiple-database
+ operations. Of these, virt_db can work alone
+ to control the routing of searches to one of a number of servers,
+ while multi can work with the output of
+ virt_db to perform multicast searching, merging
+ the results into a unified result-set. The interaction between
+ these two filters is necessarily complex, reflecting the real
+ complexity of multicast searching in a protocol such as Z39.50
+ that separates initialisation from searching, with the database to
+ search known only during the latter operation.
+
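The following C++ sketch illustrates only the general shape of multicast searching: one virtual database stands for several targets, the search is sent to each, and the hits are merged into a single result set. Everything here is hypothetical, including the names and the stand-in search function. The merge policy (round-robin interleaving) is an assumption made for illustration and is not necessarily how the multi filter merges results. The sketch also ignores the Init/Search separation described above.

```cpp
// Hypothetical sketch of fan-out and merge; not the multi filter's code.
#include <cassert>
#include <cstddef>
#include <functional>
#include <map>
#include <string>
#include <vector>

// One virtual database name -> several target addresses.
using Targets = std::map<std::string, std::vector<std::string>>;
// Stand-in back-end search: target address -> list of record ids.
using SearchFn = std::function<std::vector<std::string>(const std::string &)>;

// Sends the search to every target behind the virtual database and merges
// the per-target hits by round-robin interleaving (an assumed policy).
std::vector<std::string> multicast_search(const Targets &targets,
                                          const std::string &vdb,
                                          const SearchFn &search) {
    auto it = targets.find(vdb);
    if (it == targets.end()) return {};

    std::vector<std::vector<std::string>> per_target;
    for (const auto &t : it->second) per_target.push_back(search(t));

    std::vector<std::string> merged;
    for (std::size_t i = 0;; ++i) {
        bool any = false;
        for (const auto &hits : per_target) {
            if (i < hits.size()) {
                merged.push_back(hits[i]);
                any = true;
            }
        }
        if (!any) break;  // every target's hit list is exhausted
    }
    return merged;
}
```

With two targets returning a1, a2, a3 and b1 respectively, this sketch yields the unified result set a1, b1, a2, a3.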
+
+ ### Much, much more to say!
+
+
+
+
+
+
Configuration: the Metaproxy configuration file format
@@ -430,6 +668,15 @@
+ query_rewrite
+
+ <filter type="query_rewrite">
+ <xslt>pqf2pqf.xsl</xslt>
+ </filter>
+
+
+
+ session_shared
<filter type="session_shared">
@@ -474,30 +721,6 @@
-
- Virtual database as multi-database searching
-
-
-
- Introductory notes
-
- Two of Metaproxy's filters are concerned with multiple-database
- operations. Of these, virt_db can work alone
- to control the routing of searches to one of a number of servers,
- while multi can work with the output of
- virt_db to perform multicast searching, merging
- the results into a unified result-set. The interaction between
- these two filters is necessarily complex, reflecting the real
- complexity of multicast searching in a protocol such as Z39.50
- that separates initialisation from searching, with the database to
- search known only during the latter operation.
-
-
- ### Much, much more to say!
-
-
-
-
Module Reference
@@ -506,6 +729,11 @@
&manref;
+
+ Writing extensions for Metaproxy
+ ###
+
+
Classes in the Metaproxy source code
@@ -783,5 +1011,6 @@
sgml-parent-document: "main.xml"
sgml-local-catalogs: nil
sgml-namecase-general:t
+ nxml-child-indent: 1
End:
-->