X-Git-Url: http://sru.miketaylor.org.uk/?a=blobdiff_plain;f=doc%2Fbook.xml;h=5ca2e6260a761d6897c3c43df546acf35b62f1b3;hb=ea632b5255eff5fa23797126df76542ac8720d94;hp=4eeedf99b8e7dacc5670ac35f11cfc3fc4dc6d89;hpb=29efb3fee4b0659a8a50719a2699a1c1720f9b4b;p=metaproxy-moved-to-github.git
diff --git a/doc/book.xml b/doc/book.xml
index 4eeedf9..5ca2e62 100644
--- a/doc/book.xml
+++ b/doc/book.xml
@@ -1,4 +1,4 @@
-
+
Metaproxy - User's Guide and Reference
@@ -9,16 +9,20 @@
2006
- Index Data
+ Index Data ApS
Metaproxy is a universal router, proxy and encapsulated
metasearcher for information retrieval protocols. It accepts,
processes, interprets and redirects requests from IR clients using
- standard protocols such as ANSI/NISO Z39.50 (and in the future SRU
- and SRW), as well as functioning as a limited
- HTTP server. Metaproxy is configured by an XML file which
+ standard protocols such as
+ ANSI/NISO Z39.50
+ (and in the future SRU
+ and SRW), as
+ well as functioning as a limited
+ HTTP server.
+ Metaproxy is configured by an XML file which
specifies how the software should function in terms of routes that
the request packets can take through the proxy, each step on a
route being an instantiation of a filter. Filters come in many
@@ -33,6 +37,16 @@
should not at this stage redistribute the code without explicit
written permission from the copyright holders, Index Data ApS.
+
+
+
+
+
+
+
+
+
+
@@ -40,39 +54,55 @@
Introduction
-
- Metaproxy
- is a standalone program that acts as a universal router, proxy and
- encapsulated metasearcher for information retrieval protocols such
- as Z39.50, and in the future SRU and SRW. To clients, it acts as a
- server of these
- protocols: it can be searched, records can be retrieved from it,
- etc. To servers, it acts as a client: it searches in them,
- retrieves records from them, etc. it satisfies its clients'
- requests by transforming them, multiplexing them, forwarding them
- on to zero or more servers, merging the results, transforming
- them, and delivering them back to the client. In addition, it
- acts as a simple HTTP server; support for further protocols can be
- added in a module fashion, through the creation of new filters.
-
-
- Anything goes in!
- Anything goes out!
- Cold bananas, fish, pyjamas,
- Mutton, beef and trout!
+
+ Metaproxy
+ is a standalone program that acts as a universal router, proxy and
+ encapsulated metasearcher for information retrieval protocols such
+ as Z39.50, and in the future
+ SRU and SRW.
+ To clients, it acts as a server of these protocols: it can be searched,
+ records can be retrieved from it, etc.
+ To servers, it acts as a client: it searches in them,
+ retrieves records from them, etc. It satisfies its clients'
+ requests by transforming them, multiplexing them, forwarding them
+ on to zero or more servers, merging the results, transforming
+ them, and delivering them back to the client. In addition, it
+ acts as a simple HTTP server; support
+ for further protocols can be added in a modular fashion, through the
+ creation of new filters.
+
+
+ Anything goes in!
+ Anything goes out!
+ Cold bananas, fish, pyjamas,
+ Mutton, beef and trout!
- attributed to Cole Porter.
-
-
- Metaproxy is a more capable alternative to
- YAZ Proxy,
- being more powerful, flexible, configurable and extensible. Among
- its many advantages over the older, more pedestrian work are
- support for multiplexing (encapsulated metasearching), routing by
- database name, authentication and authorisation and serving local
- files via HTTP. Equally significant, its modular architecture
- facilitites the creation of pluggable modules implementing further
- functionality.
-
+
+
+ Metaproxy is a more capable alternative to
+ YAZ Proxy,
+ being more powerful, flexible, configurable and extensible. Among
+ its many advantages over the older, more pedestrian work are
+ support for multiplexing (encapsulated metasearching), routing by
+ database name, authentication and authorisation, and serving local
+ files via HTTP. Equally significant, its modular architecture
+ facilitates the creation of pluggable modules implementing further
+ functionality.
+
+
+ This manual will briefly describe Metaproxy's licensing situation
+ before giving an overview of its architecture, then discussing the
+ key concept of a filter in some depth and giving an overview of
+ the various filter types, then discussing the configuration file
+ format. After this come several optional chapters which may be
+ freely skipped: a detailed discussion of virtual databases and
+ multi-database searching, some notes on writing extensions
+ (additional filter types) and a high-level description of the
+ source code. Finally comes the reference guide, which contains
+ instructions for invoking the metaproxy
+ program, and detailed information on each type of filter,
+ including examples.
+
@@ -81,8 +111,8 @@
The Metaproxy Licence
- No decision has yet been made on the terms under which
- Metaproxy will be distributed.
+ No decision has yet been made on the terms under which
+ Metaproxy will be distributed.
It is possible that, unlike
other Index Data products, metaproxy may not be released under a
@@ -95,8 +125,134 @@
+
+ Installation
+
+ Metaproxy depends on the following tools/libraries:
+
+ YAZ++
+
+
+ This is a C++ library based on YAZ.
+
+
+
+ Libxslt
+
+ This is an XSLT processor - based on
+ Libxml2. Both Libxml2 and
+ Libxslt must be installed with the development components.
+
+
+
+ Boost
+
+
+ The popular C++ library.
+
+
+
+
+
+
+ Compiling Metaproxy requires a modern C++ compiler.
+ Boost, in particular, requires a C++ compiler that
+ supports recent language features. Refer to Boost
+ Compiler Status
+ for more information.
+
+
+ We have successfully used Metaproxy with Boost using the compilers
+ GCC version 4.0 and
+ Microsoft Visual Studio 2003/2005.
+
+
+
+ Installation on Unix (from Source)
+
+ Here is a quick step-by-step guide on how to compile all the
+ tools that Metaproxy uses. Most systems provide binary packages for at
+ least some of the required tools. If, for example, Libxml2/libxslt are already
+ installed as development packages, use those (and omit compilation).
+
+
+
+ Libxml2/libxslt:
+
+
+ gunzip -c libxml2-version.tar.gz|tar xf -
+ cd libxml2-version
+ ./configure
+ make
+ su
+ make install
+
+
+ gunzip -c libxslt-version.tar.gz|tar xf -
+ cd libxslt-version
+ ./configure
+ make
+ su
+ make install
+
+
+ YAZ/YAZ++:
+
+
+ gunzip -c yaz-version.tar.gz|tar xf -
+ cd yaz-version
+ ./configure
+ make
+ su
+ make install
+
+
+ gunzip -c yazpp-version.tar.gz|tar xf -
+ cd yazpp-version
+ ./configure
+ make
+ su
+ make install
+
+
+ Boost:
+
+
+ gunzip -c boost-version.tar.gz|tar xf -
+ cd boost-version
+ ./configure
+ make
+ su
+ make install
+
+
+ Metaproxy:
+
+
+ gunzip -c metaproxy-version.tar.gz|tar xf -
+ cd metaproxy-version
+ ./configure
+ make
+ su
+ make install
+
+
+
+ Installation on Debian
+
+ ### To be written
+
+
+
+ Installation on Windows
+
+ ### To be written
+
+
+
+
The Metaproxy Architecture
@@ -248,14 +404,15 @@
-
+ Overview of filter types
We now briefly consider each of the types of filter supported by
the core Metaproxy binary. This overview is intended to give a
flavour of the available functionality; more detailed information
- about each type of filter is included below in the Module
- Reference.
+ about each type of filter is included below in
+ the reference guide to Metaproxy filters.
The filters are here named by the string that is used as the
@@ -418,7 +575,7 @@
-
+ Future directions
Some other filters that do not yet exist, but which would be
@@ -482,32 +639,6 @@
-
- Virtual databases and multi-database searching
-
-
-
- Introductory notes
-
- Two of Metaproxy's filters are concerned with multiple-database
- operations. Of these, virt_db can work alone
- to control the routing of searches to one of a number of servers,
- while multi can work with the output of
- virt_db to perform multicast searching, merging
- the results into a unified result-set. The interaction between
- these two filters is necessarily complex, reflecting the real
- complexity of multicast searching in a protocol such as Z39.50
- that separates initialisation from searching, with the database to
- search known only during the latter operation.
-
-
- ### Much, much more to say!
-
-
-
-
-
-
Configuration: the Metaproxy configuration file format
@@ -519,7 +650,9 @@
its configuration file can be thought of as a program for that
interpreter. Configuration is by means of a single file, the name
of which is supplied as the sole command-line argument to the
- yp2 program.
+ metaproxy program. (See
+ the reference guide
+ below for more information on invoking Metaproxy.)
The configuration files are written in XML. (But that's just an
@@ -544,7 +677,7 @@
-
+ Overview of XML structure
All elements and attributes are in the namespace
@@ -565,15 +698,19 @@
The <start> element is empty, but carries a
route attribute, whose value is the name of
- route at which to start running - analogouse to the name of the
+ route at which to start running - analogous to the name of the
start production in a formal grammar.
If present, <filters> contains zero or more <filter>
- elements; filters carry a type attribute and
- contain various elements that provide suitable configuration for
- filters of that type. The filter-specific elements are described
- below. Filters defined in this part of the file must carry an
+ elements. Each filter carries a type attribute
+ which specifies what kind of filter is being defined
+ (frontend_net, log, etc.)
+ and contains various elements that provide suitable configuration
+ for a filter of its type. The filter-specific elements are
+ described in
+ the reference guide below.
+ Filters defined in this part of the file must carry an
id attribute so that they can be referenced
from elsewhere.
@@ -589,151 +726,183 @@
<filters> section. Alternatively, a route within a filter
may omit the refid attribute, but contain
configuration elements similar to those used for filters defined
- in the <filters> section.
+ in the <filters> section. (In other words, each filter in a
+ route may be included either by reference or by physical
+ inclusion.)
-
- Filter configuration
+
+ An example configuration
- All <filter> elements have in common that they must carry a
- type attribute whose value is one of the
- supported ones, listed in the schema file and discussed below. In
- additional, <filters>s occurring the <filters> section
- must have an id attribute, and those occurring
- within a route must have either a refid
- attribute referencing a previously defined filter or contain its
- own configuration information.
+ The following is a small, but complete, Metaproxy configuration
+ file (included in the distribution as
+ metaproxy/etc/config0.xml).
+ This file defines a minimal configuration that proxies
+ to whatever backend server the client requests, but logs each
+ request and response. This can be useful for debugging complex
+ client-server dialogues.
+
+
+
+
+
+ @:9000
+
+
+
+
+
+
+
+
+
+
+
+
+]]>
- In general, each filter recognises different configuration
- elements within its element, as each filter has different
- functionality. These are as follows:
+ It works by defining a single route, called
+ start, which consists of a sequence of three
+ filters. The first and last of these are included by reference:
+ their <filter> elements have
+ refid attributes that refer to filters defined
+ within the prior <filters> section. The
+ middle filter is included inline in the route.
+
+ The three filters in the route are as follows: first, a
+ frontend_net filter accepts Z39.50 requests
+ from any host on port 9000; then these requests are passed through
+ a log filter that emits a message for each
+ request; they are then fed into a z3950_client
+ filter, which forwards the requests to the client-specified
+ backend Z39.50 server. When the response arrives, it is handed
+ back to the log filter, which emits another
+ message; and then to the front-end filter, which returns the
+ response to the client.
+
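+ The structure described above can be sketched roughly as follows. This is a
+ reconstruction from the prose, not a verbatim copy of config0.xml: the root
+ element name, the routes wrapper element, the id attribute on the route and
+ the id values frontend and backend are assumptions made for illustration, and
+ the namespace declaration that the root element needs is left out.

```xml
<?xml version="1.0"?>
<!-- Sketch only. The root element name, the <routes> wrapper, the route's
     id attribute and the id values "frontend" and "backend" are assumptions;
     the namespace declaration required on the root element is omitted. -->
<metaproxy>
  <start route="start"/>
  <filters>
    <!-- Defined once here, then referenced from the route by refid -->
    <filter id="frontend" type="frontend_net">
      <port>@:9000</port>
    </filter>
    <filter id="backend" type="z3950_client"/>
  </filters>
  <routes>
    <route id="start">
      <filter refid="frontend"/>
      <!-- Included inline rather than by reference -->
      <filter type="log"/>
      <filter refid="backend"/>
    </route>
  </routes>
</metaproxy>
```

+ The order of the filter elements inside the route is the order in which a
+ request passes through them; the response travels back in the reverse order.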
+
+
-
- auth_simple
-
- <filter type="auth_simple">
- <userRegister>../etc/example.simple-auth</userRegister>
- </filter>
-
-
-
-
- backend_test
-
- <filter type="backend_test"/>
-
-
-
-
- frontend_net
-
- <filter type="frontend_net">
- <threads>10</threads>
- <port>@:9000</port>
- </filter>
-
-
-
-
- http_file
-
- <filter type="http_file">
- <mimetypes>/etc/mime.types</mimetypes>
- <area>
- <documentroot>.</documentroot>
- <prefix>/etc</prefix>
- </area>
- </filter>
-
-
-
-
- log
-
- <filter type="log">
- <message>B</message>
- </filter>
-
-
-
-
- multi
-
- <filter type="multi"/>
-
-
-
-
- query_rewrite
-
- <filter type="query_rewrite">
- <xslt>pqf2pqf.xsl</xslt>
- </filter>
-
-
-
- session_shared
-
- <filter type="session_shared">
- ### Not yet defined
- </filter>
-
-
-
- template
-
- <filter type="template"/>
-
-
+
+ Virtual databases and multi-database searching
-
- virt_db
-
- <filter type="virt_db">
- <virtual>
- <database>loc</database>
- <target>z3950.loc.gov:7090/voyager</target>
- </virtual>
- <virtual>
- <database>idgils</database>
- <target>indexdata.dk/gils</target>
- </virtual>
- </filter>
-
-
-
- z3950_client
-
- <filter type="z3950_client">
- <timeout>30</timeout>
- </filter>
-
-
+
+ Introductory notes
+
+ Lark's vomit
+
+ This chapter goes into a level of technical detail that is
+ probably not necessary in order to configure and use Metaproxy.
+ It is provided only for those who like to know how things work.
+ You should feel free to skip on to the next section if this one
+ doesn't seem like fun.
+
+
+
+ Two of Metaproxy's filters are concerned with multiple-database
+ operations. Of these, virt_db can work alone
+ to control the routing of searches to one of a number of servers,
+ while multi can work with the output of
+ virt_db to perform multicast searching, merging
+ the results into a unified result-set. The interaction between
+ these two filters is necessarily complex: it reflects the real,
+ irreducible complexity of multicast searching in a protocol such
+ as Z39.50 that separates initialisation from searching, and in
+ which the database to be searched is not known at initialisation
+ time.
+
+
+ Hold on tight - this may get a little hairy.
+
+
+ In the general course of things, a Z39.50 Init request may carry
+ with it an otherInfo packet of type VAL_PROXY,
+ whose value indicates the address of a Z39.50 server to which the
+ ultimate connection is to be made. (This otherInfo packet is
+ supported by YAZ-based Z39.50 clients and servers, but has not yet
+ been ratified by the Maintenance Agency and so is not widely used
+ in non-Index Data software. We're working on it.)
+ The VAL_PROXY packet functions
+ analogously to the absoluteURI-style Request-URI used with the GET
+ method when a web browser asks a proxy to forward its request: see
+ the
+ Request-URI
+ section of
+ the HTTP 1.1 specification.
+
+
+ The role of the virt_db filter is to rewrite
+ this otherInfo packet dependent on the virtual database that the
+ client wants to search. For example, a virt_db
+ filter could be set up so that searches in the virtual database
+ ``lc'' are forwarded to the Library of Congress server, and
+ searches in the virtual database ``id'' are forwarded to the toy
+ GILS database that Index Data hosts for testing purposes. A
+ virt_db configuration to make this switch would
+ look like this:
+
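+ <filter type="virt_db">
+   <virtual>
+     <database>lc</database>
+     <target>z3950.loc.gov:7090/Voyager</target>
+   </virtual>
+   <virtual>
+     <database>id</database>
+     <target>indexdata.dk/gils</target>
+   </virtual>
+ </filter>]]>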
+
+ When Metaproxy receives a Z39.50 Init request from a client, it
+ doesn't immediately forward that request to the back-end server.
+ Why not? Because it doesn't know which
+ back-end server to forward it to until the client sends a search
+ request that specifies the database that it wants to search in.
+ Instead, it just treasures the Init request up in its heart; and,
+ later, the first time the client does a search on one of the
+ specified virtual databases, a connection is forged to the
+ appropriate server and the Init request is forwarded to it. If,
+ later in the session, the same client searches in a different
+ virtual database, then a connection is forged to the server that
+ hosts it, and the same cached Init request is forwarded there,
+ too.
+
+
+ All of this clever Init-delaying is done by the
+ frontend_net filter. The
+ virt_db filter knows nothing about it; in
+ fact, because the Init request that is received from the client
+ doesn't get forwarded until a Search request is received, the
+ virt_db filter (and the
+ z3950_client filter behind it) doesn't even get
+ invoked at Init time. The only thing that a
+ virt_db filter ever does is rewrite the
+ VAL_PROXY otherInfo in the requests that pass
+ through it.
+
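+ To show where each filter sits, a route that uses virt_db might be laid out
+ as in the sketch below. The route element's id attribute, the port value and
+ the choice to include every filter inline are assumptions made for
+ illustration; the comments restate the division of labour described above.

```xml
<!-- Sketch only: the route id, the port and the inline placement of every
     filter are assumptions. -->
<route id="start">
  <!-- Accepts client connections and holds the Init request until the
       first Search arrives -->
  <filter type="frontend_net">
    <port>@:9000</port>
  </filter>
  <!-- Rewrites the VAL_PROXY otherInfo according to the virtual database
       being searched -->
  <filter type="virt_db">
    <virtual>
      <database>lc</database>
      <target>z3950.loc.gov:7090/Voyager</target>
    </virtual>
    <virtual>
      <database>id</database>
      <target>indexdata.dk/gils</target>
    </virtual>
  </filter>
  <!-- Connects to whichever back-end server the rewritten request names -->
  <filter type="z3950_client"/>
</route>
```

+ With a layout like this, a search in ``lc'' and a later search in ``id''
+ within the same session would each open a connection to a different
+ back-end server, and each would receive the same cached Init request.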
-
- Module Reference
-
- The material in this chapter includes the man pages material
-
- &manref;
-
-
Writing extensions for Metaproxy
- ###
+ ### To be written
+
+
+
Classes in the Metaproxy source code
@@ -742,7 +911,18 @@
Introductory notes
Stop! Do not read this!
- You won't enjoy it at all.
+ You won't enjoy it at all. You should just skip ahead to
+ the reference guide,
+ which tells
+
+ you things you really need to know, like the fact that the
+ fabulously beautiful planet Bethselamin is now so worried about
+ the cumulative erosion by ten billion visiting tourists a year
+ that any net imbalance between the amount you eat and the amount
+ you excrete whilst on the planet is surgically removed from your
+ bodyweight when you leave: so every time you go to the lavatory it
+ is vitally important to get a receipt.
This chapter contains documentation of the Metaproxy source code, and is
@@ -765,7 +945,7 @@
-
+ Individual classes
The classes making up the Metaproxy application are here listed by
@@ -798,7 +978,7 @@
structures, which are listed in its constructor. Merely
instantiating this class registers all the static classes. It is
for the benefit of this class that struct
- yp2_filter_struct exists, and that all the filter
+ metaproxy_1_filter_struct exists, and that all the filter
classes provide a static object of that type.
@@ -892,7 +1072,7 @@
mp::RouterChain
(router_chain.cpp)
- ###
+ ### to be written
@@ -900,7 +1080,7 @@
mp::RouterFleXML
(router_flexml.cpp)
- ###
+ ### to be written
@@ -908,7 +1088,7 @@
mp::Session
(session.cpp)
- ###
+ ### to be written
@@ -916,7 +1096,7 @@
mp::ThreadPoolSocketObserver
(thread_pool_observer.cpp)
- ###
+ ### to be written
@@ -942,7 +1122,7 @@
-
+ Other Source Files
In addition to the Metaproxy source files that define the classes
@@ -954,7 +1134,7 @@
metaproxy_prog.cpp
- The main function of the yp2 program.
+ The main function of the metaproxy program.
@@ -982,23 +1162,36 @@
plainfile.cpp,
tstdl.cpp.
-
-
-
-
- --
-
-
-
-
-
-
-
-
+
+
+
+ Reference guide
+
+ The material in this chapter is drawn directly from the individual
+ manual entries. In particular, the Metaproxy invocation section is
+ available using man metaproxy, and the section
+ on each individual filter is available using the name of the filter
+ as the argument to the man command.
+
+
+
+
+ Metaproxy invocation
+ &progref;
+
+
+
+
+ Reference guide to Metaproxy filters
+ &manref;
+
+
+
+
+