+
+ <section id="multidb.virt_db">
+ <title>Virtual databases with the <literal>virt_db</literal> filter</title>
+ <para>
+ Used on its own, the
+ <literal>virt_db</literal>
+ filter routes search requests to one of a selection of
+ back-end databases. In this way, a single Z39.50 endpoint
+ (running Metaproxy) can provide access to several different
+ underlying services, including those that would otherwise be
+ inaccessible due to firewalls. In many useful configurations, the
+ back-end databases are local to the Metaproxy installation, but
+ the software does not enforce this, and any valid Z39.50 server
+ may be used as a back-end.
+ </para>
+ <para>
+ For example, a <literal>virt_db</literal>
+ filter could be set up so that searches in the virtual database
+ <literal>lc</literal> are forwarded to the Library of Congress
+ bibliographic catalogue server, and searches in the virtual database
+ <literal>marc</literal> are forwarded to the toy database of MARC
+ records that Index Data hosts for testing purposes. A
+ <literal>virt_db</literal> configuration to make this switch would
+ look like this:
+ </para>
+ <screen><![CDATA[<filter type="virt_db">
+  <virtual>
+    <database>lc</database>
+    <target>z3950.loc.gov:7090/voyager</target>
+  </virtual>
+  <virtual>
+    <database>marc</database>
+    <target>indexdata.dk/marc</target>
+  </virtual>
+</filter>]]></screen>
+ <para>
+ As well as being useful in its own right, this filter also provides
+ the foundation for multi-database searching.
+ </para>
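+ <para>
+ As noted above, the back-end databases need not be remote. As a
+ sketch, assuming a hypothetical Z39.50 server that runs on the same
+ host as Metaproxy, listens on port 9999 and serves a database named
+ <literal>Default</literal>, a further virtual database (here given
+ the equally hypothetical name <literal>local</literal>) could be
+ added to the <literal>virt_db</literal> filter shown above. The
+ target uses the same <literal>host:port/database</literal> form as
+ the Library of Congress target:
+ </para>
+ <screen><![CDATA[<!-- hypothetical local back-end: host, port and database name are assumptions -->
+<virtual>
+  <database>local</database>
+  <target>localhost:9999/Default</target>
+</virtual>]]></screen>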
+ </section>
+
+
+ <section id="multidb.multi">
+ <title>Multi-database search with the <literal>multi</literal> filter</title>
+ <para>
+ To arrange for Metaproxy to broadcast searches to multiple back-end
+ servers, the configuration needs to include two components: a
+ <literal>virt_db</literal>
+ filter that specifies multiple
+ <literal>&lt;target&gt;</literal>
+ elements, and a subsequent
+ <literal>multi</literal>
+ filter. Here, for example, is a complete configuration that
+ broadcasts searches to both the Library of Congress catalogue and
+ Index Data's tiny testing database of MARC records:
+ </para>
+ <screen><![CDATA[<?xml version="1.0"?>
+<yp2 xmlns="http://indexdata.dk/yp2/config/1">
+  <start route="start"/>
+  <routes>
+    <route id="start">
+      <filter type="frontend_net">
+        <threads>10</threads>
+        <port>@:9000</port>
+      </filter>
+      <filter type="virt_db">
+        <virtual>
+          <database>lc</database>
+          <target>z3950.loc.gov:7090/voyager</target>
+        </virtual>
+        <virtual>
+          <database>marc</database>
+          <target>indexdata.dk/marc</target>
+        </virtual>
+        <virtual>
+          <database>all</database>
+          <target>z3950.loc.gov:7090/voyager</target>
+          <target>indexdata.dk/marc</target>
+        </virtual>
+      </filter>
+      <filter type="multi"/>
+      <filter type="z3950_client">
+        <timeout>30</timeout>
+      </filter>
+      <filter type="bounce"/>
+    </route>
+  </routes>
+</yp2>]]></screen>
+ <para>
+ (Using a
+ <literal>virt_db</literal>
+ filter that specifies multiple
+ <literal>&lt;target&gt;</literal>
+ elements but without a subsequent
+ <literal>multi</literal>
+ filter yields surprising and undesirable results, as will be
+ described below. Don't do that.)
+ </para>
+ <para>
+ Assuming this configuration is saved as
+ <literal>config-simple-multi.xml</literal>, Metaproxy can be
+ invoked with it as follows:
+ </para>
+ <screen>../src/metaproxy --config config-simple-multi.xml</screen>
+ <para>
+ Thereafter, Z39.50 clients can connect to the running server
+ (on port 9000, as specified in the configuration) and search in
+ any of the databases
+ <literal>lc</literal> (the Library of Congress catalogue),
+ <literal>marc</literal> (Index Data's test database of MARC records)
+ or
+ <literal>all</literal> (both of these). Here, as an example, is a
+ session using the YAZ command-line client
+ <literal>yaz-client</literal> (edited for brevity and clarity):
+ </para>
+ <screen><![CDATA[$ yaz-client @:9000
+Connecting...OK.
+Z> base lc
+Z> find computer
+Search was a success.
+Number of hits: 10000, setno 1
+Elapsed: 5.521070
+Z> base marc
+Z> find computer
+Search was a success.
+Number of hits: 10, setno 3
+Elapsed: 0.060187
+Z> base all
+Z> find computer
+Search was a success.
+Number of hits: 10010, setno 4
+Elapsed: 2.237648
+Z> show 1
+[marc]Record type: USmarc
+001 11224466
+003 DLC
+005 00000000000000.0
+008 910710c19910701nju 00010 eng
+010 $a 11224466
+040 $a DLC $c DLC
+050 00 $a 123-xyz
+100 10 $a Jack Collins
+245 10 $a How to program a computer
+260 1 $a Penguin
+263 $a 8710
+300 $a p. cm.
+Elapsed: 0.119612
+Z> show 2
+[VOYAGER]Record type: USmarc
+001 13339105
+005 20041229102447.0
+008 030910s2004 caua 000 0 eng
+035 $a (DLC) 2003112666
+906 $a 7 $b cbc $c orignew $d 4 $e epcn $f 20 $g y-gencatlg
+925 0 $a acquire $b 1 shelf copy $x policy default
+955 $a pc10 2003-09-10 $a pv12 2004-06-23 to SSCD; $h sj05 2004-11-30 $e sj05 2004-11-30 to Shelf.
+010 $a 2003112666
+020 $a 0761542892
+040 $a DLC $c DLC $d DLC
+050 00 $a MLCM 2004/03312 (G)
+245 10 $a 007, everything or nothing : $b Prima's official strategy guide / $c created by Kaizen Media Group.
+246 3 $a Double-O-seven, everything or nothing
+246 30 $a Prima's official strategy guide
+260 $a Roseville, CA : $b Prima Games, $c c2004.
+300 $a 161 p. : $b col. ill. ; $c 28 cm.
+500 $a "Platforms: Nintendo GameCube, Macintosh, PC, PlayStation 2 computer entertainment system, Xbox"--P. [4] of cover.
+650 0 $a Video games.
+710 2 $a Kaizen Media Group.
+856 42 $3 Publisher description $u http://www.loc.gov/catdir/description/random052/2003112666.html
+Elapsed: 0.150623
+Z>
+]]></screen>
+ <para>
+ As can be seen, the first record in the result set is from the
+ Index Data test database, and the second from the Library of
+ Congress database. The result set continues to alternate records
+ from the two databases, round-robin style, until the records from
+ one of the databases are exhausted.
+ </para>
+ <para>
+ This example uses only two back-end databases, but more may be
+ used. The software imposes no limit on the number of databases
+ that may be metasearched in this way; resource usage and
+ administrative complexity dictate the practical limits.
+ </para>
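+ <para>
+ Adding a further back-end to a multi-database search is only a
+ matter of listing one more <literal>&lt;target&gt;</literal> in the
+ virtual database. The following sketch extends the
+ <literal>all</literal> database from the configuration above with a
+ third target; <literal>localhost:9999/Default</literal> is a
+ hypothetical local Z39.50 server used only for illustration:
+ </para>
+ <screen><![CDATA[<virtual>
+  <database>all</database>
+  <target>z3950.loc.gov:7090/voyager</target>
+  <target>indexdata.dk/marc</target>
+  <!-- hypothetical third back-end; substitute a real host:port/database -->
+  <target>localhost:9999/Default</target>
+</virtual>]]></screen>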
+ <para>
+ What happens when one of the databases doesn't respond? By default,
+ the entire multi-database search fails, and the appropriate
+ diagnostic is returned to the client. This is usually appropriate
+ during development, when technicians need maximum information, but
+ can be inconvenient in deployment, when users typically don't want
+ to be bothered with problems of this kind and prefer just to get
+ the records from the databases that are available. To obtain this
+ latter behavior, add an empty
+ <literal>&lt;hideunavailable&gt;</literal>
+ element inside the
+ <literal>multi</literal> filter:
+ </para>
+ <screen><![CDATA[<filter type="multi">
+  <hideunavailable/>
+</filter>]]></screen>
+ <para>
+ Under this regime, an error is reported to the client only if
+ <emphasis>all</emphasis> the databases in a multi-database search
+ are unavailable.
+ </para>
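+ <para>
+ In the complete configuration shown earlier, the modified
+ <literal>multi</literal> filter simply replaces the empty
+ <literal>&lt;filter type="multi"/&gt;</literal> element, keeping its
+ position between the <literal>virt_db</literal> and
+ <literal>z3950_client</literal> filters. A sketch of the relevant
+ part of the route, repeating only the <literal>all</literal>
+ virtual database:
+ </para>
+ <screen><![CDATA[<filter type="virt_db">
+  <virtual>
+    <database>all</database>
+    <target>z3950.loc.gov:7090/voyager</target>
+    <target>indexdata.dk/marc</target>
+  </virtual>
+</filter>
+<!-- report an error only if every back-end is unavailable -->
+<filter type="multi">
+  <hideunavailable/>
+</filter>
+<filter type="z3950_client">
+  <timeout>30</timeout>
+</filter>]]></screen>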
+ </section>
+
+
+ <section id="multidb.what">
+ <title>What's going on?</title>
+ <warning>
+ <title>Lark's vomit</title>
+ <para>
+ This section goes into a level of technical detail that is
+ probably not necessary in order to configure and use Metaproxy.
+ It is provided only for those who like to know how things work.
+ You should feel free to skip on to the next section if this one
+ doesn't seem like fun.
+ </para>
+ </warning>
+ <para>
+ Hold on tight: this may get a little hairy.
+ </para>
+ <para>
+ In the general course of things, a Z39.50 Init request may carry
+ with it an otherInfo packet of type <literal>VAL_PROXY</literal>,
+ whose value indicates the address of a Z39.50 server to which the
+ ultimate connection is to be made. (This otherInfo packet is
+ supported by YAZ-based Z39.50 clients and servers, but has not yet
+ been ratified by the Maintenance Agency and so is not widely used
+ in non-Index Data software. We're working on it.)
+ The <literal>VAL_PROXY</literal> packet functions
+ analogously to the absoluteURI-style Request-URI used with the GET
+ method when a web browser asks a proxy to forward its request: see
+ the
+ <ulink url="http://www.w3.org/Protocols/rfc2616/rfc2616-sec5.html#sec5.1.2"
+ >Request-URI</ulink>
+ section of
+ <ulink url="http://www.w3.org/Protocols/rfc2616/rfc2616.html"
+ >the HTTP 1.1 specification</ulink>.
+ </para>
+ <para>
+ Within Metaproxy, Search requests that are part of the same
+ session as an Init request that carries a
+ <literal>VAL_PROXY</literal> otherInfo are also annotated with the
+ same information. The role of the <literal>virt_db</literal>
+ filter is to rewrite this otherInfo packet depending on the
+ virtual database that the client wants to search.
+ </para>
+ <para>
+ When Metaproxy receives a Z39.50 Init request from a client, it
+ doesn't immediately forward that request to the back-end server.
+ Why not? Because it doesn't know <emphasis>which</emphasis>
+ back-end server to forward it to until the client sends a Search
+ request that specifies the database that it wants to search in.
+ Instead, it just treasures the Init request up in its heart; and,
+ later, the first time the client does a search on one of the
+ specified virtual databases, a connection is forged to the
+ appropriate server and the Init request is forwarded to it. If,
+ later in the session, the same client searches in a different
+ virtual database, then a connection is forged to the server that
+ hosts it, and the same cached Init request is forwarded there,
+ too.
+ </para>
+ <para>
+ All of this clever Init-delaying is done by the
+ <literal>frontend_net</literal> filter. The
+ <literal>virt_db</literal> filter knows nothing about it; in
+ fact, because the Init request that is received from the client
+ doesn't get forwarded until a Search request is received, the
+ <literal>virt_db</literal> filter (and the
+ <literal>z3950_client</literal> filter behind it) doesn't even get
+ invoked at Init time. The <emphasis>only</emphasis> thing that a
+ <literal>virt_db</literal> filter ever does is rewrite the
+ <literal>VAL_PROXY</literal> otherInfo in the requests that pass
+ through it.
+ </para>
+ <para>
+ It is possible for a <literal>virt_db</literal> filter to contain
+ multiple
+ <literal>&lt;target&gt;</literal>
+ elements. What does this mean? Only that the filter will add
+ multiple <literal>VAL_PROXY</literal> otherInfo packets to the
+ Search requests that pass through it. That's because the virtual
+ DB filter is dumb, and does exactly what it's told: no more, no
+ less.
+ If a Search request with multiple <literal>VAL_PROXY</literal>
+ otherInfo packets reaches a <literal>z3950_client</literal>
+ filter, this is an error. That filter doesn't know how to deal
+ with multiple targets, so it will either just pick one and search
+ in it, or (better) fail with an error message.
+ </para>
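+ <para>
+ For concreteness, this is the kind of route that causes the
+ problem: a <literal>virt_db</literal> filter whose
+ <literal>all</literal> database lists two targets, followed
+ directly by a <literal>z3950_client</literal> filter with no
+ <literal>multi</literal> filter in between. It is shown only as an
+ example of what not to do:
+ </para>
+ <screen><![CDATA[<!-- misconfiguration: do NOT use -->
+<filter type="virt_db">
+  <virtual>
+    <database>all</database>
+    <target>z3950.loc.gov:7090/voyager</target>
+    <target>indexdata.dk/marc</target>
+  </virtual>
+</filter>
+<!-- missing here: <filter type="multi"/> -->
+<filter type="z3950_client">
+  <timeout>30</timeout>
+</filter>]]></screen>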
+ <para>
+ The <literal>multi</literal> filter comes to the rescue! This is
+ the only filter that knows how to deal with multiple
+ <literal>VAL_PROXY</literal> otherInfo packets, and it does so by
+ making multiple copies of the entire Search request: one for each
+ <literal>VAL_PROXY</literal>. Each of these new copies is then
+ passed down through the remaining filters in the route. (The
+ copies are handled in parallel through the
+ spawning of new threads.) Since the copies each have only one
+ <literal>VAL_PROXY</literal> otherInfo, they can be handled by the
+ <literal>z3950_client</literal> filter, which happily deals with
+ each one individually. When the results of the individual
+ searches come back up to the <literal>multi</literal> filter, it
+ merges them into a single Search response, which is what
+ eventually makes it back to the client.
+ </para>
+ </section>
+
+
+ <section id="multidb.picture">
+ <title>A picture is worth a thousand words (but only five hundred on 64-bit architectures)</title>
+ <simpara>
+ <inlinemediaobject>
+ <imageobject>
+ <imagedata fileref="multi.pdf" format="PDF" scale="50"/>
+ </imageobject>
+ <imageobject>
+ <imagedata fileref="multi.png" format="PNG"/>
+ </imageobject>
+ <textobject>
+ <!-- Fall back if none of the images can be used -->
+ <phrase>
+ [Here there should be a diagram showing the progress of
+ packages through the filters during a simple virtual-database
+ search and a multi-database search, but it seems that your
+ tool chain has not been able to include the diagram in this
+ document.]
+ </phrase>
+ </textobject>
+<!-- ### This used to work with an older version of DocBook
+ <caption>
+ <para>Caption: progress of packages through filters.</para>
+ </caption>
+-->
+ </inlinemediaobject>
+ </simpara>
+ </section>