doc/recordmodel-domxml.xml

   1 <chapter id="record-model-domxml">
   2   <!-- $Id: recordmodel-domxml.xml,v 1.9 2007-02-22 15:44:19 marc Exp $ -->
   3   <title>&dom; &xml; Record Model and Filter Module</title>
   4
   5   <para>
   6    The record model described in this chapter applies to the fundamental,
   7    structured &xml;
   8    record type <literal>&dom;</literal>, introduced in
   9    <xref linkend="componentmodulesdom"/>. The &dom; &xml; record model
  10    is experimental, and its inner workings might change in future
  11    releases of the &zebra; Information Server.
  12   </para>
  13
  14
  15
  16   <section id="record-model-domxml-filter">
  17    <title>&dom; Record Filter Architecture</title>
  18
  19      <para>
  20       The &dom; &xml; filter uses a standard &dom; &xml; structure as
  21       internal data model, and can therefore parse, index, and display
  22       any &xml; document type. It is well suited to work on
  23       standardized &xml;-based formats such as Dublin Core, MODS, METS,
  24       MARCXML, OAI-PMH, RSS, and performs equally  well on any other
  25       non-standard &xml; format.
  26     </para>
  27     <para>
  28       A parser for binary &marc; records based on the ISO2709 library
  29       standard is provided; it transforms these to the internal
  30       &marcxml; &dom; representation. Other binary document parsers
  31       are planned to follow.
  32     </para>
  33
  34     <para>
  35       The &dom; filter architecture consists of four
  36       different pipelines, each being a chain of arbitrarily many successive
  37       &xslt; transformations of the internal &dom; &xml;
  38       representations of documents.
  39     </para>
  40
  41     <figure id="record-model-domxml-architecture-fig">
  42       <title>&dom; &xml; filter architecture</title>
  43       <mediaobject>
  44        <imageobject>
  45          <imagedata fileref="domfilter.pdf" format="PDF" scale="50"/>
  46         </imageobject>
  47         <imageobject>
  48           <imagedata fileref="domfilter.png" format="PNG"/>
  49         </imageobject>
  50         <textobject>
  51         <!-- Fall back if none of the images can be used -->
  52         <phrase>
  53           [Here there should be a diagram showing the &dom; &xml;
  54            filter architecture, but it seems that your
  55            tool chain has not been able to include the diagram in this
  56            document.]
  57          </phrase>
  58         </textobject>
  59       </mediaobject>
  60      </figure>
  61
  62
  63     <table id="record-model-domxml-architecture-table" frame="top">
  64       <title>&dom; &xml; filter pipelines overview</title>
  65       <tgroup cols="5">
  66        <thead>
  67         <row>
  68          <entry>Name</entry>
  69          <entry>When</entry>
  70          <entry>Description</entry>
  71          <entry>Input</entry>
  72          <entry>Output</entry>
  73         </row>
  74        </thead>
  75
  76        <tbody>
  77         <row>
  78          <entry><literal>input</literal></entry>
  79          <entry>first</entry>
  80          <entry>input parsing and initial
  81           transformations to common &xml; format</entry>
  82          <entry>Input raw &xml; record buffers, &xml;  streams and
  83                 binary &marc; buffers</entry>
  84          <entry>Common &xml; &dom;</entry>
  85         </row>
  86         <row>
  87          <entry><literal>extract</literal></entry>
  88          <entry>second</entry>
  89          <entry>indexing term extraction
  90           transformations</entry>
  91          <entry>Common &xml; &dom;</entry>
  92          <entry>Indexing &xml; &dom;</entry>
  93         </row>
  94         <row>
  95          <entry><literal>store</literal></entry>
  96          <entry>second</entry>
  97          <entry> transformations before internal document
  98           storage</entry>
  99          <entry>Common &xml; &dom;</entry>
 100          <entry>Storage &xml; &dom;</entry>
 101         </row>
 102         <row>
 103          <entry><literal>retrieve</literal></entry>
 104          <entry>third</entry>
 105          <entry>multiple document retrieve transformations from
 106           storage to different output
 107           formats are possible</entry>
 108          <entry>Storage &xml; &dom;</entry>
 109          <entry>Output &xml; syntax in requested formats</entry>
 110         </row>
 111        </tbody>
 112       </tgroup>
 113      </table>
 114
 115     <para>
 116       The &dom; &xml; filter pipelines use &xslt; (and, if supported on
 117       your platform, even &exslt;). This brings full &xpath;
 118       support to the indexing, storage and display rules of not only
 119       &xml; documents, but also binary &marc; records.
 120     </para>
 121    </section>
 122
 123
 124    <section id="record-model-domxml-pipeline">
 125     <title>&dom; &xml; filter pipeline configuration</title>
 126
 127    <para>
 128     The experimental, loadable  &dom; &xml;/&xslt; filter module
 129    <literal>mod-dom.so</literal>
 130     is invoked by the <filename>zebra.cfg</filename> configuration statement
 131     <screen>
 132      recordtype.xml: dom.db/filter_dom_conf.xml
 133     </screen>
 134     In this example the &dom; &xml; filter is configured to work
 135     on all data files with suffix
 136     <filename>*.xml</filename>, where the configuration file is found in the
 137     path <filename>db/filter_dom_conf.xml</filename>.
 138    </para>
 139
 140    <para>The &dom; &xslt; filter configuration file must be
 141     valid &xml;. It might look like this:
 142     <screen>
 143     <![CDATA[
 144     <?xml version="1.0" encoding="UTF-8"?>
 145     <dom xmlns="http://indexdata.com/zebra-2.0">
 146       <input>
 147         <xmlreader level="1"/>
 148         <!-- <marc inputcharset="marc-8"/> -->
 149       </input>
 150       <extract>
 151          <xslt stylesheet="common2index.xsl"/>
 152       </extract>
 153       <store>
 154          <xslt stylesheet="common2store.xsl"/>
 155       </store>
 156       <retrieve name="dc">
 157         <xslt stylesheet="store2dc.xsl"/>
 158       </retrieve>
 159       <retrieve name="mods">
 160         <xslt stylesheet="store2mods.xsl"/>
 161       </retrieve>
 162     </dom>
 163     ]]>
 164     </screen>
 165    </para>
 166    <para>
 167      The root &xml; element <literal>&lt;dom&gt;</literal> and all other &dom;
 168      &xml; filter elements reside in the namespace
 169      <literal>xmlns="http://indexdata.com/zebra-2.0"</literal>.
 170    </para>
 171    <para>
 172     All pipeline definition elements - i.e. the
 173      <literal>&lt;input&gt;</literal>,
 174      <literal>&lt;extract&gt;</literal>,
 175      <literal>&lt;store&gt;</literal>, and
 176      <literal>&lt;retrieve&gt;</literal> elements - are optional.
 177      Missing pipeline definitions are interpreted as
 178      do-nothing identity pipelines.
 179    </para>
 180    <para>
 181     All pipeline definition elements may contain zero or more
 182     <literal><![CDATA[<xslt stylesheet="path/file.xsl"/>]]></literal>
 183     &xslt; transformation instructions, which are performed
 184     sequentially from top to bottom.
 185     The paths in the <literal>stylesheet</literal> attributes
 186     are either relative to &zebra;'s working directory or absolute
 187     paths from the file system root.
 188    </para>
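   <para>
    As a minimal sketch of such a chain, the following
    <literal>&lt;extract&gt;</literal> pipeline would apply two
    stylesheets one after the other, the second one working on the
    output of the first. Both file names are hypothetical and only
    serve as an illustration:
    <screen>
    <![CDATA[
    <extract>
      <!-- step 1: clean up the common format (hypothetical stylesheet) -->
      <xslt stylesheet="db/normalize.xsl"/>
      <!-- step 2: emit the indexing instructions (hypothetical stylesheet) -->
      <xslt stylesheet="db/common2index.xsl"/>
    </extract>
    ]]>
    </screen>
    Splitting the work this way keeps each stylesheet small, at the
    price of one additional transformation per record.
   </para>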
 189
 190
 191    <section id="record-model-domxml-pipeline-input">
 192     <title>Input pipeline</title>
 193    <para>
 194     The <literal>&lt;input&gt;</literal> pipeline definition element
 195     may contain either one &xml; Reader definition
 196     <literal><![CDATA[<xmlreader level="1"/>]]></literal>, used to split
 197     an &xml; collection input stream into individual &xml; &dom;
 198     documents at the prescribed element level,
 199     or one &marc; binary
 200     parsing instruction
 201     <literal><![CDATA[<marc inputcharset="marc-8"/>]]></literal>, which defines
 202     a conversion to &marcxml; format &dom; trees. The allowed values
 203     of the <literal>inputcharset</literal> attribute depend on your
 204     local <productname>iconv</productname> set-up.
 205    </para>
 206    <para>
 207     Both input parsers deliver individual &dom; &xml; documents to the
 208     following chain of zero or more
 209     <literal><![CDATA[<xslt stylesheet="path/file.xsl"/>]]></literal>
 210     &xslt; transformations. At the end of this pipeline, the documents
 211     are in the common format, used to feed both the
 212      <literal>&lt;extract&gt;</literal> and
 213      <literal>&lt;store&gt;</literal> pipelines.
 214    </para>
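   <para>
    For binary &marc; input, an <literal>&lt;input&gt;</literal>
    pipeline along the following lines might be used. This is a sketch
    only: the stylesheet name <filename>marcxml2common.xsl</filename>
    is an assumption made for illustration, and the
    <literal>inputcharset</literal> value has to match what your local
    <productname>iconv</productname> supports.
    <screen>
    <![CDATA[
    <dom xmlns="http://indexdata.com/zebra-2.0">
      <input>
        <!-- parse binary MARC records into MARCXML DOM trees -->
        <marc inputcharset="marc-8"/>
        <!-- hypothetical stylesheet: MARCXML to the common format -->
        <xslt stylesheet="marcxml2common.xsl"/>
      </input>
      <!-- extract, store and retrieve pipelines follow as needed -->
    </dom>
    ]]>
    </screen>
   </para>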
 215    </section>
 216
 217    <section id="record-model-domxml-pipeline-extract">
 218     <title>Extract pipeline</title>
 219      <para>
 220        The <literal>&lt;extract&gt;</literal> pipeline takes documents
 221        from any common &dom; &xml; format to the &zebra; specific
 222         indexing &dom; &xml; format.
 223        It may consist of zero or more
 224        <literal><![CDATA[<xslt stylesheet="path/file.xsl"/>]]></literal>
 225        &xslt; transformations, and the outcome is handed to the
 226        &zebra; core to drive the process of building the inverted
 227        indexes. See
 228        <xref linkend="record-model-domxml-canonical-index"/> for
 229        details.
 230      </para>
 231    </section>
 232
 233    <section id="record-model-domxml-pipeline-store">
 234     <title>Store pipeline</title>
 235      <para>The <literal>&lt;store&gt;</literal> pipeline takes documents
 236        from any common &dom; &xml; format to the &zebra; specific
 237         storage &dom; &xml; format.
 238        It may consist of zero or more
 239        <literal><![CDATA[<xslt stylesheet="path/file.xsl"/>]]></literal>
 240        &xslt; transformations, and the outcome is handed to the
 241        &zebra; core for deposition into the internal storage system.</para>
 242     </section>
 243
 244    <section id="record-model-domxml-pipeline-retrieve">
 245     <title>Retrieve pipeline</title>
 246     <para>
 247       Finally, there may be one or more
 248       <literal>&lt;retrieve&gt;</literal> pipeline definitions, each
 249       of them again consisting of zero or more
 250       <literal><![CDATA[<xslt stylesheet="path/file.xsl"/>]]></literal>
 251        &xslt; transformations. These are used for document
 252       presentation after search, and take the internal storage &dom;
 253       &xml; to the requested output formats during record present
 254       requests.
 255     </para>
 256     <para>
 257      Multiple
 258      <literal>&lt;retrieve&gt;</literal> pipeline definitions
 259      are distinguished by their unique <literal>name</literal>
 260      attributes. These are the literal <literal>schema</literal> or
 261      <literal>element set</literal> names used in
 262       <ulink url="http://www.loc.gov/standards/sru/srw/">&srw;</ulink>,
 263       <ulink url="&url.sru;">&sru;</ulink> and
 264       &z3950; protocol queries.
 265    </para>
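   <para>
    As an illustration, and assuming the two
    <literal>&lt;retrieve&gt;</literal> pipelines named
    <literal>dc</literal> and <literal>mods</literal> from the example
    configuration file above, a yaz-client session along the following
    lines would be expected to present the same record through each
    pipeline in turn. Host, port and the <literal>title</literal>
    index are placeholders for your own set-up:
    <screen>
     <![CDATA[
     Z> open localhost:9999
     Z> form xml
     Z> find @attr 1=title program
     Z> elem dc
     Z> show 1
     Z> elem mods
     Z> show 1
     ]]>
    </screen>
   </para>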
 266    </section>
 267
 268
 269    <section id="record-model-domxml-canonical-index">
 270     <title>Canonical Indexing Format</title>
 271
 272     <para>
 273      &dom; &xml; indexing comes in two flavors: pure
 274      processing-instruction governed plain &xml; documents, and - very
 275      similar to the Alvis filter indexing format - &xml; documents
 276      containing &xml; <literal>&lt;record&gt;</literal> and
 277      <literal>&lt;index&gt;</literal> instructions from the magic
 278      namespace <literal>xmlns:z="http://indexdata.com/zebra-2.0"</literal>.
 279     </para>
 280
 281    <section id="record-model-domxml-canonical-index-pi">
 282     <title>Processing-instruction governed indexing format</title>
 283
 284       <para>The output of the processing instruction driven
 285       indexing &xslt; stylesheets must contain
 286       processing instructions named
 287        <literal>zebra-2.0</literal>.
 288       The output of the &xslt; indexing transformation is then
 289       parsed using &dom; methods, and the contained instructions are
 290       performed on the <emphasis>elements and their
 291       subtrees directly following the processing instructions</emphasis>.
 292       </para>
 293       <para>
 294      For example, the output of the command
 295      <screen>
 296        xsltproc dom-index-pi.xsl marc-one.xml
 297      </screen>
 298      might look like this:
 299      <screen>
 300       <![CDATA[
 301       <?xml version="1.0" encoding="UTF-8"?>
 302       <?zebra-2.0 record id=11224466 rank=42?>
 303       <record>
 304         <?zebra-2.0 index control:0?>
 305         <control>11224466</control>
 306         <?zebra-2.0 index any:w title:w title:p title:s?>
 307         <title>How to program a computer</title>
 308       </record>
 309       ]]>
 310      </screen>
 311     </para>
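    <para>
     A stylesheet producing output of this shape could be sketched as
     follows. It is not the actual <filename>dom-index-pi.xsl</filename>:
     the input structure, a <literal>&lt;record&gt;</literal> root with
     <literal>&lt;control&gt;</literal> and
     <literal>&lt;title&gt;</literal> children, is assumed purely for
     illustration, and the optional <literal>rank</literal> is left out.
     <screen>
      <![CDATA[
      <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
       version="1.0">

       <xsl:output indent="yes" method="xml" version="1.0" encoding="UTF-8"/>

       <!-- assumed input: <record><control/><title/></record> -->
       <xsl:template match="/record">
        <!-- record instruction, placed before the record element -->
        <xsl:processing-instruction name="zebra-2.0">record id=<xsl:value-of
          select="normalize-space(control)"/></xsl:processing-instruction>
        <record>
         <!-- each index instruction applies to the element following it -->
         <xsl:processing-instruction
           name="zebra-2.0">index control:0</xsl:processing-instruction>
         <control><xsl:value-of select="control"/></control>
         <xsl:processing-instruction
           name="zebra-2.0">index any:w title:w title:p title:s</xsl:processing-instruction>
         <title><xsl:value-of select="title"/></title>
        </record>
       </xsl:template>

      </xsl:stylesheet>
      ]]>
     </screen>
    </para>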
 312    </section>
 313
 314    <section id="record-model-domxml-canonical-index-element">
 315     <title>Magic element governed indexing format</title>
 316
 317     <para>The output of the indexing &xslt; stylesheets must contain
 318     certain elements in the magic
 319      <literal>xmlns:z="http://indexdata.com/zebra-2.0"</literal>
 320     namespace. The output of the &xslt; indexing transformation is then
 321     parsed using &dom; methods, and the contained instructions are
 322     performed on the <emphasis>magic elements and their
 323     subtrees</emphasis>.
 324     </para>
 325     <para>
 326     For example, the output of the command
 327      <screen>
 328       xsltproc dom-index-element.xsl marc-one.xml
 329      </screen>
 330      might look like this:
 331      <screen>
 332       <![CDATA[
 333       <?xml version="1.0" encoding="UTF-8"?>
 334       <z:record xmlns:z="http://indexdata.com/zebra-2.0"
 335                 z:id="11224466" z:rank="42">
 336           <z:index name="control:0">11224466</z:index>
 337           <z:index name="any:w title:w title:p title:s">
 338                     How to program a computer</z:index>
 339       </z:record>
 340       ]]>
 341      </screen>
 342     </para>
 343    </section>
 344
 345
 346    <section id="record-model-domxml-canonical-index-semantics">
 347     <title>Semantics of the indexing formats</title>
 348
 349     <para>
 350      Both indexing formats are defined with equal semantics and
 351      behavior in mind:
 352      <itemizedlist>
 353        <listitem>
 354          <para>&zebra; specific instructions are either
 355          processing instructions named
 356          <literal>zebra-2.0</literal> or
 357          elements contained in the namespace
 358          <literal>xmlns:z="http://indexdata.com/zebra-2.0"</literal>.
 359          </para>
 360        </listitem>
 361        <listitem>
 362          <para>There must be exactly one <literal>record</literal>
 363            instruction, which sets the scope for the following,
 364            possibly nested <literal>index</literal> instructions.
 365          </para>
 366        </listitem>
 367        <listitem>
 368          <para>The unique <literal>record</literal> instruction
 369             may have additional attributes <literal>id</literal> and
 370             <literal>rank</literal>, where the value of the opaque ID
 371             may be any string not containing the whitespace character
 372             <literal>' '</literal>, and the rank value must be a
 373             non-negative integer. See
 374             <xref linkend="administration-ranking"/>.
 375          </para>
 376        </listitem>
 377        <listitem>
 378          <para> Multiple and possibly nested <literal>index</literal>
 379          instructions must contain at least one
 380          <literal>indexname:indextype</literal>
 381          pair, and may contain multiple such pairs separated by the
 382          whitespace character  <literal>' '</literal>. In each index
 383          pair, the name and the type of the index are separated by a
 384          colon character <literal>':'</literal>.
 385          </para>
 386        </listitem>
 387        <listitem>
 388          <para>
 389          Any index name consisting of ASCII letters, and following the
 390          standard &zebra; rules will do, see
 391          <xref linkend="querymodel-pqf-apt-mapping-accesspoint"/>.
 392          </para>
 393        </listitem>
 394        <listitem>
 395          <para>
 396          Index types are restricted to the values defined in
 397          the standard configuration
 398          file <filename>default.idx</filename>, see
 399          <xref linkend="querymodel-bib1"/> and
 400          <xref linkend="fields-and-charsets"/> for details.
 401          </para>
 402        </listitem>
 403       </itemizedlist>
 404     </para>
 405
 406
 407     <para>The examples work as follows:
 408      From the original &xml; file
 409      <literal>marc-one.xml</literal> (or from the &xml; record &dom; of the
 410      same form coming from an <literal>&lt;input&gt;</literal>
 411      pipeline),
 412      the indexing
 413      pipeline <literal>&lt;extract&gt;</literal>
 414      produces an indexing &xml; record, which is defined by
 415      the <literal>record</literal> instruction.
 416      &zebra; uses the content of
 417      <literal>z:id="11224466"</literal>
 418      or
 419      <literal>id=11224466</literal>
 420      as internal
 421      record ID, and - in case static ranking is set - the content of
 422      <literal>rank=42</literal>
 423      or
 424      <literal>z:rank="42"</literal>
 425      as static rank.
 426     </para>
 427
 428
 429     <para>In these examples, the following literal indexes are constructed:
 430      <screen>
 431        any:w
 432        control:0
 433        title:w
 434        title:p
 435        title:s
 436      </screen>
 437      where the indexing type is defined after the
 438      literal <literal>':'</literal> character.
 439      Any value from the standard configuration
 440      file <filename>default.idx</filename> will do.
 441      Finally, any
 442      <literal>text()</literal> node content recursively contained
 443      inside the <literal>&lt;z:index&gt;</literal> element, or any
 444      element following an <literal>index</literal> processing instruction,
 445      will be filtered through the
 446      appropriate char map for character normalization, and will be
 447      inserted in the named indexes.
 448     </para>
 449     <para>
 450      Finally, this example configuration can be queried using &pqf;
 451      queries, either transported by &z3950; (here using a yaz-client)
 452      <screen>
 453       <![CDATA[
 454       Z> open localhost:9999
 455       Z> elem dc
 456       Z> form xml
 457       Z>
 458       Z> find @attr 1=control @attr 4=3 11224466
 459       Z> scan @attr 1=control @attr 4=3 ""
 460       Z>
 461       Z> find @attr 1=title program
 462       Z> scan @attr 1=title ""
 463       Z>
 464       Z> find @attr 1=title @attr 4=2 "How to program a computer"
 465       Z> scan @attr 1=title @attr 4=2 ""
 466       ]]>
 467      </screen>
 468      or by the proprietary
 469      extensions <literal>x-pquery</literal> and
 470      <literal>x-pScanClause</literal> to
 471      &sru; and &srw;
 472      <screen>
 473       <![CDATA[
 474       http://localhost:9999/?version=1.1&operation=searchRetrieve&x-pquery=@attr 1=title program
 475       http://localhost:9999/?version=1.1&operation=scan&x-pScanClause=@attr 1=title ""
 476       ]]>
 477      </screen>
 478      See <xref linkend="zebrasrv-sru"/> for more information on &sru;/&srw;
 479      configuration, and <xref linkend="gfs-config"/> or the &yaz;
 480      <ulink url="&url.yaz.cql;">&cql; section</ulink>
 481      for the details of the &yaz; frontend server.
 482     </para>
 483     <para>
 484      Notice that there are no <filename>*.abs</filename>,
 485      <filename>*.est</filename>, <filename>*.map</filename>, or other &grs1;
 486      filter configuration files involved in this process, and that the
 487      literal index names are used during search and retrieval.
 488     </para>
 489     <para>
 490      In case that we want to support the usual
 491      <literal>bib-1</literal> &z3950; numeric access points, it is a
 492      good idea to choose string index names defined in the default
 493      configuration file <filename>tab/bib1.att</filename>, see
 494      <xref linkend="attset-files"/>.
 495     </para>
 496
 497    </section>
 498
 499    </section>
 500   </section>
 501
 502
 503   <section id="record-model-domxml-conf">
 504    <title>&dom; Record Model Configuration</title>
 505
 506
 507   <section id="record-model-domxml-index">
 508    <title>&dom; Indexing Configuration</title>
 509     <para>
 510      As mentioned above, there can be only one indexing pipeline,
 511      and configuring the indexing process amounts to
 512      writing an &xslt; stylesheet which produces &xml; output containing the
 513      magic processing instructions or elements discussed in
 514      <xref linkend="record-model-domxml-canonical-index"/>.
 515      Obviously, there are millions of different ways to accomplish this
 516      task, and some comments and code snippets are in order to
 517      enlighten the wary.
 518     </para>
 519     <para>
 520      Stylesheets can be written in the <emphasis>pull</emphasis> or
 521      the <emphasis>push</emphasis> style: <emphasis>pull</emphasis>
 522      means that the output &xml; structure is taken as starting point of
 523      the internal structure of the &xslt; stylesheet, and portions of
 524      the input &xml; are <emphasis>pulled</emphasis> out and inserted
 525      into the right spots of the output &xml; structure.
 526      On the other
 527      hand, <emphasis>push</emphasis> &xslt; stylesheets recursively
 528      call their template definitions, a process which is driven
 529      by the input &xml; structure, and which produces
 530      some output &xml;
 531      whenever some special conditions in the input &xml; are
 532      met. The <emphasis>pull</emphasis> type is well-suited for input
 533      &xml; with strong and well-defined structure and semantics, like the
 534      following &oai; indexing example, whereas the
 535      <emphasis>push</emphasis> type might be the only possible way to
 536      sort out deeply recursive input &xml; formats.
 537     </para>
 538     <para>
 539      A <emphasis>pull</emphasis> stylesheet example used to index
 540      &oai; harvested records could use some of the following template
 541      definitions:
 542      <screen>
 543       <![CDATA[
 544       <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
 545        xmlns:z="http://indexdata.com/zebra-2.0"
 546        xmlns:oai="http://www.openarchives.org/OAI/2.0/"
 547        xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
 548        xmlns:dc="http://purl.org/dc/elements/1.1/"
 549        version="1.0">
 550
 551        <!-- Example pull and magic element style Zebra indexing -->
 552        <xsl:output indent="yes" method="xml" version="1.0" encoding="UTF-8"/>
 553
 554         <!-- disable all default text node output -->
 555         <xsl:template match="text()"/>
 556
 557         <!-- disable all default recursive element node traversal -->
 558         <xsl:template match="node()"/>
 559
 560          <!-- match only on oai xml record root -->
 561          <xsl:template match="/">
 562           <z:record z:id="{normalize-space(oai:record/oai:header/oai:identifier)}">
 563            <!-- you may use z:rank="{some XSLT function here}" -->
 564
 565            <!-- explicitly calling defined templates -->
 566            <xsl:apply-templates/>
 567           </z:record>
 568          </xsl:template>
 569
 570          <!-- OAI indexing templates -->
 571          <xsl:template match="oai:record/oai:header/oai:identifier">
 572           <z:index name="oai_identifier:0">
 573            <xsl:value-of select="."/>
 574           </z:index>
 575          </xsl:template>
 576
 577          <!-- etc, etc -->
 578
 579          <!-- DC specific indexing templates -->
 580          <xsl:template match="oai:record/oai:metadata/oai_dc:dc/dc:title">
 581           <z:index name="dc_any:w dc_title:w dc_title:p dc_title:s ">
 582            <xsl:value-of select="."/>
 583           </z:index>
 584          </xsl:template>
 585
 586          <!-- etc, etc -->
 587
 588       </xsl:stylesheet>
 589       ]]>
 590      </screen>
 591     </para>
 592     <para>
 593      Notice also
 594      that the names and types of the indexes can be defined in the
 595      indexing &xslt; stylesheet <emphasis>dynamically according to
 596      content in the original &xml; records</emphasis>, which offers
 597      opportunities for great power and wizardry as well as grand
 598      disaster.
 599     </para>
 600     <para>
 601      The following excerpt of a <emphasis>push</emphasis> stylesheet
 602      <emphasis>might</emphasis>
 603      be a good idea, provided you have strict control of the &xml;
 604      input format (due to rigorous checking against well-defined and
 605      tight RelaxNG or &xml; Schemas, for example):
 606      <screen>
 607       <![CDATA[
 608       <xsl:template name="element-name-indexes">
 609        <z:index name="{name()}:w">
 610         <xsl:value-of select="'1'"/>
 611        </z:index>
 612       </xsl:template>
 613       ]]>
 614      </screen>
 615      This template creates indexes which have the name of the working
 616      node of any input  &xml; file, and assigns a '1' to the index.
 617      The example query
 618      <literal>find @attr 1=xyz 1</literal>
 619      finds all files which contain at least one
 620      <literal>xyz</literal> &xml; element. In case you cannot control
 621      which element names the input files contain, you might ask for
 622      disaster and bad karma using this technique.
 623     </para>
 624     <para>
 625      One variation on the theme <emphasis>dynamically created
 626      indexes</emphasis> will definitely be unwise:
 627      <screen>
 628       <![CDATA[
 629       <!-- match on oai xml record root -->
 630       <xsl:template match="/">
 631        <z:record>
 632
 633         <!-- create dynamic index name from input content -->
 634         <xsl:variable name="dynamic_content">
 635          <xsl:value-of select="oai:record/oai:header/oai:identifier"/>
 636         </xsl:variable>
 637
 638         <!-- create zillions of indexes with unknown names -->
 639         <z:index name="{$dynamic_content}:w">
 640          <xsl:value-of select="oai:record/oai:metadata/oai_dc:dc"/>
 641         </z:index>
 642        </z:record>
 643
 644       </xsl:template>
 645       ]]>
 646      </screen>
 647      Don't be tempted to play overly smart tricks with the power of
 648      &xslt;. The above example will create zillions of
 649      indexes with unpredictable names, resulting in severe &zebra;
 650      index pollution.
 651     </para>
 652   </section>
 653
 654   <section id="record-model-domxml-elementset">
 655    <title>&dom; Exchange Formats</title>
 656    <para>
 657      An exchange format can be anything which can be the outcome of an
 658      &xslt; transformation, as long as the stylesheet is registered in
 659      the main &dom; &xslt; filter configuration file, see
 660      <xref linkend="record-model-domxml-filter"/>.
 661      In principle anything that can be expressed in  &xml;, HTML, and
 662      TEXT can be the output of a <literal>schema</literal> or
 663     <literal>element set</literal> directive during search, as long as
 664      the information comes from the
 665      <emphasis>original input record &xml; &dom; tree</emphasis>
 666      (and not the transformed and <emphasis>indexed</emphasis> &xml;!!).
 667     </para>
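    <para>
     The simplest exchange format hands the stored record back
     unchanged. A minimal sketch, using the standard &xslt; identity
     transformation, could look like the following. The idea of
     registering it under its own <literal>&lt;retrieve&gt;</literal>
     name, and any name chosen for it, are assumptions made for
     illustration:
     <screen>
      <![CDATA[
      <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
       version="1.0">

       <xsl:output indent="yes" method="xml" version="1.0" encoding="UTF-8"/>

       <!-- identity transformation: copy every node and attribute as is -->
       <xsl:template match="@*|node()">
        <xsl:copy>
         <xsl:apply-templates select="@*|node()"/>
        </xsl:copy>
       </xsl:template>

      </xsl:stylesheet>
      ]]>
     </screen>
     More specific templates can then be added to override the copy
     rule for those elements which should be renamed or dropped.
    </para>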
 668     <para>
 669      In addition, internal administrative information from the &zebra;
 670      indexer can be accessed during record retrieval. The following
 671      example is a summary of the possibilities:
 672      <screen>
 673       <![CDATA[
 674       <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
 675        xmlns:z="http://indexdata.dk/zebra/xslt/1"
 676        version="1.0">
 677
 678        <!-- register internal zebra parameters -->
 679        <xsl:param name="id" select="''"/>
 680        <xsl:param name="filename" select="''"/>
 681        <xsl:param name="score" select="''"/>
 682        <xsl:param name="schema" select="''"/>
 683
 684        <xsl:output indent="yes" method="xml" version="1.0" encoding="UTF-8"/>
 685
 686        <!-- use them for display of internal information -->
 687        <xsl:template match="/">
 688          <z:zebra>
 689            <id><xsl:value-of select="$id"/></id>
 690            <filename><xsl:value-of select="$filename"/></filename>
 691            <score><xsl:value-of select="$score"/></score>
 692            <schema><xsl:value-of select="$schema"/></schema>
 693          </z:zebra>
 694        </xsl:template>
 695
 696       </xsl:stylesheet>
 697       ]]>
 698      </screen>
 699     </para>
 700
 701   </section>
 702
 703   <!--
 704   <section id="record-model-domxml-example">
 705    <title>&dom; Filter &oai; Indexing Example</title>
 706    <para>
 707      The source code tarball contains a working &dom; filter example in
 708      the directory <filename>examples/dom-oai/</filename>, which
 709      should get you started.
 710     </para>
 711     <para>
 712      More example data can be harvested from any &oai; compliant server,
 713      see details at the  &oai;
 714      <ulink url="http://www.openarchives.org/">
 715       http://www.openarchives.org/</ulink> web site, and the community
 716       links at
 717      <ulink url="http://www.openarchives.org/community/index.html">
 718       http://www.openarchives.org/community/index.html</ulink>.
 719      There is a  tutorial
 720      found at
 721      <ulink url="http://www.oaforum.org/tutorial/">
 722       http://www.oaforum.org/tutorial/</ulink>.
 723     </para>
 724    </section>
 725    -->
 726
 727   </section>
 728
 729
 730  </chapter>
 731
 732
 733
 734  <!-- Keep this comment at the end of the file
 735  Local variables:
 736  mode: sgml
 737  sgml-omittag:t
 738  sgml-shorttag:t
 739  sgml-minimize-attributes:nil
 740  sgml-always-quote-attributes:t
 741  sgml-indent-step:1
 742  sgml-indent-data:t
 743  sgml-parent-document: "zebra.xml"
 744  sgml-local-catalogs: nil
 745  sgml-namecase-general:t
 746  End:
 747  -->