From: Marc Cromme
Date: Wed, 21 Feb 2007 15:03:30 +0000 (+0000)
Subject: more info on DOM filter config
X-Git-Tag: ZEBRA.2.0.12~34
X-Git-Url: http://sru.miketaylor.org.uk/cgi-bin?a=commitdiff_plain;h=bd797d70f666280cdf941337d86b438a8d2506fc;p=idzebra-moved-to-github.git

more info on DOM filter config
---
diff --git a/doc/recordmodel-domxml.xml b/doc/recordmodel-domxml.xml
index 009d0fd..2c89976 100644
--- a/doc/recordmodel-domxml.xml
+++ b/doc/recordmodel-domxml.xml
@@ -1,5 +1,5 @@
- + &dom; &xml; Record Model and Filter Module
@@ -267,7 +267,53 @@
- Canonical Indexing Format + Canonical Indexing Format + + + &dom; &xml; indexing comes in two flavors: pure + processing-instruction governed plain &xml; documents, and + &xml; documents (very similar to the Alvis filter indexing format) + containing &xml; <record> and + <index> instructions from the magic + namespace xmlns:z="http://indexdata.dk/zebra-2.0". + + +
+ Processing-instruction governed indexing format + + The output of the processing instruction driven + indexing &xslt; stylesheets must contain + processing instructions named + zebra-2.0. + The output of the &xslt; indexing transformation is then + parsed using &dom; methods, and the contained instructions are + performed on the elements and their + subtrees directly following the processing instructions. + + + For example, the output of the command + + xsltproc dom-index-pi.xsl marc-one.xml + + might look like this: + + + + + + 11224466 + + How to program a computer + + ]]> + + +
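A minimal sketch of a processing-instruction driven stylesheet follows. It is not the dom-index-pi.xsl file itself: it assumes MARCXML input in the http://www.loc.gov/MARC21/slim namespace, treats control field 001 and subfield 245$a as hypothetical sources for the control number and the title, and assumes that one index instruction may name several whitespace-separated name:type pairs. Each xsl:processing-instruction call emits a zebra-2.0 instruction immediately before the element it is meant to govern.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical sketch, not the shipped dom-index-pi.xsl.
     Assumes MARCXML input; the field choices are illustrative. -->
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:marc="http://www.loc.gov/MARC21/slim"
    exclude-result-prefixes="marc">

  <xsl:output method="xml" indent="yes" encoding="UTF-8"/>

  <!-- Index only the first MARC record found in the input -->
  <xsl:template match="/">
    <xsl:apply-templates select="(//marc:record)[1]"/>
  </xsl:template>

  <xsl:template match="marc:record">
    <record>
      <!-- Governs the <control> element directly following it -->
      <xsl:processing-instruction name="zebra-2.0">index control:w</xsl:processing-instruction>
      <control>
        <xsl:value-of select="marc:controlfield[@tag='001']"/>
      </control>
      <!-- Assumed: several name:type pairs per instruction -->
      <xsl:processing-instruction name="zebra-2.0">index any:w title:w title:p title:s</xsl:processing-instruction>
      <title>
        <xsl:value-of select="marc:datafield[@tag='245']/marc:subfield[@code='a']"/>
      </title>
    </record>
  </xsl:template>
</xsl:stylesheet>
```

Applied with xsltproc to a record carrying the control number and title shown above, such a stylesheet would emit output of roughly this shape (exact indentation may differ):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<record>
  <?zebra-2.0 index control:w?>
  <control>11224466</control>
  <?zebra-2.0 index any:w title:w title:p title:s?>
  <title>How to program a computer</title>
</record>
```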
+ +
+ Magic element governed indexing format + The output of the indexing &xslt; stylesheets must contain certain elements in the magic xmlns:z="http://indexdata.dk/zebra-2.0" @@ -278,30 +324,34 @@ For example, the output of the command - - xsltproc xsl/oai2index.xsl one-record.xml + + xsltproc dom-index-element.xsl marc-one.xml might look like this: - <?xml version="1.0" encoding="UTF-8"?> - <z:record xmlns:z="http://indexdata.dk/zebra/xslt/1" - z:id="oai:JTRS:CP-3290---Volume-I" - z:rank="47896" - z:type="update"> - <z:index name="oai_identifier" type="0"> - oai:JTRS:CP-3290---Volume-I</z:index> - <z:index name="oai_datestamp" type="0">2004-07-09</z:index> - <z:index name="oai_setspec" type="0">jtrs</z:index> - <z:index name="dc_all" type="w"> - <z:index name="dc_title" type="w">Proceedings of the 4th - International Conference and Exhibition: - World Congress on Superconductivity - Volume I</z:index> - <z:index name="dc_creator" type="w">Kumar Krishen and *Calvin - Burnham, Editors</z:index> - </z:index> - </z:record> + + + 11224466 + + How to program a computer + + ]]> +
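A corresponding sketch for the magic element format, under the same hypothetical assumptions (MARCXML input, illustrative field choices, several whitespace-separated name:type pairs allowed in one name attribute). The z:id attribute is carried over from the z:record element of the earlier version of this example; whether other attributes such as z:rank or z:type are used with the zebra-2.0 namespace is not shown here.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical sketch, not the shipped dom-index-element.xsl.
     Assumes MARCXML input; the field choices are illustrative. -->
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:marc="http://www.loc.gov/MARC21/slim"
    xmlns:z="http://indexdata.dk/zebra-2.0"
    exclude-result-prefixes="marc">

  <xsl:output method="xml" indent="yes" encoding="UTF-8"/>

  <!-- Index only the first MARC record found in the input -->
  <xsl:template match="/">
    <xsl:apply-templates select="(//marc:record)[1]"/>
  </xsl:template>

  <xsl:template match="marc:record">
    <!-- z:id filled from control field 001 via an attribute value template -->
    <z:record z:id="{marc:controlfield[@tag='001']}">
      <z:index name="control:w">
        <xsl:value-of select="marc:controlfield[@tag='001']"/>
      </z:index>
      <!-- Assumed: several name:type pairs in one name attribute -->
      <z:index name="any:w title:w title:p title:s">
        <xsl:value-of select="marc:datafield[@tag='245']/marc:subfield[@code='a']"/>
      </z:index>
    </z:record>
  </xsl:template>
</xsl:stylesheet>
```

Its output would have roughly this shape:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<z:record xmlns:z="http://indexdata.dk/zebra-2.0" z:id="11224466">
  <z:index name="control:w">11224466</z:index>
  <z:index name="any:w title:w title:p title:s">How to program a computer</z:index>
</z:record>
```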
+ + +
      Semantics of the indexing formats + + + Both indexing formats are defined with equal semantics and + behaviour in mind. + + + This means the following: From the original &xml; file one-record.xml (or from the &xml; record &dom; of the same form coming from a split input file), the indexing @@ -321,24 +371,30 @@ insert, update, and delete. - In this example, the following literal indexes are constructed: + + + In these examples, the following literal indexes are constructed: - oai_identifier - oai_datestamp - oai_setspec - dc_all - dc_title - dc_creator + any:w + control:w + title:w + title:p + title:s - where the indexing type is defined in the - type attribute - (any value from the standard configuration - file default.idx will do). Finally, any + where the indexing type is given after the + literal ':' character. + Any index type defined in the standard configuration + file default.idx will do. + Finally, any text() node content recursively contained - inside the index will be filtered through the + inside the <z:index> element, or inside any + element following an index processing instruction, + will be filtered through the appropriate charmap for character normalization, and will be - inserted in the index. + inserted in the named indexes. + + Specific to this example, we see that the single word oai:JTRS:CP-3290---Volume-I will be literal, @@ -398,6 +454,9 @@ filter configuration files involved in this process, and that the literal index names are used during search and retrieval. + +
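Because text() content is collected recursively, <z:index> elements may also be nested, as dc_all enclosed dc_title and dc_creator in the earlier version of the example. The following is a hedged sketch reusing index names from the list above. Under the rule just described, the title text sits inside both elements and should therefore reach both title:w and any:w, while the control number only reaches control:w.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<z:record xmlns:z="http://indexdata.dk/zebra-2.0" z:id="11224466">
  <z:index name="control:w">11224466</z:index>
  <z:index name="any:w">
    <!-- Text here is recursively contained in any:w as well -->
    <z:index name="title:w">How to program a computer</z:index>
  </z:index>
</z:record>
```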
+