From 14074ecf20d86518ccedc0c9617a49949ec19779 Mon Sep 17 00:00:00 2001 From: Adam Dickmeiss Date: Fri, 14 Jun 2013 11:02:06 +0200 Subject: [PATCH] Doc reformat; remove trailing white space --- doc/book.xml | 100 +++++++++++----------- doc/pazpar2_conf.xml | 209 ++++++++++++++++++++++++---------------------- doc/pazpar2_protocol.xml | 75 +++++++++-------- 3 files changed, 198 insertions(+), 186 deletions(-) diff --git a/doc/book.xml b/doc/book.xml index e223d7e..e71bc69 100644 --- a/doc/book.xml +++ b/doc/book.xml @@ -1,6 +1,6 @@ %local; @@ -59,10 +59,10 @@ - + Introduction - +
What Pazpar2 is @@ -78,8 +78,8 @@ other XML-structured response format -- XSLT is used to normalize and extract data from retrieval records for display and analysis. It can be used - against any server which supports the - Z39.50, SRU/SRW + against any server which supports the + Z39.50, SRU/SRW or SOLR protocol. Proprietary backend modules can function as connectors between these standard protocols and any non-standard API, including web-site scraping, to @@ -218,12 +218,12 @@ Greek, Russian, German and French. Pazpar2 uses the ICU Unicode character conversions, Unicode normalization, case folding and other fundamental operations needed in - tokenization, normalization and ranking of records. + tokenization, normalization and ranking of records. Compiling, linking, and usage of the ICU libraries is optional, but strongly recommended for usage in an international - environment. + environment. @@ -244,7 +244,7 @@ For example, if Libxml2/libXSLT libraries are already installed as development packages, use these. - + Ensure that the development libraries and header files are available on your system before compiling Pazpar2. For installation @@ -264,13 +264,13 @@ The make install will install manpages as well as the - Pazpar2 server, pazpar2, + Pazpar2 server, pazpar2, in PREFIX/sbin. By default, PREFIX is /usr/local/ . This can be changed with configure option .
- +
Installation from source on Windows @@ -305,7 +305,7 @@ The Windows version of Pazpar2 is a console application. It may - be installed as a Windows Service by adding option + be installed as a Windows Service by adding option -install for the pazpar2 program. This will register Pazpar2 as a service and use the other options provided in the same invocation. For example: @@ -322,13 +322,13 @@
- +
Installation of test interfaces In this section we show how to make available the set of simple interfaces that are part of the Pazpar2 source package, and which - demonstrate some ways to use Pazpar2. (Note that Debian users can + demonstrate some ways to use Pazpar2. (Note that Debian users can save time by just installing the package pazpar2-test1.) @@ -349,7 +349,7 @@ copy pazpar2.cfg.dist pazpar2.cfg ..\bin\pazpar2 -f pazpar2.cfg - This will start a Pazpar2 listener on port 9004. It will proxy + This will start a Pazpar2 listener on port 9004. It will proxy HTTP requests to port 80 on localhost, which we assume will be the regular HTTP server on the system. Inspect and modify pazpar2.cfg as needed if this is to be changed. The pazpar2.cfg file includes settings from the @@ -360,7 +360,7 @@ The test UIs are located in www. Ensure that this directory is available to the web server by copying - www to the document root, + www to the document root, using Apache's Alias directive, or creating a symbolic link: for example, on a Debian or Ubuntu system with Apache2 installed from the standard package, you might @@ -370,7 +370,7 @@ sudo ln -s `pwd`/www /var/www/pazpar2-demo - + This makes the test applications visible at @@ -387,7 +387,7 @@ accessed: test1, test2 and jsdemo are pure HTML+JavaScript setups, needing no server-side - intelligence; + intelligence; demo requires PHP on the server. @@ -398,7 +398,7 @@ In order to use Apache as frontend for the interface on port 80 - for public access etc., refer to + for public access etc., refer to .
@@ -415,11 +415,11 @@ . - +
Apache 2 Proxy - Apache 2 has a + Apache 2 has a proxy module @@ -428,7 +428,7 @@ based web service. The Apache 2 proxy must operate in the Reverse Proxy mode. - + On a Debian based Apache 2 system, the relevant modules can be enabled with: @@ -436,11 +436,11 @@ sudo a2enmod proxy_http proxy_balancer - + - Traditionally Pazpar2 interprets URL paths with suffix + Traditionally Pazpar2 interprets URL paths with suffix /search.pz2. - The + The ProxyPass @@ -468,13 +468,13 @@ ProxyRequests Off - + AddDefaultCharset off Order deny,allow Allow from all - + ProxyPass /myportal/search.pz2 http://localhost:8004/search.pz2 ProxyVia Off @@ -482,16 +482,16 @@
- +
- + Using Pazpar2 This chapter provides a general introduction to the use and - deployment of Pazpar2. + deployment of Pazpar2. - +
Pazpar2 and your systems architecture @@ -522,7 +522,7 @@ with the server from which the enclosing HTML page or object originated, Pazpar2 is designed so that it can act as a transparent proxy in front of an existing webserver (see for details). + linkend="pazpar2_conf"/> for details). In this mode, all regular HTTP requests are transparently passed through to your webserver, while Pazpar2 only intercepts search-related webservice requests. @@ -597,11 +597,11 @@ ]]> - + As you can see, there isn't much to it. There are really only a few important elements to this file. - + Elements should belong to the namespace http://www.indexdata.com/pazpar2/1.0. @@ -644,7 +644,7 @@ The webservice API of Pazpar2 is described in detail in . - + In brief, you use the 'init' command to create a session, a temporary workspace which carries information about the current @@ -678,7 +678,7 @@ In addition, the ICU tokenization and normalization rules must - be defined in the master configuration file described in + be defined in the master configuration file described in .
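The session workflow described above ('init' first, then commands that carry the session) can be sketched as a small URL builder. This is only an illustration, not part of Pazpar2: the helper name `command_url` is hypothetical, and the base URL assumes the local listener on port 9004 and the `/search.pz2` path used elsewhere in this documentation.

```python
from urllib.parse import urlencode

# Assumption: a local Pazpar2 listener; adjust host, port and path as needed.
BASE_URL = "http://localhost:9004/search.pz2"


def command_url(command, session=None, **params):
    """Build the URL for one webservice command (hypothetical helper)."""
    fields = {}
    if session is not None:
        fields["session"] = session
    fields["command"] = command
    fields.update(params)
    # urlencode escapes values and joins them as key=value pairs.
    return BASE_URL + "?" + urlencode(fields)


# 'init' takes no session; later commands pass the session it returned.
init_url = command_url("init")
search_url = command_url("search", session="2044502273", query="computer science")
print(init_url)
print(search_url)
```

The session value here is the sample one from the protocol examples; a real client would read it from the response to 'init'.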
@@ -698,7 +698,7 @@ module in your Apache2 installation. - + On a Debian based Apache 2 system, the relevant modules can be enabled with: @@ -729,7 +729,7 @@ could use the following Apache 2 configuration to expose a single pazpar2 'endpoint' on a standard (/pazpar2/search.pz2) location: - + AddDefaultCharset off @@ -746,12 +746,12 @@ BalancerMember http://localhost:8007 route=pz4 - # route is resent in the 'session' param which has the form: + # route is resent in the 'session' param which has the form: # 'sessid.serverid', understandable by the mod_proxy_load_balancer # this is not going to work if the client tampers with the 'session' param ProxyPass /pazpar2/search.pz2 balancer://pz2cluster lbmethod=byrequests stickysession=session nofailover=On ]]> - + The 'ProxyPass' line sets up a reverse proxy for request ‘/pazpar2/search.pz2’ and delegates all requests to the load balancer (virtual worker) with name ‘pz2cluster’. @@ -759,11 +759,11 @@ The ‘Proxy’ section lists all the servers (real workers) which the load balancer can use. - + - + - +
Relevance ranking @@ -792,7 +792,7 @@ fetched from the database. In this case, the rank weight w, and the rank tweaks lead, follow and length. - + License - + Pazpar2, Copyright © &copyright-year; Index Data. - + Pazpar2 is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2, or (at your option) any later version. - + Pazpar2 is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. - + You should have received a copy of the GNU General Public License along with Pazpar2; see the file LICENSE. If not, write to the - Free Software Foundation, + Free Software Foundation, 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA - + &gpl2; - + name @@ -203,19 +203,19 @@ This is the name of the data element. It is matched against the 'type' attribute of the - 'metadata' element + 'metadata' element in the normalized record. A warning is produced if metadata elements with an unknown name are - found in the + found in the normalized record. This name is also used to - represent + represent data elements in the records returned by the webservice API, and to name sort lists and browse facets. - + type @@ -229,7 +229,7 @@ - + brief @@ -241,7 +241,7 @@ - + sortkey @@ -254,16 +254,16 @@ - + rank Specifies that this element is to be used to - help rank + help rank records against the user's query (when ranking is - requested). - The valus is of the form + requested). + The value is of the form M [F N] @@ -283,7 +283,7 @@ For Pazpar2 1.6.13 and later, the rank may also be defined - "per-document", by the normalization stylesheet. + "per-document", by the normalization stylesheet. The per field rank was introduced in Pazpar2 1.6.15. Earlier @@ -293,7 +293,7 @@ about ranking. 
- + termlist @@ -302,13 +302,13 @@ termlist, or browse facet. Values are tabulated from incoming records, and a highscore of values (with their associated frequency) is made available to the - client through the webservice API. + client through the webservice API. The possible values are 'yes' and 'no' (default). - + merge @@ -329,7 +329,7 @@ - + mergekey @@ -364,7 +364,7 @@ - + limitcluster @@ -380,29 +380,31 @@ - + limitmap - Specifies a default limitmap for this field. This is to avoid mass - configuring of targets. However it is important to review/do this on a per - target since it is usually target-specific. See limitmap for format. + Specifies a default limitmap for this field. This is to avoid mass + configuring of targets. However it is important to review/do + this on a per target since it is usually target-specific. + See limitmap for format. - + facetmap - Specifies a default facetmap for this field. This is to avoid mass - configuring of targets. However it is important to review/do this on a per - target since it is usually target-specific. See facetmap for format. + Specifies a default facetmap for this field. This is to avoid mass + configuring of targets. However it is important to review/do + this on a per target since it is usually target-specific. + See facetmap for format. - + setting @@ -412,7 +414,7 @@ are allowed. 'no' is the default and doesn't do anything. 'postproc' copies the value of a setting with the same name into the output of the normalization stylesheet(s). 'parameter' - makes the value of a setting with the same name available + makes the value of a setting with the same name available as a parameter to the normalization stylesheet, so you can further process the value inside of the stylesheet, or use the value to decide how to deal with other data values. @@ -427,9 +429,9 @@ - + - + @@ -456,7 +458,7 @@ rule set. Pazpar2 uses the particular rule sets for particular purposes. 
Rule set 'relevance' is used to normalize - terms for relevance ranking. Rule set 'sort' is used to + terms for relevance ranking. Rule set 'sort' is used to normalize terms for sorting. Rule set 'mergekey' is used to normalize terms for making a mergekey and, finally, rule set 'facet' is normally used to normalize facet terms, unless @@ -470,7 +472,7 @@ in any order, except the 'index' element which logically belongs to the end of the list. The stated tokenization, transformation and charmapping instructions are performed - in order from top to bottom. + in order from top to bottom. @@ -479,7 +481,7 @@ The attribute 'rule' defines the direction of the per-character casemapping, allowed values are "l" - (lower), "u" (upper), "t" (title). + (lower), "u" (upper), "t" (title). @@ -490,10 +492,10 @@ Normalization and transformation of tokens follows the rules defined in the 'rule' attribute. For possible values we refer to the extensive ICU - documentation found at the + documentation found at the ICU transformation home page. Set filtering - principles are explained at the + principles are explained at the ICU set and filtering page. @@ -508,7 +510,7 @@ 'rule' attribute may have the following values: "s" (sentence), "l" (line-break), "w" (word), and "c" (character), the latter probably not being - very useful in a pruning Pazpar2 installation. + very useful in a running Pazpar2 installation. @@ -520,7 +522,7 @@ - + relevance @@ -536,7 +538,7 @@ - + sort @@ -552,13 +554,13 @@ - + mergekey Specifies ICU tokenization and transformation rules - for tokens that are used in Pazpar2's mergekey. + for tokens that are used in Pazpar2's mergekey. The child element of 'mergekey' must be 'icu_chain' and the 'id' attribute of the icu_chain is ignored. This definition is obsolete and should be replaced by the equivalent @@ -596,7 +598,7 @@ The name and value of the CCL directive are given by attributes 'name' and 'value' respectively. 
Refer to the list of possible names in the - YAZ manual . @@ -686,16 +688,16 @@ - + sort-default Specifies the default sort criteria (default 'relevance'), - which previous was hard-coded as default criteria in search. - This is a fix/work-around to avoid re-searching when using - target-based sorting. In order for this to work efficient, - the search must also have the sort critera parameter; otherwise + which previously was hard-coded as the default criteria in search. + This is a fix/work-around to avoid re-searching when using + target-based sorting. In order for this to work efficiently, + the search must also have the sort criteria parameter; otherwise pazpar2 will do re-searching on search criteria changes, if changed between search and show command. @@ -705,7 +707,7 @@ - +--> settings @@ -734,7 +736,7 @@ Specifies timeout parameters for this service. The timeout - element supports the following attributes: + element supports the following attributes: session, z3950_operation, z3950_session which specifies 'session timeout', 'Z39.50 operation timeout', @@ -794,7 +796,7 @@ ]]> - + INCLUDE FACILITY @@ -819,8 +821,8 @@ kinds of attributes, or settings with search targets. This can be done through XML files which are read at startup; each file can associate one or more settings with one or more targets. The file format is generic - in nature, designed to support a wide range of application requirements. The - settings can be purely technical things, like, how to perform a title + in nature, designed to support a wide range of application requirements. + The settings can be purely technical things, like, how to perform a title search against a given target, or it can associate arbitrary name=value pairs with groups of targets -- for instance, if you would like to place all commercial full-text bases in one group for selection @@ -829,13 +831,13 @@ to drive sorting, facet/termlist generation, or end-user interface display logic. 
- + During startup, Pazpar2 will recursively read a specified directory (can be identified in the pazpar2.cfg file or on the command line), and process any settings files found therein. - + Clients of the Pazpar2 webservice interface can selectively override settings for individual targets within the scope of one session. This @@ -849,16 +851,17 @@ some search targets in different ways. This, again, can be managed using an external database or other lookup mechanism. Setting overrides can be performed either using the - init or the + init or the settings webservice command. - + In fact, every setting that applies to a database (except pz:id, which can only be used for filtering targets to use for a search) can be overridden - on a per-session basis. This allows the client to override specific CCL fields - for searching, etc., to meet the needs of a session or user. + on a per-session basis. + This allows the client to override specific CCL fields for + searching, etc., to meet the needs of a session or user. @@ -936,7 +939,7 @@ target, name, and value. - + target @@ -1040,7 +1043,7 @@ - + @@ -1097,7 +1100,7 @@ The following setting names are reserved by Pazpar2 to control the behavior of the client function. - + pz:cclmap:xxx @@ -1160,7 +1163,7 @@ The value iso2709 makes Pazpar2 convert retrieved MARC records to MARCXML. In order to convert to XML, the exact chacater set of the MARC must be known (if not, the resulting - XML is probably not well-formed). The character set may be + XML is probably not well-formed). The character set may be specified by adding: ;charset=charset to iso2709. If omitted, a charset of @@ -1227,7 +1230,7 @@ '.xsl'. - When mapping MARC records, XSLT can be bypassed for increased + When mapping MARC records, XSLT can be bypassed for increased performance with the alternate "MARC map" format. 
Provide the path of a file with extension ".mmap" containing on each line: @@ -1238,7 +1241,7 @@ 500 $ description 773 * citation - To map the field value specify a subfield of '$'. To store a + To map the field value specify a subfield of '$'. To store a concatenation of all subfields, specify a subfield of '*'. @@ -1297,7 +1300,7 @@ pz:presentchunk - Controls the chunk size in present requests. Pazpar2 will + Controls the chunk size in present requests. Pazpar2 will make (maxrecs / chunk) request(s). The default is 20. @@ -1317,13 +1320,13 @@ pz:zproxy - The 'pz:zproxy' setting has the value syntax + The 'pz:zproxy' setting has the value syntax 'host.internet.address:port'; it is used to tunnel Z39.50 requests through the named Z39.50 proxy. - + pz:apdulog @@ -1333,7 +1336,7 @@ - + pz:sru @@ -1353,18 +1356,19 @@ - + pz:sru_version This allows SRU version to be specified. If unset Pazpar2 will use the default of YAZ (currently 1.2). Should be set - to 1.1 or 1.2. For Solr, the current supported/tested version is 1.4 and 3.x. + to 1.1 or 1.2. For Solr, the currently supported/tested versions + are 1.4 and 3.x. - + pz:pqf_prefix @@ -1378,7 +1382,7 @@ - + pz:pqf_strftime @@ -1406,7 +1410,7 @@ - + pz:sort @@ -1432,7 +1436,7 @@ - + pz:preferred @@ -1441,7 +1445,7 @@ target. Using block=pref on show command will wait for all these targets to return records before releasing the block. If no target is preferred, the block=pref will be identical to block=1, - which release when one target has returned records. + which releases when one target has returned records. @@ -1450,7 +1454,7 @@ (Not yet implemented). - Specifies the time for which a block should be released anyway. + Specifies the time after which a block should be released anyway. @@ -1458,7 +1462,7 @@ pz:termlist_term_count - Specifies number of facet terms to be requested from the target. + Specifies number of facet terms to be requested from the target. The default is unspecified, i.e. server-decided. Also see pz:facetmap. 
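The MARC map lines shown earlier in this hunk group (`500 $ description`, `773 * citation`) can be read with a few lines of Python. This is a rough sketch only: it assumes each line holds exactly three whitespace-separated columns (field tag, subfield code, metadata name), which is inferred from the two sample lines rather than stated in the text, and both function names are hypothetical.

```python
def parse_mmap_line(line):
    """Split one MARC map line into (field tag, subfield code, metadata name).

    Assumes three whitespace-separated columns, as in the sample lines.
    """
    tag, subfield, name = line.split()
    return tag, subfield, name


def subfield_meaning(subfield):
    """Describe what a subfield code selects, per the two special cases in the text."""
    if subfield == "$":
        return "the field value itself"
    if subfield == "*":
        return "all subfields concatenated"
    return "subfield " + subfield


for raw in ["500 $ description", "773 * citation"]:
    tag, subfield, name = parse_mmap_line(raw)
    print(tag, "->", name, "using", subfield_meaning(subfield))
```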
 @@ -1467,13 +1471,16 @@ pz:termlist_term_factor - Specifies whether to use a factor for pazpar2 generated facets (1) or not (0). - When mixing locallly generated (by the downloaded (pz:maxrecs) samples) - facet with native (target-generated) facets, the later will dominated the dominate the facet list - since they are generated based on the complete result set. - By scaling up the facet count using the ratio between total hit count and the sample size, - the total facet count can be approximated and thus better compared with native facets. - This is not enabled by default. + Specifies whether to use a factor for pazpar2 generated facets (1) + or not (0). + When mixing locally generated facets (from the downloaded pz:maxrecs samples) + with native (target-generated) facets, the latter will + dominate the facet list since they are generated + based on the complete result set. + By scaling up the facet count using the ratio between total hit + count and the sample size, + the total facet count can be approximated and thus better compared + with native facets. This is not enabled by default. @@ -1501,7 +1508,7 @@ Specifies attributes for limiting a search to a field - using the limit parameter for search. It can be used to filter locally - or remotely (search in a target). In some cases the mapping of + or remotely (search in a target). In some cases the mapping of a field to a value is identical to an existing cclmap field; in other cases the field must be specified in a different way - for example to match a complete field (rather than parts of a subfield). @@ -1509,10 +1516,10 @@ The value of limitmap may have one of three forms: referral to an existing CCL field, a raw PQF string or a local limit. Leading string - determines type; either ccl: for CCL field, + determines type; either ccl: for CCL field, rpn: for PQF/RPN, or local: for filtering in Pazpar2. 
The local filtering may be followed - by a field a metadata field (default is to use the name of the + by a metadata field name (default is to use the name of the limitmap itself). @@ -1565,9 +1572,9 @@ - + - + diff --git a/doc/pazpar2_protocol.xml b/doc/pazpar2_protocol.xml index b7e8245..2d6f050 100644 --- a/doc/pazpar2_protocol.xml +++ b/doc/pazpar2_protocol.xml @@ -107,7 +107,7 @@ - + ping @@ -145,7 +145,7 @@ or possibly a wildcard, and value is the desired value for the setting. - + Because the settings command manipulates potentially sensitive information, it is possible to configure Pazpar2 to only allow access @@ -153,7 +153,7 @@ scripting, which in turn is responsible for authenticating the user, and possibly determining which resources he has access to, etc. - + As a shortcut, it is also possible to override settings directly in @@ -181,7 +181,7 @@ search.pz?command=settings&session=2044502273&pz:allow[search.com:210/db1]=1 ]]> - + search @@ -277,13 +277,15 @@ search.pz?command=settings&session=2044502273&pz:allow[search.com:210/db1]=1 'position'. - If not specified here or as sort-default" - in pazpar2.cfg, Pazpar2 will default to the built-in 'relevance' ranking. + If not specified here or as + sort-default + in pazpar2.cfg, Pazpar2 will default to the built-in + 'relevance' ranking. - Having sort criteria at search is important for targets that - supports native sorting in order to get best results. Pazpar2 - will trigger a new search if search criteria changes from Pazpar2 + Having sort criteria at search is important for targets that + support native sorting in order to get best results. Pazpar2 + will trigger a new search if search criteria changes from Pazpar2 to target-based sorting or vice versa. 
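To illustrate the point about passing the sort criteria at search time, the sketch below builds a search URL and a show URL that carry the same sort value, so the criteria do not change between the two commands. It is a hedged example, not Pazpar2 code: the helper `with_sort` is hypothetical, and the name `sort` for the search parameter is assumed to match the one used by the show command.

```python
from urllib.parse import urlencode


def with_sort(command, session, sort, **params):
    """Build a relative webservice URL that always carries the sort criteria.

    Hypothetical helper; the parameter name 'sort' is an assumption for search.
    """
    fields = {"session": session, "command": command, **params, "sort": sort}
    # Keep ':' unescaped so the value reads as in the examples (title:1).
    return "search.pz2?" + urlencode(fields, safe=":")


sort = "title:1"
search_url = with_sort("search", "2044502273", sort, query="computer science")
show_url = with_sort("show", "2044502273", sort, start=0, num=2)
print(search_url)
print(show_url)
```

Keeping the sort value in one variable and reusing it for both commands is a simple way to avoid the accidental re-search described above.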
 @@ -346,7 +348,7 @@ search.pz2?session=2044502273&command=search&query=computer+science ]]> - + stat @@ -388,7 +390,7 @@ search.pz2?session=2044502273&command=stat ]]> - + show @@ -402,14 +404,14 @@ search.pz2?session=2044502273&command=stat - + start First record to show - 0-indexed. - + num @@ -442,15 +444,18 @@ search.pz2?session=2044502273&command=stat Sort field names can be any field name designated as a sort field in the pazpar2.cfg file, or the special names 'relevance' and 'position'. - - If not specified here or as sort-default" - in pazpar2.cfg, pazpar2 will default to the built-in 'relevance' ranking. - - Having sort criteria at search is important for targets that - supports native sorting in order to get best results. pazpar2 - will trigger a new search if search criteria changes from pazpar2 + + + If not specified here or as + sort-default + in pazpar2.cfg, pazpar2 will default to the built-in + 'relevance' ranking. + + + Having sort criteria at search is important for targets that + support native sorting in order to get best results. pazpar2 + will trigger a new search if search criteria changes from pazpar2 to target-based sorting. - For targets where If pz:sortmap @@ -532,7 +537,7 @@ search.pz2?session=2044502273&command=show&start=0&num=2&sort=title:1 2 -- Number of records retrieved How to program a computer, by Jack Collins - 2 -- Number of merged records + 2 -- Number of merged records 6 -- Record ID for this record @@ -550,10 +555,10 @@ search.pz2?session=2044502273&command=show&start=0&num=2&sort=title:1 record - Retrieves a detailed record. Unlike the - show command, this command + Retrieves a detailed record. Unlike the + show command, this command returns metadata records before merging takes place. 
Parameters: - + session @@ -633,7 +638,7 @@ search.pz2?session=2044502273&command=show&start=0&num=2&sort=title:1 syntax - This optional parameter is the record syntax used for raw + This optional parameter is the record syntax used for raw transfer (i.e. when offset is specified). If syntax is not given, but offset is used, the value of pz:requestsyntax is used. @@ -680,14 +685,14 @@ search.pz2?session=2044502273&command=show&start=0&num=2&sort=title:1 - + Example: Example output: - + @@ -705,7 +710,7 @@ search.pz2?session=605047297&command=record&id=3 termlist Retrieves term list(s). Parameters: - + session @@ -762,7 +767,7 @@ search.pz2?session=2044502273&command=termlist&name=author,subject ]]> - + For the special termlist name "xtargets", results are returned about the targets which have returned the most hits. @@ -783,7 +788,7 @@ search.pz2?session=2044502273&command=termlist&name=author,subject ]]> - + bytarget @@ -801,14 +806,14 @@ search.pz2?session=2044502273&command=termlist&name=author,subject - + Example: - + Example output: - + OK @@ -822,7 +827,7 @@ search.pz2?session=605047297&command=bytarget&id=3 ]]> - + The following client states are defined: Client_Connecting, Client_Connected, Client_Idle, Client_Initializing, Client_Searching, Client_Searching, Client_Presenting, Client_Error, Client_Failed, -- 1.7.10.4