From: Galen Charlton
Date: Thu, 30 Jul 2009 12:55:18 +0000 (-0400)
Subject: fix typos and other minor errors in doc
X-Git-Tag: v2.0.41~3
X-Git-Url: http://sru.miketaylor.org.uk/cgi-bin?a=commitdiff_plain;h=3ec73ef21b81b6d2abd9f21c31bf4ae4560df7ca;p=idzebra-moved-to-github.git

fix typos and other minor errors in doc

Signed-off-by: Galen Charlton
---
diff --git a/doc/administration.xml b/doc/administration.xml
index 7e0f6ac..471cfef 100644
--- a/doc/administration.xml
+++ b/doc/administration.xml
@@ -341,10 +341,10 @@
  estimatehits:: integer
- Controls whether &zebra; should calculate approximite hit counts and
+ Controls whether &zebra; should calculate approximate hit counts and
  at which hit count it is to be enabled.
- A value of 0 disables approximiate hit counts.
- For a positive value approximaite hit count is enabled
+ A value of 0 disables approximate hit counts.
+ For a positive value approximate hit count is enabled
  if it is known to be larger than integer.
@@ -438,7 +438,7 @@
  permstring
- Specifies permissions (priviledge) for a user that are allowed
+ Specifies permissions (privilege) for a user that are allowed
  to access &zebra; via the passwd system. There are two kinds of permissions currently: read (r) and write(w). By default users not listed in a permission directive are given the read
@@ -458,7 +458,7 @@
  Names a file which lists database subscriptions for individual users. The access file should consists of lines of the form username:
- dbnames, where dbnames is a list of database names, seprated by
+ dbnames, where dbnames is a list of database names, separated by
  '+'. No whitespace is allowed in the database list.
@@ -1042,7 +1042,7 @@
  Static Ranking
- &zebra; uses internally inverted indexes to look up term occurencies
+ &zebra; uses internally inverted indexes to look up term frequencies
  in documents.
  Multiple queries from different indexes can be combined by the binary boolean operations AND, OR and/or NOT (which
@@ -1133,7 +1133,7 @@
  The default rank-1 ranking module implements a TF/IDF (Term Frequecy over Inverse Document Frequency) like
- algorithm. In contrast to the usual defintion of TF/IDF
+ algorithm. In contrast to the usual definition of TF/IDF
  algorithms, which only considers searching in one full-text index, this one works on multiple indexes at the same time. More precisely,
@@ -1846,7 +1846,7 @@ where g = rset_count(terms[i]->rset) is the count of all documents in this speci
  Extended services debugging guide
- When debugging ES over PHP we recomment the following order of tests:
+ When debugging ES over PHP we recommend the following order of tests:
@@ -1867,14 +1867,14 @@ where g = rset_count(terms[i]->rset) is the count of all documents in this speci
  yaz-client like described in , and
- remeber the -a option which tells you what
+ remember the -a option which tells you what
  goes over the wire! Notice also the section on permissions: try perm.anonymous: rw in zebra.cfg to make sure you do not run into
- permission problems (but never expose such an unsecure setup on the
+ permission problems (but never expose such an insecure setup on the
  internet!!!). Then, make sure to set the general recordType instruction, pointing correctly to the GRS-1 filters,
diff --git a/doc/architecture.xml b/doc/architecture.xml
index c1088d6..5b3a27e 100644
--- a/doc/architecture.xml
+++ b/doc/architecture.xml
@@ -220,7 +220,7 @@
  The internal &acro.dom; &acro.xml; representation can be fed into four
- different pipelines, consisting of arbitraily many sucessive
+ different pipelines, consisting of arbitrarily many successive
  &acro.xslt; transformations; these are for input parsing and initial
@@ -291,8 +291,8 @@
  static ranks. This imposes no overhead at all, both search and indexing perform still O(1) irrespectively of document
- collection size. This feature resembles Googles pre-ranking using
- their Pagerank algorithm.
+ collection size. This feature resembles Google's pre-ranking using
+ their PageRank algorithm.
  Details on the experimental Alvis &acro.xslt; filter are found in
@@ -442,7 +442,7 @@
  &zebra;'s internal index structure/data for a record. In particular, the regular record filters are not invoked when these are in use.
- This can in some cases make the retrival faster than regular
+ This can in some cases make the retrieval faster than regular
  retrieval operations (for &acro.marc;, &acro.xml; etc).
@@ -564,7 +564,7 @@
  Z> elements zebra::meta
  Z> s 1+1
- displays all available metadata on the record. These include sytem
+ displays all available metadata on the record. These include system
  number, database name, indexed filename, filter used for indexing, score and static ranking information and finally bytesize of record.
diff --git a/doc/examples.xml b/doc/examples.xml
index 86da3b9..ebbac17 100644
--- a/doc/examples.xml
+++ b/doc/examples.xml
@@ -28,7 +28,7 @@
- What record schemas to support. (Subsidiary files specifiy how
+ What record schemas to support. (Subsidiary files specify how
  to index the contents of records in those schemas, and what format to use when presenting records in those schemas to client software.)
@@ -264,7 +264,7 @@ xelm /Zthes/termModifiedBy termModifiedBy:w
- Declare Thesausus attribute set. See zthes.att.
+ Declare Thesaurus attribute set. See zthes.att.
@@ -283,7 +283,7 @@ xelm /Zthes/termModifiedBy termModifiedBy:w
  Make termName word searchable by both
- Zthes attribute termName (1002) and &acro.bib1; atttribute title (4).
+ Zthes attribute termName (1002) and &acro.bib1; attribute title (4).
diff --git a/doc/field-structure.xml b/doc/field-structure.xml
index b9fe2ed..98c4183 100644
--- a/doc/field-structure.xml
+++ b/doc/field-structure.xml
@@ -84,9 +84,9 @@
  (non-space characters) separated by single space characters (normalized to " " on display).
  When completeness is disabled, each word is indexed as a separate entry. Complete subfield
- indexing is most useful for fields which are typically browsed (eg.
- titles, authors, or subjects), or instances where a match on a
- complete subfield is essential (eg. exact title searching). For fields
+ indexing is most useful for fields which are typically browsed (e.g.,
+ titles, authors, or subjects), or instances where a match on a
+ complete subfield is essential (e.g., exact title searching). For fields
  where completeness is disabled, the search engine will interpret a search containing space characters as a word proximity search.
@@ -146,7 +146,7 @@ to them:
  # Traditional word index
- # Used if completenss is 'incomplete field' (@attr 6=1) and
+ # Used if completeness is 'incomplete field' (@attr 6=1) and
  # structure is word/phrase/word-list/free-form-text/document-text
  index w completeness 0
@@ -295,7 +295,7 @@
  Curly braces {} may be used to enclose ranges of single characters (possibly using the escape convention described in the
- preceding point), eg. {a-z} to introduce the
+ preceding point), e.g., {a-z} to introduce the
  standard range of ASCII characters. Note that the interpretation of such a range depends on the concrete representation in your local, physical character set.
@@ -304,8 +304,8 @@
- paranthesises () may be used to enclose multi-byte characters -
- eg. diacritics or special national combinations (eg. Spanish
+ parentheses () may be used to enclose multi-byte characters -
+ e.g., diacritics or special national combinations (e.g., Spanish
  "ll"). When found in the input stream (or a search term), these characters are viewed and sorted as a single character, with a sorting value depending on the position of the group in the value
@@ -515,7 +515,7 @@
  MARCXML indexing using ICU
  The directory examples/marcxml includes
- a complete sample with MARCXML recordst that are DOM XML indexed
+ a complete sample with MARCXML records that are DOM XML indexed
  using ICU chain rules.
  Study the README in the marcxml directory for details.
diff --git a/doc/installation.xml b/doc/installation.xml
index 8a19a06..d0670d1 100644
--- a/doc/installation.xml
+++ b/doc/installation.xml
@@ -9,7 +9,7 @@
  The software is regularly tested on Debian GNU/Linux,
- Redhat Linux,
+ Red Hat Linux,
  Gentoo Linux, SuSE Linux, FreeBSD (i386),
@@ -458,7 +458,7 @@
- The attribute set defintion files may no longer contain
+ The attribute set definition files may no longer contain
  redirection to other fields. For example the following snippet of a custom custom/bib1.att
diff --git a/doc/introduction.xml b/doc/introduction.xml
index 6a66f77..528f546 100644
--- a/doc/introduction.xml
+++ b/doc/introduction.xml
@@ -43,7 +43,7 @@
  &zebra; is a high-performance, general-purpose structured text indexing and retrieval engine. It reads records in a
- variety of input formats (eg. email, &acro.xml;, &acro.marc;) and provides access
+ variety of input formats (e.g. email, &acro.xml;, &acro.marc;) and provides access
  to them through a powerful combination of boolean search expressions and relevance-ranked free-text queries.
@@ -207,7 +207,7 @@
  Predefined field types
  user defined
  Data fields can be indexed as phrase, as into word
- tokenized text, as numeric values, url's, dates, and raw binary
+ tokenized text, as numeric values, URLs, dates, and raw binary
  data.
  and
@@ -217,7 +217,7 @@
  Regular expression matching available
  Full regular expression matching and "approximate
- matching" (eg. spelling mistake corrections) are handled.
+ matching" (e.g. spelling mistake corrections) are handled.
@@ -780,7 +780,7 @@
  Why does Kete wants to use Zebra?? Speed, Scalability and easy integration with Koha. Read their detailled
+ url="http://kete.net.nz/blog/topics/show/44-who-what-why-when-answering-some-of-the-niggly-development-questions">detailed
  reasoning here.
@@ -960,7 +960,7 @@
  &zebra; has been used by a variety of institutions to construct indexes of large web sites, typically in the region of tens of millions of pages. In this role, it functions somewhat similarly
- to the engine of google or altavista, but for a selected intranet
+ to the engine of Google or AltaVista, but for a selected intranet
  or a subset of the whole Web.
diff --git a/doc/marc_indexing.xml b/doc/marc_indexing.xml
index 453f9d6..597e7a6 100644
--- a/doc/marc_indexing.xml
+++ b/doc/marc_indexing.xml
@@ -60,7 +60,7 @@ records.
  for &acro.marc; records. This term helps to understand the notation of extended indexing of MARC records by &zebra;. Our definition is based on the document "The table of conformity for &acro.z3950; use attributes and R&acro.usmarc; fields".
-The document is available only in russian language.
+The document is available only in Russian language.
  The index-formula is the combination of subfields presented in such way:
diff --git a/doc/querymodel.xml b/doc/querymodel.xml
index e4fc7db..0dbcebe 100644
--- a/doc/querymodel.xml
+++ b/doc/querymodel.xml
@@ -149,7 +149,7 @@
  The &acro.pqf; grammar is documented in the &yaz; manual, and shall not be repeated here. This textual &acro.pqf; representation
- is not transmistted to &zebra; during search, but it is in the
+ is not transmitted to &zebra; during search, but it is in the
  client mapped to the equivalent &acro.z3950; binary query parse tree.
@@ -500,7 +500,7 @@
  It is possible to search in any silly string index - if it's defined in your
- indexation rules and can be parsed by the &acro.pqf; parser.
+ indexing rules and can be parsed by the &acro.pqf; parser.
  This is definitely not the recommended use of this facility, as it might confuse your users with some very unexpected results.
@@ -527,7 +527,7 @@
  string attributes which in appearance resemble XPath queries.
  There are two problems with this approach: first, the XPath-look-alike has to
- be defined at indexation time, no new undefined
+ be defined at indexing time, no new undefined
  XPath queries can entered at search time, and second, it might confuse users very much that an XPath-alike index name in fact gets populated from a possible entirely different &acro.xml; element
@@ -1157,7 +1157,7 @@
  Word list (6) is supported, and maps to the boolean AND combination of words supplied. The word list is useful when
- google-like bag-of-word queries need to be translated from a GUI
+ Google-like bag-of-word queries need to be translated from a GUI
  query language to &acro.pqf;. For example, the following queries are equivalent:
@@ -1406,7 +1406,7 @@
  search and scan in index type="p".
- The Complete subfield (2) is a reminiscens
+ The Complete subfield (2) is a reminiscent
  from the happy &acro.marc; binary format days. &zebra; does not support it, but maps silently to Complete field (3).
@@ -1681,14 +1681,14 @@
  By setting an estimation limit size of the resultset of the &acro.apt;
- leaves, &zebra; stoppes processing the result set when the limit
+ leaves, &zebra; stops processing the result set when the limit
  length is reached. Hit counts under this limit are still precise, but hit counts over it are estimated using the statistics gathered from the chopped result set.
- Specifying a limit of 0 resuts in exact hit counts.
+ Specifying a limit of 0 results in exact hit counts.
  For example, we might be interested in exact hit count for a, but
@@ -2188,19 +2188,19 @@
  key (@attr 4=3)
  ignored
  Null bitmap ('0')
- Used for non-tokenizated and non-normalized bit sequences
+ Used for non-tokenized and non-normalized bit sequences
  year (@attr 4=4)
  ignored
  Year ('y')
- Non-tokenizated and non-normalized 4 digit numbers
+ Non-tokenized and non-normalized 4 digit numbers
  date (@attr 4=5)
  ignored
  Date ('d')
- Non-tokenizated and non-normalized ISO date strings
+ Non-tokenized and non-normalized ISO date strings
  ignored
diff --git a/doc/recordmodel-domxml.xml b/doc/recordmodel-domxml.xml
index 50876cb..391b453 100644
--- a/doc/recordmodel-domxml.xml
+++ b/doc/recordmodel-domxml.xml
@@ -461,7 +461,7 @@
  found. Therefore, invalid document processing is aborted, and any content of the <extract> and
- <store> pipelines is discarted.
+ <store> pipelines is discarded.
  A warning is issued in the logs.
@@ -819,7 +819,7 @@
  Debuggig &acro.dom; Filter Configurations
  It can be very hard to debug a &acro.dom; filter setup due to the many
- sucessive &acro.marc; syntax translations, &acro.xml; stream splitting and
+ successive &acro.marc; syntax translations, &acro.xml; stream splitting and
  &acro.xslt; transformations involved. As an aid, you have always the power of the -s command line switch to the zebraidz indexing command at your hand:
diff --git a/doc/recordmodel-grs.xml b/doc/recordmodel-grs.xml
index cbda1dc..c4ff6c7 100644
--- a/doc/recordmodel-grs.xml
+++ b/doc/recordmodel-grs.xml
@@ -47,7 +47,7 @@
  which describes the specific &acro.marc; structure of the input record as well as the indexing rules.
- The grs.marc uses an internal represtantion
+ The grs.marc uses an internal representation
  which is not &acro.xml; conformant. In particular &acro.marc; tags are presented as elements with the same name. And &acro.xml; elements may not start with digits.
  Therefore this filter is only
@@ -99,7 +99,7 @@
  The loadable grs.xml filter module
- is packagged in the GNU/Debian package
+ is packaged in the GNU/Debian package
  libidzebra2.0-mod-grs-xml
@@ -473,7 +473,7 @@
  Begin a new record. The following parameter should be the
- name of the schema that describes the structure of the record, eg.
+ name of the schema that describes the structure of the record, e.g.,
  gils or wais (see below). The begin record call should precede any other use of the begin statement.
@@ -1590,7 +1590,7 @@
  provides a default variant request for use when the individual element requests (see below) do not contain a variant request. Variant requests consist of a blank-separated list of
- variant components. A variant compont is a comma-separated,
+ variant components. A variant component is a comma-separated,
  parenthesized triple of variant class, type, and value (the two former values being represented as integers). The value can currently only be entered as a string (this will change to depend on the definition of
@@ -1809,7 +1809,7 @@
  ISO2709-based formats (&acro.usmarc;, etc.). Only records with a two-level structure (corresponding to fields and subfields) can be directly mapped to ISO2709. For records with a different structuring
- (eg., GILS), the representation in a structure like &acro.usmarc; involves a
+ (e.g., GILS), the representation in a structure like &acro.usmarc; involves a
  schema-mapping (see ), to an "implied" &acro.usmarc; schema (implied, because there is no formal schema which specifies the use of the
@@ -1894,7 +1894,7 @@
  Our definition is based on the document "The table of conformity for &acro.z3950; use attributes and R&acro.usmarc; fields".
- The document is available only in russian language.
+ The document is available only in Russian language.
  The index-formula is the combination of
diff --git a/doc/tutorial.xml b/doc/tutorial.xml
index bc58142..e96f942 100644
--- a/doc/tutorial.xml
+++ b/doc/tutorial.xml
@@ -22,8 +22,8 @@
  Additional OAI test records can be downloaded by running a shell
- script (you may want to abort the script when you have waitet
- longer than your coffe brews ..).
+ script (you may want to abort the script when you have waited
+ longer than your coffee brews ..).
  cd data
  ./fetch_OAI_data.sh
@@ -105,7 +105,7 @@
  Searching and retrieving &acro.xml; records is easy. For example,
- you can point your browser to one of the following url's to
+ you can point your browser to one of the following URLs to
  search for the term the. Just point your browser at this link:
- These URL's woun't work unless you have indexed the example data
+ These URLs won't work unless you have indexed the example data
  and started an &zebra; server as outlined in the previous section. In case we actually want to retrieve one record, we need to alter
- our URl to the following
+ our URL to the following
  http://localhost:9999/?version=1.1&operation=searchRetrieve&x-pquery=the&startRecord=1&maximumRecords=1&recordSchema=dc
@@ -159,7 +159,7 @@
  conf/oai2dc.xsl, and the zebra schema implemented in conf/oai2zebra.xsl.
- The URL's for acessing both are the same, except for the different
+ The URLs for accessing both are the same, except for the different
  value of the recordSchema parameter:
  http://localhost:9999/?version=1.1&operation=searchRetrieve&x-pquery=the&startRecord=1&maximumRecords=1&recordSchema=dc
@@ -208,7 +208,7 @@
  The &acro.oai; indexing example defines many different index names, a study of the conf/oai2index.xsl stylesheet reveals the following word type indexes (i.e. those
- swith suffix :w):
+ with suffix :w):
  any:w
  title:w
@@ -257,7 +257,7 @@
  Investigating the content of the indexes
- How doess the magic work? What is inside the indexes? Why is a certain
+ How does the magic work? What is inside the indexes? Why is a certain
  record found by a search, and another not?. The answer is in the inverted indexes. You can easily investigate them using the special &zebra; schema
@@ -310,13 +310,13 @@
  The &acro.sru; specification mandates that the &acro.cql; query language is supported and properly configure. Also, the server
- needs to be able to emmit a proper &acro.explain; &acro.xml;
+ needs to be able to emit a proper &acro.explain; &acro.xml;
  record, which is used to determine the capabilities of the specific server instance.
- In this example configuration we expoit the similarities between
+ In this example configuration we exploit the similarities between
  the &acro.explain; record and the &acro.cql; query language configuration, we generate the later from the former using an &acro.xslt; transformation.
@@ -374,11 +374,11 @@
  url="http://localhost:9999/?version=1.1&operation=scan&scanClause=dc.identifier=fish">
  http://localhost:9999/?version=1.1&operation=scan&scanClause=dc.identifier=fish
- accesses the indexed indentifiers.
+ accesses the indexed identifiers.
- In addition, all &zebra; internal special elemen sets or record
+ In addition, all &zebra; internal special element sets or record
  schema's of the form zebra:: just work right out of the box
  The default behavior for zebrasrv - if started
- as non-priviledged user - is to establish
+ as non-privileged user - is to establish
  a single TCP/IP listener, for the &acro.z3950; protocol, on port 9999.
  zebrasrv @
diff --git a/doc/zebrasrv-virtual.xml b/doc/zebrasrv-virtual.xml
index ef88a7c..63d4d27 100644
--- a/doc/zebrasrv-virtual.xml
+++ b/doc/zebrasrv-virtual.xml
@@ -6,7 +6,7 @@
  The Virtual hosts mechanism allows a &yaz; frontend server to support multiple backends. A backend is selected on the basis of
- the TCP/IP binding (port+listening adddress) and/or the virtual host.
+ the TCP/IP binding (port+listening address) and/or the virtual host.
  A backend can be configured to execute in a particular working
@@ -86,7 +86,7 @@
  Specifies listener for this server. If this attribute is not given, the server is accessible from all listener. In order
- for the server to be used for real, howeever, the virtual host
+ for the server to be used for real, however, the virtual host
  must match (if specified in the configuration).
@@ -106,7 +106,7 @@
  Specifies a working directory for this backend server. If
- specifid, the &yaz; fronend changes current working directory
+ specified, the &yaz; frontend changes current working directory
  to this directory whenever a backend of this type is started (backend handler bend_start), stopped (backend handler hand_stop) and initialized (bend_init).
diff --git a/doc/zebrasrv.xml b/doc/zebrasrv.xml
index 86cac29..b1d342d 100644
--- a/doc/zebrasrv.xml
+++ b/doc/zebrasrv.xml
@@ -30,7 +30,7 @@
  DESCRIPTION
  Zebra is a high-performance, general-purpose structured text indexing and retrieval engine. It reads structured records in a variety of input
- formats (eg. email, &acro.xml;, &acro.marc;) and allows access to them through exact
+ formats (e.g. email, &acro.xml;, &acro.marc;) and allows access to them through exact
  boolean search expressions and relevance-ranked free-text queries.
@@ -238,7 +238,7 @@
  This will display the &acro.xml;-formatted &acro.sru; response that includes the first record in the result-set found by the query mineral. (For clarity, the &acro.sru; URL is shown
- here broken across lines, but the lines should be joined to gether
+ here broken across lines, but the lines should be joined together
  to make single-line URL for the browser to submit.)
@@ -372,7 +372,7 @@
  new alpha stuff, and a lot of work has yet to be done ..
- There is no linkeage whatsoever between the &acro.z3950; explain model
+ There is no linkage whatsoever between the &acro.z3950; explain model
  and the &acro.sru; explain response (well, at least not implemented in Zebra, that is ..).
  Zebra does not provide a means using &acro.z3950; to obtain the ZeeRex record.