(from parent 1: ddd6e6b)
often, to reduce conflicts if Adam is also working on this. Actually,
if he is, and if my checkin happens first, then there will still be
conflicts but I will have shifted them onto Adam, as whoever checks in
second has to resolve the problems. So I should have said that this
checkin is in order to shift/reduce conflicts :-)
<chapter id="administration">
- <!-- $Id: administration.xml,v 1.31 2006-05-01 13:07:40 marc Exp $ -->
+ <!-- $Id: administration.xml,v 1.32 2006-05-02 12:23:02 mike Exp $ -->
<title>Administrating Zebra</title>
<!-- ### It's a bit daft that this chapter (which describes half of
the configuration-file formats) is separated from
<sect1 id="administration-ranking">
<title>Relevance Ranking and Sorting of Result Sets</title>
+ <sect2>
+ <title>Overview</title>
<para>
The default ordering of a result set is left up to the server,
which inside Zebra means sorting in ascending document ID order.
- In case a good presentation ordering can be computed at
+ In cases where a good presentation ordering can be computed at
indexing time, we can use a fixed <literal>static ranking</literal>
scheme, which is provided for the <literal>alvis</literal>
indexing filter. This defines a fixed ordering of hit lists,
There are cases, however, where relevance of hit set documents is
highly dependent on the query processed.
Simply put, <literal>dynamic relevance ranking</literal>
- sortes a set of retrieved
+ sorts a set of retrieved
records such
that those most likely to be relevant to your request are
retrieved first.
- Internally, Zebra retrieves all documents ID's that satisfy your
- search query, and re-orders the hit list to arrange them based on
+ Internally, Zebra retrieves all documents that satisfy your
+ query, and re-orders the hit list to arrange them based on
a measurement of similarity between your query and the content of
each record.
</para>
lexicographical ordering of certain sort indexes created at
indexing time.
</para>
<sect2 id="administration-ranking-static">
are ordered
first by ascending static rank,
then by ascending document <literal>ID</literal>.
- </para>
- <para>
- This implies that the default rank <literal>0</literal>
- is the best rank at the
- beginning of the list, and <literal>max int</literal>
- is the worst static rank.
+ Zero
+ is the ``best'' rank, as it occurs at the
+ beginning of the list; higher numbers represent worse scores.
</para>
<para>
The experimental <literal>alvis</literal> filter provides a
after <emphasis>ascending</emphasis> static
rank, and for those doc's which have the same static rank, ordered
after <emphasis>ascending</emphasis> doc <literal>ID</literal>.
- See <xref linkend="record-model-alvisxslt"/> for the glory details.
+ See <xref linkend="record-model-alvisxslt"/> for the gory details.
<sect2 id="administration-ranking-dynamic">
<title>Dynamic Ranking</title>
<para>
- If one wants to do a little fiddeling with the static rank order,
- one has to invoke additional re-ranking/re-ordering using dynamic
- reranking or score functions. These functions return positive
- interger scores, where <emphasis>highest</emphasis> score is
- <emphasis>best</emphasis>, which means that the
- hit sets will be sorted according to
+ In order to fiddle with the static rank order, it is necessary to
+ invoke additional re-ranking/re-ordering using dynamic
+ ranking or score functions. These functions return positive
+ integer scores, where <emphasis>highest</emphasis> score is
+ ``best'';
+ hit sets are sorted according to
<emphasis>decending</emphasis>
scores (in contrary
to the index lists which are sorted according to
- <emphasis>ascending</emphasis> rank number and document ID).
+ ascending rank number and document ID).
- Those are in the zebra config file enabled by a directive like (use
- only one of these a time!):
+ Dynamic ranking is enabled by a directive like one of the
+ following in the zebra config file (use only one of these a time!):
<screen>
rank: rank-1 # default TDF-IDF like
rank: rank-static # dummy do-nothing
Notice that the <literal>rank-1</literal> and
<literal>zvrank</literal> do not use the static rank
information in the list keys, and will produce the same ordering
- with our without static ranking enabled.
+ with or without static ranking enabled.
</para>
<para>
The dummy <literal>rank-static</literal> reranking/scoring
function returns just
<literal>score = max int - staticrank</literal>
- in order to preserve the ordering of hit sets with and without it's
- call.
- Obviously, to combine static and dynamic ranking usefully, one wants
+ in order to preserve the static ordering of hit sets that would
+ have been produced had it not been invoked.
+ Obviously, to combine static and dynamic ranking usefully,
+ it is necessary
- function, which is left
as an exercise for the reader.
</para>
<para>
- Invoking dynamic ranking is done in query time (this is why we
- call it 'dynamic ranking' in the first place ..). One has to add
+ Dynamic ranking is done at query time rather than
+ indexing time (this is why we
+ call it ``dynamic ranking'' in the first place ...)
+ It is invoked by adding
the Bib-1 relation attribute with
- value "relevance" to the PQF query (that is, <literal>@attr
- 2=102</literal>, see also
+ value ``relevance'' to the PQF query (that is,
+ <literal>@attr 2=102</literal>, see also
<ulink url="ftp://ftp.loc.gov/pub/z3950/defs/bib1.txt">
The BIB-1 Attribute Set Semantics</ulink>).
- To find all articles with the word 'Eoraptor' in
- the title, and present them relevance ranked, one issues the PQF query:
+ To find all articles with the word <literal>Eoraptor</literal> in
+ the title, and present them relevance ranked, issue the PQF query:
- Z> f @attr 2=102 @attr 1=4 Eoraptor
+ @attr 2=102 @attr 1=4 Eoraptor
with <literal>estimated hit sizes</literal>, as all documents in
a hit set must be acessed to compute the correct placing in a
ranking sorted list. Therefore the use attribute setting
- <literal>@attr 2=102</literal> clashes with
- <literal>@attr 9=</literal>.
+ <literal>@attr 2=102</literal> clashes with
+ <literal>@attr 9=integer</literal>.
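
Reviewer note on the rank-static paragraph in this diff: the claim that score = max int - staticrank preserves the static ordering can be checked with a small sketch. This is only an illustration, not Zebra code. MAX_INT, the function names, and the (static_rank, doc_id) tuple layout are all assumptions made for the example, as is breaking ties between equal scores by ascending document ID.

```python
import sys

# Assumption: stand-in for the "max int" mentioned in the documentation.
MAX_INT = sys.maxsize


def rank_static_score(static_rank):
    """Score as described for rank-static: max int minus the static rank."""
    return MAX_INT - static_rank


def static_order(hits):
    """Index-list order: ascending static rank, then ascending document ID."""
    return sorted(hits)


def dynamic_order(hits):
    """Hit-set order: descending score.

    Ties are broken by ascending document ID (an assumption of this sketch).
    """
    return sorted(hits, key=lambda h: (-rank_static_score(h[0]), h[1]))


# Hypothetical hits as (static_rank, doc_id) pairs.
hits = [(5, 12), (0, 40), (5, 3), (2, 7)]
assert dynamic_order(hits) == static_order(hits)
```

Because a lower static rank yields a higher score, sorting by descending score reproduces the ascending static rank order, which is what the rewritten sentence in the hunk above states.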