From 4386e2d60238c3698153d90bedc3fb0f35a7fe3f Mon Sep 17 00:00:00 2001
From: Mike Taylor
Date: Mon, 25 Nov 2002 12:57:54 +0000
Subject: [PATCH] Notes for documentation.

---
 doc/harvest.mbox | 120 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
 doc/isam-hint    |  54 ++++++++++++++++++++++++
 2 files changed, 174 insertions(+)
 create mode 100644 doc/isam-hint

diff --git a/doc/harvest.mbox b/doc/harvest.mbox
index 4a24a6f..2e3385c 100644
--- a/doc/harvest.mbox
+++ b/doc/harvest.mbox
@@ -158,3 +158,123 @@ I am very happy to see such a nice software available under GPL.
 Thanks.
 
 kj
+From zebralist-admin@indexdata.dk Mon Nov 25 11:13:10 2002
+MIME-Version: 1.0
+Envelope-to: zebra@miketaylor.org.uk
+From: Pete
+X-X-Sender: qq15@uxa.liv.ac.uk
+To: Kang-Jin Lee
+cc: zebralist@indexdata.dk
+Subject: Re: [Zebralist] Some progress on Harvest's move to Zebra
+In-Reply-To: <200211242045.19196.lee@arco.de>
+Content-Type: TEXT/PLAIN; charset=US-ASCII
+X-Spam-Level:
+Sender: zebralist-admin@indexdata.dk
+X-BeenThere: zebralist@indexdata.dk
+X-Mailman-Version: 2.0.11
+Precedence: bulk
+List-Help:
+List-Post:
+List-Subscribe: ,
+
+List-Id: Zebra Information Server
+List-Unsubscribe: ,
+
+List-Archive:
+Date: Mon, 25 Nov 2002 10:19:37 +0000 (GMT)
+X-Spam-Status: No, hits=-4.4 required=5.0 tests=IN_REP_TO version=2.20
+X-Spam-Level:
+Content-Length: 2853
+
+On Sun, 24 Nov 2002, Kang-Jin Lee wrote:
+
+>Hi,
+>
+>I finished first steps to use Zebra as fulltext engine for Harvest
+>(http://harvest.sourceforge.net/). The performance boost after
+>some testing are quite impressive.
+
+Hi ... I'd almost forgotten that the Harvest project is still active.
+
+We had a heap of challenges with our Harvest setup and with the
+time taken to index and search ... we switched to using
+Harvest-NG as the "reaper/gatherer" and modified Zebra to
+work with SOIF and our own ranking algorithm - it's been in
+service for over 6 months now.
+
+We had challenges with both speed of gathering and with
+speed of indexing and searching but most seem to be
+"managable" now.
+
+We offered our modifications to Zebra to Indexdata who
+offered to look at them since the latest release of Zebra
+is sufficiently different at the code level to make it
+non-trivial for us to apply our code modifications to
+it.
+
+
+Cheers
+
+Pete Mallinson
+
+>
+>Here is my article I wrote for the Harvest mailinglist.
+>
+>Many thanks for Zebra.
+>
+>------------------------------------------------------
+>Hi,
+>
+>The first results after some testing with Zebra are very promising.
+>
+>The tests were done with around 220 000 SOIF files, which occupies
+>1.6GB of disk space.
+>
+>Building the index from scratch takes around one hour with Zebra where
+>Glimpse needs around five hours.
+>
+>While glimpse blocks search requests when updating its index, Zebra
+>can still answer search requests.
+>
+>While the search time of glimpse varies from some seconds to some
+>minutes depending how expensive the query is, Zebra usually takes
+>around one to three seconds, even for expensive queries.
+>
+>Glimpse' index occupies around 250MB of disk space, Zebra's index
+>takes around 570MB.
+>
+>Zebra supports incremental indexing which will speed up indexing even
+>further.
+>
+>There are still potential for faster searches when necessary, using
+>tweaks on apache.
+>
+>On the other hand, modeling data is not complete, yet.
+>
+>To sum it up:
+>- Zebra indexes data five times faster than Glimpse
+>- Zebra doesn't cause downtimes for indexupdate
+>- Zebra's search time doesn't jump from seconds to minutes for no
+>  obvious reason, but stays constant within a range of one to three
+>  seconds
+>- Zebra can search more than 100 times faster than Glimpse
+>- Zebra can process multiple search requests simultaneously
+>- Zebra can speed up indexing by using incremental indexing
+>- Glimpse's index size is only around half of the Zebra's index
+>
+>kj
+>------------------------------------------------------
+>
+>_______________________________________________
+>Zebralist mailing list
+>Zebralist@indexdata.dk
+>http://www.indexdata.dk/mailman/listinfo/zebralist
+>
+
+
+
+_______________________________________________
+Zebralist mailing list
+Zebralist@indexdata.dk
+http://www.indexdata.dk/mailman/listinfo/zebralist
+
diff --git a/doc/isam-hint b/doc/isam-hint
new file mode 100644
index 0000000..9c478db
--- /dev/null
+++ b/doc/isam-hint
@@ -0,0 +1,54 @@
+From zebralist-admin@indexdata.dk Mon Nov 25 12:13:35 2002
+Envelope-to: zebra@miketaylor.org.uk
+To: zebralist@indexdata.dk
+Subject: Re: [Zebralist] Some progress on Harvest's move to Zebra
+References: <200211242045.19196.lee@arco.de> <4.2.0.58.20021125113016.029edf10@b
+agel.indexdata.dk>
+Mime-Version: 1.0
+Content-Type: text/plain; charset=us-ascii
+Content-Disposition: inline
+In-Reply-To: <4.2.0.58.20021125113016.029edf10@bagel.indexdata.dk>
+User-Agent: Mutt/1.3.28i
+From: Heikki Levanto
+Sender: zebralist-admin@indexdata.dk
+Errors-To: zebralist-admin@indexdata.dk
+X-BeenThere: zebralist@indexdata.dk
+X-Mailman-Version: 2.0.11
+Precedence: bulk
+List-Help:
+List-Post:
+List-Subscribe: ,
+
+List-Id: Zebra Information Server
+List-Unsubscribe: ,
+
+List-Archive:
+Date: Mon, 25 Nov 2002 12:45:14 +0100
+X-Spam-Status: No, hits=-4.4 required=5.0 tests=IN_REP_TO,AWL version=2.20
+X-Spam-Level:
+Content-Length: 949
+
+On Mon, Nov 25, 2002
at 11:31:55AM +0100, Sebastian Hammer wrote:
+> We'd be very keen to have feedback on the incremental indexing performance
+> of the current version of Zebra -- there are some significant improvements
+> on single record updates but we don't have a lot of statistics yet on
+> larger update batches.
+
+I would like to add a piece of advice: Most of the improvements on
+incremental updates are in the new isam system. To see them in effect,
+you will have to add the line
+   isam: b
+in your zebra.cfg. After this you will need to reindex everything. The
+effect should be especially good on smallish incremental updates to
+large bases.
+
+The isam-b was introduced in 1.3.0, but we recommend using 1.3.2 (or
+later).
+
+
+
+
+
+-- 
+Heikki Levanto  heikki@indexdata.dk  "In Murphy We Turst"
+
-- 
1.7.10.4
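
Reviewer note on the isam-hint message: the zebra.cfg change it describes can be sketched as a small helper. This is only an illustration, not part of Zebra. The function name ensure_isam_b is hypothetical, and the sketch assumes that zebra.cfg consists of plain "name: value" lines with "#" starting a comment. It edits text in memory only; per the message, a full reindex is still needed after the setting changes.

```python
from typing import Tuple


def ensure_isam_b(cfg_text: str) -> Tuple[str, bool]:
    """Return (new_text, changed) with an 'isam: b' line guaranteed.

    Hypothetical helper: assumes zebra.cfg is made of 'name: value'
    lines and that lines starting with '#' are comments.
    """
    out = []
    found = False
    changed = False
    for line in cfg_text.splitlines():
        stripped = line.strip()
        is_setting = ":" in stripped and not stripped.startswith("#")
        if is_setting and stripped.split(":", 1)[0].strip() == "isam":
            found = True
            value = stripped.split(":", 1)[1].strip()
            if value != "b":
                # Replace any other isam setting with the one from the hint.
                line = "isam: b"
                changed = True
        out.append(line)
    if not found:
        # No isam setting present: append the line from the hint.
        out.append("isam: b")
        changed = True
    return "\n".join(out) + "\n", changed
```

A caller would read zebra.cfg, pass its contents through ensure_isam_b, write the result back only when changed is true, and then rebuild the index from scratch as the message advises.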