From 4e7d0c112f988c61da5e9c2db928b73735ef1f4a Mon Sep 17 00:00:00 2001
From: Mike Taylor
Date: Mon, 25 Nov 2002 00:31:14 +0000
Subject: [PATCH] new file

---
 doc/harvest.mbox | 118 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 118 insertions(+)
 create mode 100644 doc/harvest.mbox

diff --git a/doc/harvest.mbox b/doc/harvest.mbox
new file mode 100644
index 0000000..0b3bb82
--- /dev/null
+++ b/doc/harvest.mbox
@@ -0,0 +1,118 @@
+From zebralist-admin@indexdata.dk Sun Nov 24 23:16:24 2002
+MIME-Version: 1.0
+Envelope-to: zebra@miketaylor.org.uk
+Content-Type: text/plain;
+ charset="us-ascii"
+From: Kang-Jin Lee
+To: zebralist@indexdata.dk
+User-Agent: KMail/1.4.3
+X-Spam-Level:
+Subject: [Zebralist] Some progress on Harvest's move to Zebra
+Sender: zebralist-admin@indexdata.dk
+X-BeenThere: zebralist@indexdata.dk
+X-Mailman-Version: 2.0.11
+Precedence: bulk
+List-Help:
+List-Post:
+List-Subscribe: ,
+
+List-Id: Zebra Information Server
+List-Unsubscribe: ,
+
+List-Archive:
+Date: Sun, 24 Nov 2002 20:45:19 +0100
+X-Spam-Status: No, hits=-1.0 required=5.0 tests=AWL version=2.20
+X-Spam-Level:
+X-MIME-Autoconverted: from quoted-printable to 8bit by localhost.localdomain id gAONGNK15639
+
+Hi,
+
+I finished first steps to use Zebra as fulltext engine for Harvest
+(http://harvest.sourceforge.net/). The performance boost after
+some testing are quite impressive.
+
+Here is my article I wrote for the Harvest mailinglist.
+
+Many thanks for Zebra.
+
+------------------------------------------------------
+Hi,
+
+The first results after some testing with Zebra are very promising.
+
+The tests were done with around 220 000 SOIF files, which occupies
+1.6GB of disk space.
+
+Building the index from scratch takes around one hour with Zebra where
+Glimpse needs around five hours.
+
+While glimpse blocks search requests when updating its index, Zebra
+can still answer search requests.
+
+While the search time of glimpse varies from some seconds to some
+minutes depending how expensive the query is, Zebra usually takes
+around one to three seconds, even for expensive queries.
+
+Glimpse' index occupies around 250MB of disk space, Zebra's index
+takes around 570MB.
+
+Zebra supports incremental indexing which will speed up indexing even
+further.
+
+There are still potential for faster searches when necessary, using
+tweaks on apache.
+
+On the other hand, modeling data is not complete, yet.
+
+To sum it up:
+- Zebra indexes data five times faster than Glimpse
+- Zebra doesn't cause downtimes for indexupdate
+- Zebra's search time doesn't jump from seconds to minutes for no
+  obvious reason, but stays constant within a range of one to three
+  seconds
+- Zebra can search more than 100 times faster than Glimpse
+- Zebra can process multiple search requests simultaneously
+- Zebra can speed up indexing by using incremental indexing
+- Glimpse's index size is only around half of the Zebra's index
+
+kj
+------------------------------------------------------
+
+_______________________________________________
+Zebralist mailing list
+Zebralist@indexdata.dk
+http://www.indexdata.dk/mailman/listinfo/zebralist
+
+From mike@miketaylor.org.uk Sun Nov 24 23:41:14 2002
+Date: Sun, 24 Nov 2002 23:41:13 GMT
+From: Mike Taylor
+X-Was-To: lee@arco.de
+X-Was-CC: zebralist@indexdata.dk
+Cc: mike@localhost.localdomain
+In-reply-to: <200211242045.19196.lee@arco.de> (message from Kang-Jin Lee on
+ Sun, 24 Nov 2002 20:45:19 +0100)
+Subject: Re: [Zebralist] Some progress on Harvest's move to Zebra
+
+> Date: Sun, 24 Nov 2002 20:45:19 +0100
+> From: Kang-Jin Lee
+>
+> Here is my article I wrote for the Harvest mailinglist.
+
+Hi K-J,
+
+It's nice to read all this good stuff about Zebra! I'm currently
+working on changes to the documentation for the next Zebra release,
+and I'd love to include a lightly-edited version of your message in
+the new document. (Basically, I'd obscure the name of your old
+engine, so it's clear that we're trying to say good things about Zebra
+rather than score points off a competitor.) Would it be OK for me to
+quote you? If yes in principle, then I'll run the actual wording past
+you before submitting it.
+
+Thanks,
+
+ _/|_ _______________________________________________________________
+/o ) \/ Mike Taylor www.miketaylor.org.uk
+)_v__/\ "You question the worthiness of my code? I should kill you
+ where you stand!" -- Klingon Programming Mantra
+
-- 
1.7.10.4