Adam Dickmeiss [Thu, 8 Mar 2007 13:18:35 +0000 (13:18 +0000)]
For MARC indexing, skip until record separator is met.
Adam Dickmeiss [Thu, 8 Mar 2007 12:57:35 +0000 (12:57 +0000)]
Bump to 2.0.13
Marc Cromme [Thu, 8 Mar 2007 11:29:16 +0000 (11:29 +0000)]
corrected typo
Marc Cromme [Thu, 8 Mar 2007 11:24:50 +0000 (11:24 +0000)]
added example of MARCXML indexing with chopping of sort indexes according to 'ind2' field containing integer
Adam Dickmeiss [Wed, 7 Mar 2007 21:25:29 +0000 (21:25 +0000)]
Added mod_dom to win32 makefile
Adam Dickmeiss [Wed, 7 Mar 2007 21:14:15 +0000 (21:14 +0000)]
Towards 2.0.12
Adam Dickmeiss [Wed, 7 Mar 2007 21:08:36 +0000 (21:08 +0000)]
Fixed bug with indexing of attributes for rec.grs-class of filters. If
xpath was enabled, xelm a/@b would be ignored.
Marc Cromme [Wed, 7 Mar 2007 14:18:35 +0000 (14:18 +0000)]
Always add the XML parsing flag XML_PARSE_NONET to any XML_PARSE_XINCLUDE, to avoid Zebra being spoofed into fetching megabytes from an external xincluded URL. A pretty normal safety measure; we just forgot it before.
Marc Cromme [Wed, 7 Mar 2007 13:05:20 +0000 (13:05 +0000)]
removed documentation of non-working 'insert', 'update', 'delete' functionality in Alvis filter
removed 'update' instruction from example OAI indexing stylesheet
Adam Dickmeiss [Tue, 6 Mar 2007 12:40:18 +0000 (12:40 +0000)]
Fixed bug #931: elem 'zebra::index::field' hangs if 'storeKeys: 1' is not specified in zebra.cfg.
Adam Dickmeiss [Tue, 6 Mar 2007 12:21:04 +0000 (12:21 +0000)]
Fixed bug #943: Searches with localid always find a hit.
Adam Dickmeiss [Tue, 6 Mar 2007 12:09:44 +0000 (12:09 +0000)]
Avoid mixed stmt/var declare
Marc Cromme [Tue, 6 Mar 2007 09:24:34 +0000 (09:24 +0000)]
added missing extra dist target
Adam Dickmeiss [Tue, 6 Mar 2007 08:48:57 +0000 (08:48 +0000)]
Fixed bug #946: Coredump on MARC display.
Adam Dickmeiss [Tue, 6 Mar 2007 08:23:24 +0000 (08:23 +0000)]
Added missing xsl for dom1 test.
Marc Cromme [Mon, 5 Mar 2007 13:02:11 +0000 (13:02 +0000)]
added tests for bug #883 'Need an 'ignore' value for the z:type
attribute in the canonical indexing format'
resolved bug #883
tested as well on gutenberg collection
zebra-setup/gutenberg
case closed, see
http://bugzilla.indexdata.dk/show_bug.cgi?id=883
Adam Dickmeiss [Sat, 3 Mar 2007 21:39:10 +0000 (21:39 +0000)]
Fixes for perform_convert: use xmlParseMemory instead of xmlParseMemory
to avoid reading beyond end of buffer. Ensure conversions are stopped
if XSLT conversion fails.
Marc Cromme [Thu, 1 Mar 2007 11:21:20 +0000 (11:21 +0000)]
removed section on special record retrieval features, which need a rewrite - only commented out.
added section on debugging of DOM filter configurations
added a bullet point on semantics of DOM filter explaining that records which do not yield record and index instructions are discarded, i.e. dropped on the floor. This meets Seb's wishes for the gutenberg collection
Marc Cromme [Thu, 1 Mar 2007 11:18:40 +0000 (11:18 +0000)]
removed quick start and examples, which are very GRS-1 centric.
These need re-writing in terms of the DOM filter
Adam Dickmeiss [Thu, 1 Mar 2007 10:35:46 +0000 (10:35 +0000)]
Allow record filters to return 'skip' this record (RECCTRL_EXTRACT_SKIP).
Make dom filter return 'skip' if no zebra 'record' node exists in
indexing document. Bug #883.
Adam Dickmeiss [Wed, 28 Feb 2007 18:43:06 +0000 (18:43 +0000)]
Fix handling of record retrieval in the case of open failure of external
record file (storedata:0).
Marc Cromme [Wed, 28 Feb 2007 16:46:19 +0000 (16:46 +0000)]
added nice debug output of all xmlreader and xslt XML stuff when running with
zebra/index/zebraidx -c zebra.cfg -s update water.rdf
Don't do this on huge data - the logs will be at least 4-6 times the size of the input data!
Marc Cromme [Wed, 28 Feb 2007 14:46:41 +0000 (14:46 +0000)]
closing bug #928 by dropping DOM document to xmlbuffer and re-reading into DOM each time an XSLT transform has occurred. Yes, ugly, ugly, but no other possibility.
Added output of XML after each transformation on YLOG_DEBUG level, run indexer with '-v debug' to see all transformations
Marc Cromme [Wed, 28 Feb 2007 13:16:24 +0000 (13:16 +0000)]
removed general warning log of indexing process. this can be seen by running the indexer with '-v debug' anyhow.
Adam Dickmeiss [Mon, 26 Feb 2007 16:12:24 +0000 (16:12 +0000)]
Avoid sprintf with NULL %s value (Solaris dislikes it)
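A minimal sketch of the kind of guard this implies, not the actual change: passing NULL for a %s conversion is undefined behavior in C, and some C libraries crash where others print a placeholder. The helper names safe_str and format_name are hypothetical and do not come from the Zebra sources.

```c
#include <stdio.h>

/* Hypothetical helper: map a NULL string to a printable placeholder. */
static const char *safe_str(const char *s)
{
    return s ? s : "(null)";
}

/* Hypothetical caller: formats "file=<name>" into buf, tolerating a
   NULL name. Returns what snprintf returns. */
static int format_name(char *buf, size_t size, const char *name)
{
    return snprintf(buf, size, "file=%s", safe_str(name));
}
```

Routing every possibly-NULL argument through one helper keeps the behavior identical across platforms.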
Adam Dickmeiss [Sat, 24 Feb 2007 17:05:40 +0000 (17:05 +0000)]
Fixed bug #929: Unfinished transaction in non-shadow does not get a
warn.
Adam Dickmeiss [Sat, 24 Feb 2007 16:47:16 +0000 (16:47 +0000)]
Deal with two common places for corrupt Explain database
Adam Dickmeiss [Sat, 24 Feb 2007 16:46:22 +0000 (16:46 +0000)]
Proper cleanup (isamb_close) for bad headers
Adam Dickmeiss [Fri, 23 Feb 2007 14:59:12 +0000 (14:59 +0000)]
Use xmlGetLineNo instead of xmlGetNodePath for errors/warnings
Adam Dickmeiss [Fri, 23 Feb 2007 11:35:08 +0000 (11:35 +0000)]
For each element macro.
Adam Dickmeiss [Fri, 23 Feb 2007 11:16:39 +0000 (11:16 +0000)]
For dom filter, in input element construct, parse @inputcharset instead
of @charset .
Adam Dickmeiss [Fri, 23 Feb 2007 11:10:37 +0000 (11:10 +0000)]
Wrap log messages for dom filter. This uses yaz_vsnprintf. Requires
YAZ 2.1.49 or later.
Adam Dickmeiss [Fri, 23 Feb 2007 09:35:17 +0000 (09:35 +0000)]
Fix dist: do not put domfilter.eps in dist.
Marc Cromme [Thu, 22 Feb 2007 15:44:19 +0000 (15:44 +0000)]
added more instructions to DOM filter docs, spell checked both DOM and Alvis filter docs
Marc Cromme [Thu, 22 Feb 2007 12:22:04 +0000 (12:22 +0000)]
added missing dependency of index.html to all PNG files
Marc Cromme [Thu, 22 Feb 2007 12:10:09 +0000 (12:10 +0000)]
added missing domfilter.eps to make rules, such that it is included in the distribution tarball
Adam Dickmeiss [Thu, 22 Feb 2007 08:59:30 +0000 (08:59 +0000)]
Remove PDF files from EXTRA_DIST/doc_DATA (as done for yaz, metaproxy
for quite some time). Avoid rule option '--export-area-drawing' for
inkscape for generating .png (it doesn't work with sarge). Bug #916.
Adam Dickmeiss [Wed, 21 Feb 2007 17:03:23 +0000 (17:03 +0000)]
zebra.pdf depends on domfilter.pdf
Marc Cromme [Wed, 21 Feb 2007 15:03:30 +0000 (15:03 +0000)]
more info on DOM filter config
Marc Cromme [Wed, 21 Feb 2007 14:15:45 +0000 (14:15 +0000)]
added domfilter.svg to distribution tarball, now make dist runs again
Marc Cromme [Wed, 21 Feb 2007 14:15:07 +0000 (14:15 +0000)]
added more content on dom filter pipelines
Marc Cromme [Wed, 21 Feb 2007 13:38:22 +0000 (13:38 +0000)]
started explaining each dom filter pipeline
Marc Cromme [Wed, 21 Feb 2007 12:29:52 +0000 (12:29 +0000)]
added figure of workflow on DOM XML filter
Marc Cromme [Tue, 20 Feb 2007 15:02:18 +0000 (15:02 +0000)]
small changes to format
Marc Cromme [Tue, 20 Feb 2007 14:57:00 +0000 (14:57 +0000)]
added proper namespace in example config
Marc Cromme [Tue, 20 Feb 2007 14:53:25 +0000 (14:53 +0000)]
some more changes, more to come
Marc Cromme [Tue, 20 Feb 2007 14:28:31 +0000 (14:28 +0000)]
added initial DOM XML filter documentation. Much is missing yet ...
Adam Dickmeiss [Sun, 18 Feb 2007 21:53:22 +0000 (21:53 +0000)]
Fixed bug #898: xslt tests fails on several platforms. Problem was
that test for zs:index node crashed for absent namespace (href==NULL).
Added all .xslt-files in use in test/xslt tests.
Also fixed memory leak in use of xmlGetNodePath.
Adam Dickmeiss [Sun, 18 Feb 2007 21:50:52 +0000 (21:50 +0000)]
Fixed minor memory leak
Marc Cromme [Thu, 15 Feb 2007 15:41:16 +0000 (15:41 +0000)]
changed to respect correct index instructions in new DOM filter
Marc Cromme [Thu, 15 Feb 2007 15:08:41 +0000 (15:08 +0000)]
optimized code such that the RecWord structure recword is only
initialized once for each to-be-indexed record, and not once for each
to-be-indexed term - at the expense of a bit of pointer passing when
recursively traversing the XML DOM tree
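The idea can be sketched as follows, under loudly stated assumptions: struct word_ctx and struct node are hypothetical stand-ins, not Zebra's RecWord or libxml2's node type. The context is set up once per record and then passed by pointer through the recursive walk.

```c
#include <stddef.h>

/* Hypothetical stand-in for a per-record indexing context. */
struct word_ctx {
    int init_count;    /* how many times the context was initialized */
    int terms_seen;    /* number of terms visited for this record */
    const char *term;  /* term currently being indexed */
};

/* Hypothetical stand-in for a DOM node. */
struct node {
    const char *text;    /* NULL for pure element nodes */
    struct node *child;  /* first child */
    struct node *next;   /* next sibling */
};

/* Done once per record, not once per term. */
static void ctx_init(struct word_ctx *ctx)
{
    ctx->init_count++;
    ctx->terms_seen = 0;
    ctx->term = NULL;
}

/* Recursive walk: only the pointer is passed down; per term we just
   update the fields that change. */
static void index_node(const struct node *n, struct word_ctx *ctx)
{
    for (; n; n = n->next) {
        if (n->text) {
            ctx->term = n->text;
            ctx->terms_seen++;
        }
        index_node(n->child, ctx);
    }
}

static void index_record(const struct node *root, struct word_ctx *ctx)
{
    ctx_init(ctx);
    index_node(root, ctx);
}
```

The caller is expected to zero the context before the first record so that init_count starts from a known value.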
Marc Cromme [Thu, 15 Feb 2007 14:44:48 +0000 (14:44 +0000)]
removed dead code pieces which are remnants of the original
alvis-style parsing and indexing stuff. Now only new dom indexing code
is present.
Marc Cromme [Thu, 15 Feb 2007 14:33:41 +0000 (14:33 +0000)]
pretty-formatted warning messages, always including the file name and
the XML node path as informative parameters
Marc Cromme [Thu, 15 Feb 2007 13:01:00 +0000 (13:01 +0000)]
rewritten mod_dom instruction parsing code hooked into mod_dom indexing
new stylesheets added, one for PI based indexing, and one for <z:index> based indexing
segmentation fault traced and fixed
test framework updated to use new mod_dom parsing
Marc Cromme [Wed, 14 Feb 2007 16:43:37 +0000 (16:43 +0000)]
added 'static' declaration to function definitions
Marc Cromme [Wed, 14 Feb 2007 16:38:41 +0000 (16:38 +0000)]
changing attribute 'action' to 'type' for better conformance with Alvis
filter syntax
Marc Cromme [Wed, 14 Feb 2007 16:31:37 +0000 (16:31 +0000)]
indenting entire file according to the rules stated in the very end of
the file, using emacs M-x indent-region, and manual line breaking afterwards
Marc Cromme [Wed, 14 Feb 2007 16:16:15 +0000 (16:16 +0000)]
continued hooking in tinfo and recctr, still need to do real indexing
Marc Cromme [Wed, 14 Feb 2007 15:42:24 +0000 (15:42 +0000)]
removed warnings by zillions of (const char *) casts and the like
Marc Cromme [Wed, 14 Feb 2007 15:23:33 +0000 (15:23 +0000)]
removed the crappy PI and <z:index> parsing code committed yesterday
replaced with clean parsing logic developed outside mod_dom.c
needs to take care of all new warnings due to stricter compile flags
finally, needs to be hooked into actual indexing of records
Marc Cromme [Tue, 13 Feb 2007 12:19:37 +0000 (12:19 +0000)]
removed unnecessary out-commented code lines
Marc Cromme [Tue, 13 Feb 2007 11:37:02 +0000 (11:37 +0000)]
factored DOM XML indexing code out into function
static void extract_doc_alvis(struct filter_info *tinfo,
struct recExtractCtrl *recctr,
xmlDocPtr doc)
This is the function to be re-written using both PI and <z:index> instructions,
and also fixing the bug of index type 'p' and '0' chop-over of merged content.
Marc Cromme [Mon, 12 Feb 2007 14:00:20 +0000 (14:00 +0000)]
experimental processing-instruction based indexing XSLT added
Marc Cromme [Mon, 12 Feb 2007 13:58:12 +0000 (13:58 +0000)]
avoiding unnecessary unused namespace declarations in output documents
Marc Cromme [Mon, 12 Feb 2007 13:24:31 +0000 (13:24 +0000)]
added parsing function 'parse_pi_zebra_20' for processing-instruction parsing and 'format_pi_zebra_err' for error or warning formatting. These are not yet called, and need to be built into the XML parsing in the DOM module.
Adam Dickmeiss [Mon, 12 Feb 2007 10:33:50 +0000 (10:33 +0000)]
Fixed bug #884: Entity declarations in input are lost at retrieval time.
Adam Dickmeiss [Sat, 10 Feb 2007 18:37:42 +0000 (18:37 +0000)]
Fixed serious bug in mf_open which made it fail to see an already existing
metafile. The bug was introduced in mfile 1.70.
Adam Dickmeiss [Sat, 10 Feb 2007 12:46:54 +0000 (12:46 +0000)]
buildconf.sh part of dist.
Marc Cromme [Wed, 7 Feb 2007 13:33:17 +0000 (13:33 +0000)]
corrected DEPRECIATED to DEPRECATED
Marc Cromme [Wed, 7 Feb 2007 13:19:35 +0000 (13:19 +0000)]
added debian libidzebra-2.0-mod-dom package
Marc Cromme [Wed, 7 Feb 2007 12:50:13 +0000 (12:50 +0000)]
making 'dox' target phony
Adam Dickmeiss [Wed, 7 Feb 2007 12:08:54 +0000 (12:08 +0000)]
Implemented new filter 'dom'. See test/xslt/dom-config*xml for examples.
This, like alvis, performs indexing and retrieval using XSLT. But unlike
alvis, it allows multiple XSLT steps to be performed and does ISO2709
Adam Dickmeiss [Tue, 6 Feb 2007 09:34:56 +0000 (09:34 +0000)]
The configuration, fileverboselimit, has a value of 1000. When
reached, a message is logged. Bug #845.
Adam Dickmeiss [Tue, 6 Feb 2007 09:33:31 +0000 (09:33 +0000)]
Omit sort info: bug #844.
Adam Dickmeiss [Tue, 6 Feb 2007 09:32:50 +0000 (09:32 +0000)]
More compact statistics
Marc Cromme [Mon, 5 Feb 2007 14:32:31 +0000 (14:32 +0000)]
dropped section on future directions
Marc Cromme [Mon, 5 Feb 2007 14:05:26 +0000 (14:05 +0000)]
spell checked
Marc Cromme [Mon, 5 Feb 2007 14:02:27 +0000 (14:02 +0000)]
formatting of feature tables updated
Marc Cromme [Mon, 5 Feb 2007 13:35:12 +0000 (13:35 +0000)]
feature table updated
Marc Cromme [Fri, 2 Feb 2007 14:42:44 +0000 (14:42 +0000)]
cleaning a bit. more cleaning needed
Marc Cromme [Fri, 2 Feb 2007 14:34:20 +0000 (14:34 +0000)]
more feature info. tables still look like a grand disaster, but the content is there - more or less. needs pretty formatting and tweaking
Adam Dickmeiss [Fri, 2 Feb 2007 13:48:13 +0000 (13:48 +0000)]
Fixed bug in zebrasrv: the default module path was not
recognized.
Adam Dickmeiss [Fri, 2 Feb 2007 12:16:38 +0000 (12:16 +0000)]
Use YAZ_BIB1_SYSTEM_ERROR_IN_PRESENTING_RECORDS everywhere where
this diagnostic is returned. Put more appropriate addinfo in case
of filter load failure during retrieval.
Adam Dickmeiss [Fri, 2 Feb 2007 12:07:33 +0000 (12:07 +0000)]
Fix DEFAULT_PROFILE_PATH
Marc Cromme [Fri, 2 Feb 2007 11:10:08 +0000 (11:10 +0000)]
replaced acronyms in XML text with newly defined acronym entities
Marc Cromme [Fri, 2 Feb 2007 09:58:39 +0000 (09:58 +0000)]
added acronym entities
Marc Cromme [Thu, 1 Feb 2007 21:26:30 +0000 (21:26 +0000)]
some more typos corrected
Marc Cromme [Thu, 1 Feb 2007 21:18:53 +0000 (21:18 +0000)]
corrected typos
Marc Cromme [Thu, 1 Feb 2007 21:08:52 +0000 (21:08 +0000)]
added Alvis 'XML'
Marc Cromme [Thu, 1 Feb 2007 21:08:12 +0000 (21:08 +0000)]
placed Alvis filter module before GRS-1 in arch chapter
Marc Cromme [Thu, 1 Feb 2007 21:04:15 +0000 (21:04 +0000)]
placing Alvis filter chapter before GRS-1 filter chapter
Marc Cromme [Thu, 1 Feb 2007 20:49:05 +0000 (20:49 +0000)]
first shot on tabulated feature overview - much needs to be done yet
Mike Taylor [Wed, 31 Jan 2007 12:26:50 +0000 (12:26 +0000)]
New
Adam Dickmeiss [Wed, 24 Jan 2007 18:00:39 +0000 (18:00 +0000)]
Bump version to 2.0.11
Adam Dickmeiss [Wed, 24 Jan 2007 16:05:25 +0000 (16:05 +0000)]
Depend on YAZ 2.1.48 or later
Adam Dickmeiss [Wed, 24 Jan 2007 15:23:58 +0000 (15:23 +0000)]
Towards 2.0.10.
Adam Dickmeiss [Mon, 22 Jan 2007 18:15:02 +0000 (18:15 +0000)]
Staticrank indexing is now an index register type defined in default.idx
via directive 'staticrank'. The 'staticrank' directive for grs is no longer
supported (was only implemented for Zebra 2.0.8).
Mike Taylor [Mon, 22 Jan 2007 11:02:12 +0000 (11:02 +0000)]
New
Adam Dickmeiss [Wed, 17 Jan 2007 15:35:47 +0000 (15:35 +0000)]
Avoid full rset count for rset_count. Proper break for result set
sort/rank.
Adam Dickmeiss [Wed, 17 Jan 2007 13:51:36 +0000 (13:51 +0000)]
Change prototype of busyhandler