Also, add the random testing script.
-$Id: README,v 1.9 2002-11-02 01:24:41 mike Exp $
+$Id: README,v 1.10 2002-11-03 16:49:37 mike Exp $
-cql-java -- a free CQL compiler for Java
+cql-java - a free CQL compiler, and other CQL tools, for Java
-This project provides a set of classes for representing a CQL parse
-tree (CQLBooleanNode, CQLTermNode, etc.) and a CQLCompiler class which
-builds a parse tree given a CQL query as input. It also provides
-compiler back-ends to render out the parse tree as XCQL (the XML
-representation), as PQF (Yaz-style Prefix Query Format) and as CQL
-(i.e. decompiling the parse-tree). Oh, and there's a random query
-generator, too.
+INTRODUCTION
+------------
+
+cql-java is a Free Software project that provides:
+
+* A set of classes for representing a CQL parse tree (a base CQLNode
+ class, CQLBooleanNode and its subclasses, CQLTermNode, etc.)
+* A CQLCompiler class (and its lexer) which builds a parse tree given
+ a CQL query as input.
+* A selection of compiler back-ends to render out the parse tree as:
+ * XCQL (the standard XML representation)
+ * CQL (i.e. decompiling the parse-tree)
+ * PQF (Yaz-style Prefix Query Format) [### NOT YET]
+* A random query generator, useful for testing.
CQL is "Common Query Language", a new query language designed under
the umbrella of the ZING initiative (Z39.50-International Next
But if you didn't know that, why are you even reading this? :-)
+What's what in this distribution?
+
+ README This file
+ src Source-code for the cql-java library
+ lib The compiled library file, "cql-java.jar"
+ bin Simple shell-scripts to invoke the test-harnesses
+ docs Documentation automatically generated by "javadoc"
+ test Various testing and sanity-checking frameworks
+ etc Other files: CQL Grammar, generator properties, etc.
+
+"Installation" of this package would consist of putting the bin
+directory on your PATH and the lib directory on your CLASSPATH.
+
+
SYNOPSIS
--------
-Test-harness:
+Using the test-harnesses:
- $ echo "foo and (bar or baz)" | java org.z3950.zing.cql.CQLParser
+ $ CQLParser 'title=foo and author=(bar or baz)'
+ $ CQLLexer 'title=foo and author=(bar or baz)'
+ (not very interesting unless you're debugging)
+ $ CQLGenerator etc/generate.properties seed 18
-Library:
+Using the library in your own applications:
import org.z3950.zing.cql.*;
// Building a parse-tree by hand
- CQLNode n1 = new CQLTermNode("dc.author", "=", "kernighan");
- CQLNode n2 = new CQLTermNode("dc.title", "all", "elements style");
+ CQLNode n1 = new CQLTermNode("dc.author", new CQLRelation("="),
+ "kernighan");
+ CQLNode n2 = new CQLTermNode("dc.title", new CQLRelation("all"),
+ "elements style");
CQLNode root = new CQLAndNode(n1, n2);
- System.out.println(root.toXCQL(3));
+ System.out.println(root.toXCQL(0));
// Parsing a CQL query
CQLParser parser = new CQLParser();
AUTHOR
------
-Mike Taylor <mike@z3950.org>
-http://www.miketaylor.org.uk
+All code and documentation by Mike Taylor <mike@z3950.org>
+ http://www.miketaylor.org.uk
+Please email me with bug-reports, wishlist items, patches, deployment
+stories and, of course, large cash donations.
LICENCE
-------
-This software is open source, but I've not yet decided exactly what
+This software is Open Source, but I've not yet decided exactly what
licence to use. Be good. Assume I'm going with the GPL (most
-restrictive) until I say otherwise.
-
-
-TESTING
--------
-
-Ways of testing the parser and other components include:
-
-* Generate a random tree with CQLGenerate, take a copy, and
- canonicalise it with CQLparser -c. Since the CQLGenerate output is
- in canonical form anyway, the before-and-after versions should be
- identical.
-
-* ... others :-)
+restrictive) until I say otherwise. For what it's worth, I think the
+most likely licence is the LGPL (GNU's Lesser General Public Licence)
+which lets you deploy cql-java as a part of a non-free larger work.
SEE ALSO
All the other free CQL compilers everyone's going to write :-)
-TO DO
------
+THINGS TO DO
+------------
-* ### Fix bug where "9x" is parsed as two tokens, a NUMBER and a
- WORD. (And why is "x9" OK?)
-
-* Allow CQLGenerate test-harness to take some of its configuration
- parameters on the command-line as well as or instead of a file.
+* ### Fix bug where "9x" is parsed as two tokens, a TT_NUMBER followed
+ by a TT_WORD. The problem here is that I don't think it's actually
+ possible to fix this without throwing out StreamTokenizer and
+ rolling our own, which we absolutely _don't_ want to do.
* Some niceties for the cql-decompiling back-end:
* don't emit redundant parentheses.
* don't put spaces around relations that don't need them.
-* Write pqn-generating back-end (will need to be driven from a
- configuation file specifying how to represent the qualifiers,
+* Write the PQN-generating back-end. This will need to be driven from
+ a configuration file specifying how to represent the qualifiers,
relations, relation modifiers and wildcard characters as Z39.50
- attributes.)
+ attributes. I think Ray has such a thing, though perhaps not yet in
+ a form sufficiently rigorous to be computer-readable.
* Consider the utility of yet another back-end that translates a
CQLNode tree into a Type-1 query tree using the JZKit data
query-type; but you could achieve the same effect by generating PQN,
and running that through JZKit's existing PQN-to-Type-1 compiler.
-* Refinements to random query generator:
+* Many refinements to the random query generator:
* Generate relation modifiers
* Proximity support
* Don't always generate qualifier/relation for terms
* Generate multi-word terms
* Write fuller "javadoc" comments.
-
-* Write generic test suite.
-
+$Id: README,v 1.2 2002-11-03 16:49:38 mike Exp $
+
Automatically-generated documentation should appear here.
cd ../src/org/z3950/zing/cql && make javadocs
-# $Id: generate.properties,v 1.1 2002-10-30 09:19:26 mike Exp $
+# $Id: generate.properties,v 1.2 2002-11-03 16:49:38 mike Exp $
#
# Properties file to drive the org.z3950.zing.cql.CQLGenerator
# test-harness. See that class's documentation for the semantics of
# these properties.
#
-#seed=18398
complexQuery=0.4
complexClause=0.4
equalsRelation=0.5
--- /dev/null
+cql-java.jar
--- /dev/null
+$Id: README,v 1.1 2002-11-03 16:49:38 mike Exp $
+
+The library file "cql-java.jar" will appear here when you do a build
+in ../src/org/z3950/zing/cql. Put it on your CLASSPATH to use the
+cql-java utilities.
-// $Id: CQLGenerator.java,v 1.2 2002-10-30 11:13:18 mike Exp $
+// $Id: CQLGenerator.java,v 1.3 2002-11-03 16:49:38 mike Exp $
package org.z3950.zing.cql;
import java.util.Properties;
* this distribution - there is a <TT>generate_<I>x</I>()</TT> method
* for each grammar element <I>X</I>.
*
- * @version $Id: CQLGenerator.java,v 1.2 2002-10-30 11:13:18 mike Exp $
+ * @version $Id: CQLGenerator.java,v 1.3 2002-11-03 16:49:38 mike Exp $
* @see <A href="http://zing.z3950.org/cql/index.html"
* >http://zing.z3950.org/cql/index.html</A>
*/
* A simple test-harness for the generator.
* <P>
* It generates a single random query using the parameters
- * specified in a nominated properties file, and decompiles it
- * into CQL which is written to standard output.
+ * specified in a nominated properties file, plus any additional
+ * <I>name value</I> pairs provided on the command-line, and
+ * decompiles it into CQL which is written to standard output.
* <P>
* For example,
- * <TT>java org.z3950.zing.cql.CQLGenerator etc/generate.properties</TT>
+ * <TT>java org.z3950.zing.cql.CQLGenerator
+ * etc/generate.properties seed 18398</TT>,
* where the file <TT>generate.properties</TT> contains:<PRE>
- * seed=18398
* complexQuery=0.4
* complexClause=0.4
* equalsRelation=0.5
* @param configFile
* The name of a properties file from which to read the
* configuration parameters (see above).
+ * @param name
+ * The name of a configuration parameter.
+ * @param value
+ * The value to assign to the configuration parameter named in
+ * the immediately preceding command-line argument.
* @return
* A CQL query expressed in a form that should be comprehensible
* to all conformant CQL compilers.
*/
public static void main (String[] args) throws Exception {
- if (args.length != 1) {
- System.err.println("Usage: CQLGenerator <props-file>");
+ if (args.length % 2 != 1) {
+ System.err.println("Usage: CQLGenerator <props-file> "+
+ "[<name> <value>]...");
System.exit(1);
}
Properties params = new Properties();
params.load(f);
f.close();
+ for (int i = 1; i < args.length; i += 2)
+ params.setProperty(args[i], args[i+1]);
CQLGenerator generator = new CQLGenerator(params);
CQLNode tree = generator.generate();
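The command-line handling added to CQLGenerator.main() above can be tried in isolation. What follows is a minimal sketch, not the project's code: the class name OverrideSketch and the helper applyOverrides() are hypothetical, and only java.util.Properties is assumed. It shows <name> <value> pairs from the argument list overriding whatever was loaded from the properties file.

```java
import java.util.Properties;

class OverrideSketch {
    // Hypothetical helper: copies <name> <value> pairs from args[1..] into
    // params.  args[0] is taken to be the properties-file name, as in the
    // usage message, so a valid argument list always has odd length.
    static Properties applyOverrides(Properties params, String[] args) {
        if (args.length % 2 != 1)
            throw new IllegalArgumentException(
                "Usage: CQLGenerator <props-file> [<name> <value>]...");
        for (int i = 1; i < args.length; i += 2)
            params.setProperty(args[i], args[i + 1]);
        return params;
    }

    public static void main(String[] args) {
        Properties params = new Properties();
        // Stands in for values that would be loaded from the file.
        params.setProperty("complexQuery", "0.4");
        applyOverrides(params, new String[] {
            "etc/generate.properties", "seed", "18398", "complexQuery", "0.1" });
        System.out.println("seed=" + params.getProperty("seed"));
        System.out.println("complexQuery=" + params.getProperty("complexQuery"));
    }
}
```

An even-length argument list is rejected because it means either the properties file is missing or a name was given without a value.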
-// $Id: CQLParser.java,v 1.13 2002-11-02 01:24:14 mike Exp $
+// $Id: CQLParser.java,v 1.14 2002-11-03 16:49:38 mike Exp $
package org.z3950.zing.cql;
import java.io.IOException;
/**
- * Compiles a CQL string into a parse tree.
- * ##
+ * Compiles CQL strings into parse trees of CQLNode subtypes.
*
- * @version $Id: CQLParser.java,v 1.13 2002-11-02 01:24:14 mike Exp $
+ * @version $Id: CQLParser.java,v 1.14 2002-11-03 16:49:38 mike Exp $
* @see <A href="http://zing.z3950.org/cql/index.html"
* >http://zing.z3950.org/cql/index.html</A>
*/
System.err.println("PARSEDEBUG: " + str);
}
+ /**
+ * Compiles a CQL query.
+ * <P>
+ * The resulting parse tree may be further processed by hand (see
+ * the individual node-types' documentation for details on the
+ * data structure) or, more often, simply rendered out in the
+ * desired form using one of the back-ends. <TT>toCQL()</TT>
+ * returns a decompiled CQL query equivalent to the one that was
+ * compiled in the first place; and <TT>toXCQL()</TT> returns an
+ * XML snippet representing the query.
+ *
+ * @param cql The query
+ * @return A CQLNode object which is the root of a parse
+ * tree representing the query. */
public CQLNode parse(String cql)
throws CQLParseException, IOException {
lexer = new CQLLexer(cql, LEXDEBUG);
match(lexer.ttype);
}
- boolean isBaseRelation() {
+ private boolean isBaseRelation() {
debug("isBaseRelation: checking ttype=" + lexer.ttype +
" (" + lexer.render() + ")");
return (isProxRelation() ||
lexer.ttype == lexer.TT_EXACT);
}
- boolean isProxRelation() {
+ private boolean isProxRelation() {
debug("isProxRelation: checking ttype=" + lexer.ttype +
" (" + lexer.render() + ")");
return (lexer.ttype == '<' ||
}
- // Test harness.
- //
- // e.g. echo '(au=Kerninghan or au=Ritchie) and ti=Unix' |
- // java org.z3950.zing.cql.CQLParser
- // yields:
- // <triple>
- // <boolean>and</boolean>
- // <triple>
- // <boolean>or</boolean>
- // <searchClause>
- // <index>au<index>
- // <relation>=<relation>
- // <term>Kerninghan<term>
- // </searchClause>
- // <searchClause>
- // <index>au<index>
- // <relation>=<relation>
- // <term>Ritchie<term>
- // </searchClause>
- // </triple>
- // <searchClause>
- // <index>ti<index>
- // <relation>=<relation>
- // <term>Unix<term>
- // </searchClause>
- // </triple>
- //
+ /**
+ * Simple test-harness for the CQLParser class.
+ * <P>
+ * Reads a CQL query either from its command-line argument, if
+ * there is one, or standard input otherwise. So these two
+ * invocations are equivalent:
+ * <PRE>
+ * CQLParser 'au=(Kerninghan or Ritchie) and ti=Unix'
+ * echo 'au=(Kerninghan or Ritchie) and ti=Unix' | CQLParser
+ * </PRE>
+ * The test-harness parses the supplied query and renders it as
+ * XCQL, so that both of the invocations above produce the
+ * following output:
+ * <PRE>
+ * <triple>
+ * <boolean>
+ * <value>and</value>
+ * </boolean>
+ * <triple>
+ * <boolean>
+ * <value>or</value>
+ * </boolean>
+ * <searchClause>
+ * <index>au</index>
+ * <relation>
+ * <value>=</value>
+ * </relation>
+ * <term>Kerninghan</term>
+ * </searchClause>
+ * <searchClause>
+ * <index>au</index>
+ * <relation>
+ * <value>=</value>
+ * </relation>
+ * <term>Ritchie</term>
+ * </searchClause>
+ * </triple>
+ * <searchClause>
+ * <index>ti</index>
+ * <relation>
+ * <value>=</value>
+ * </relation>
+ * <term>Unix</term>
+ * </searchClause>
+ * </triple>
+ * </PRE>
+ * <P>
+ * @param -c
+ * Causes the output to be written in CQL rather than XCQL - that
+ * is, the output is a query equivalent to the one that was input.
+ * In effect, the test harness acts as a query canonicaliser.
+ * @return
+ * The input query, either as XCQL [default] or CQL [if the
+ * <TT>-c</TT> option is supplied].
+ */
public static void main (String[] args) {
boolean canonicalise = false;
Vector argv = new Vector();
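The argument handling described in the javadoc above (an optional -c flag, then the query taken from the command line or, failing that, from standard input) could look roughly like this. It is a sketch under assumptions, not the CQLParser source: the class HarnessArgsSketch, its read() helper and the error message are all hypothetical, and no CQL parsing is done here.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

class HarnessArgsSketch {
    boolean canonicalise = false;   // true when -c was given
    String query;                   // the CQL text to be parsed

    // Hypothetical helper: separates the -c option from the query argument
    // and falls back to reading the query from stdin when none is given.
    static HarnessArgsSketch read(String[] args, Reader stdin) {
        HarnessArgsSketch h = new HarnessArgsSketch();
        List<String> rest = new ArrayList<>();
        for (String arg : args) {
            if (arg.equals("-c"))
                h.canonicalise = true;
            else
                rest.add(arg);
        }
        if (rest.size() > 1)
            throw new IllegalArgumentException("expected at most one query");
        if (rest.size() == 1) {
            h.query = rest.get(0);
            return h;
        }
        // No query argument: read every line of the supplied stream.
        StringBuilder sb = new StringBuilder();
        try {
            BufferedReader in = new BufferedReader(stdin);
            for (String line; (line = in.readLine()) != null; ) {
                if (sb.length() > 0) sb.append('\n');
                sb.append(line);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        h.query = sb.toString();
        return h;
    }

    public static void main(String[] args) {
        HarnessArgsSketch h = read(args, new InputStreamReader(System.in));
        System.out.println((h.canonicalise ? "CQL" : "XCQL")
                           + " output requested for: " + h.query);
    }
}
```

Passing the input stream in as a Reader, rather than reading System.in directly, keeps the helper easy to exercise from a test.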
-# $Id: Makefile,v 1.3 2002-10-31 22:22:01 mike Exp $
+# $Id: Makefile,v 1.4 2002-11-03 16:49:38 mike Exp $
-all: Utils.class \
+OBJ = Utils.class \
CQLNode.class CQLTermNode.class CQLBooleanNode.class \
CQLAndNode.class CQLOrNode.class CQLNotNode.class \
CQLRelation.class CQLProxNode.class ModifierSet.class \
CQLParser.class CQLLexer.class CQLParseException.class \
CQLGenerator.class ParameterMissingException.class
-docs:
+../../../../../lib/cql-java.jar: $(OBJ)
+ cd ../../../..; jar cf ../lib/cql-java.jar org/z3950/zing/cql/*.class
+
+docs: ../../../../../docs/overview-tree.html
+
+../../../../../docs/overview-tree.html: *.java
nice javadoc -d ../../../../../docs -author -version \
-windowtitle cql-java org.z3950.zing.cql
javac $<
clean:
- rm -f *.class
+	rm -f $(OBJ)
cleandocs:
rm -r docs/*
-# $Id: Makefile,v 1.2 2002-11-02 01:19:23 mike Exp $
-
-tests: sections/01/01.xcql
- ./runtests CQLParser cat
+# $Id: Makefile,v 1.3 2002-11-03 16:49:38 mike Exp $
sections/01/01.xcql: sections
- ./mkanswers ../../srw/cql/cqlparse3
+ ./mkanswers CQLParser
+# OR ./mkanswers ../../srw/cql/cqlparse3
# OR ./mkanswers ../../rob/CQLParser.py
-sections: mktests raw
+sections: mktests queries.raw
rm -rf sections
- ./mktests raw
+ ./mktests queries.raw
+
+adam-tests: sections/01/01.xcql
+ ./runtests ../../srw/cql/cqlparse3
+
+rob-tests: sections/01/01.xcql
+ ./runtests ../../rob/CQLParser.py
clean:
find sections -name '*.xcql' -print | xargs rm -f
-$Id: README,v 1.2 2002-11-02 01:19:23 mike Exp $
+$Id: README,v 1.3 2002-11-03 16:49:38 mike Exp $
-"raw" is the file of test queries as provided by Rob.
+"queries.raw" is the file of test queries as provided by Rob.
"mktests" parses the raw file into sections and individual queries
"sections" is the top-level directory created by that program.
"01", "02" etc. represent the sections within the raw file
against its results, do this:
rm -rf sections
- ./mktests raw
+ ./mktests queries.raw
./mkanswers CQLParser.py
./runtests CQLParser sgmlnorm
(Except that sgmlnorm is useless -- gotta find something better.)
-Also: there's a nasty hacl here called "showtest" which, when run like
+Also: there's a nasty hack here called "showtest" which, when run like
``./showtest 07/03'', will show you the ways in which my output
differs from Adam's. I'll probably delete it soon.
+
+Also: there's a subdirectory "random" which tests in a completely
+different way. That ought to be a sister directory to this one, and
+will be when I move the rest of this stuff down a level.
--- /dev/null
+# Simple
+
+cat
+"cat"
+comp.os.linux
+xml:element
+"<xml:element>"
+"="
+"prox/word/>=/5"
+("cat")
+((dog))
+
+# index relation term
+
+title = "fish"
+title exact fish
+title any fish
+title all fish
+title > 9
+title >= 23
+dc.title any "fish chips"
+dc.title any/stem fish
+dc.fish all/stem/fuzzy "fish chips"
+(title any frog)
+((dc.title any/stem "frog pond"))
+
+# Simple Boolean
+
+cat or dog
+cat and fish
+cat not frog
+(cat not frog)
+"cat" not "fish food"
+xml and "prox/word/"
+a or b and c not d
+
+# I/R/T plus Boolean
+
+bath.author any fish and dc.title all "cat dog"
+(title any/stem "fish dog" or "and")
+
+# Prox
+
+cat prox hat
+cat prox/word/=/3/ordered hat
+cat prox///3 hat
+"fish food" prox/sentence "and"
+title all "chips frog" prox/word//5 "any"
+(dc.author exact "jones" prox///5 title >= "smith")
+((cat prox hat))
+
+# Special characters
+(cat^)
+"cat"
+"^cat says \"fish\""
+"cat*fish"
+cat?dog
+(("^cat*fishdog\"horse?"))
+
+# Nesting Parens
+
+(((cat or dog) or horse) and frog)
+(cat and dog) or (horse and frog)
+(cat and (horse or frog)) and chips
+
+# Lame searches
+
+"any" or "all:stem" and "all" exact "any" prox/word "prox"="fuzzy"
+((((((((("any")))))))))
+
+
+# Invalid searches [should error]
+
+>
+===
+cat or
+index any
+index any/wrong term
+a prox/wrong b
+()
+(a
+index any fish)
+(cat any dog or ())
+fred and any
+((fred or all))
+sorry = (mike)
--- /dev/null
+$Id: README,v 1.1 2002-11-03 16:49:38 mike Exp $
+
+In this directory, we test the integrity of the cql-java tools as
+follows:
+
+* Generate a random tree with CQLGenerator
+* Take a copy
+* Canonicalise it with CQLParser -c.
+* Compare the before-and-after versions.
+
+ Since the CQLGenerator output is in canonical form anyway, the
+ before-and-after versions should be identical. This process
+ exercises the comprehensiveness and bullet-proofing of the parser,
+ as well as the accuracy of the rendering.
+
--- /dev/null
+#!/usr/bin/perl -w
+
+use strict;
+
+my $n = 1;
+if (@ARGV > 1) {
+ print STDERR "Usage: $0 [<number-of-trees>]\n";
+ exit 1;
+} elsif (@ARGV == 1) {
+ $n = $ARGV[0];
+}
+
+for (my $i = 0; $i < $n; $i++) {
+ print $i+1, " of $n -- ";
+ my $query=`CQLGenerator ../../etc/generate.properties`;
+ print $query;
+ my $canon=`CQLParser -c '$query'`;
+ if ($canon ne $query) {
+ print "ERROR: canonicalised query differs from original\n";
+ }
+}
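The check performed by the script above is a fixed-point test: canonicalising an already-canonical query must return it unchanged. Here is a minimal Java sketch of that idea, with loudly stated assumptions: RoundTripSketch and survivesRoundTrip() are hypothetical names, and the whitespace-collapsing lambda merely stands in for a real canonicaliser such as a parse followed by toCQL().

```java
import java.util.function.UnaryOperator;

class RoundTripSketch {
    // True when canonicalising the query leaves it exactly as it was.
    static boolean survivesRoundTrip(String query,
                                     UnaryOperator<String> canonicaliser) {
        return canonicaliser.apply(query).equals(query);
    }

    public static void main(String[] args) {
        // Stand-in canonicaliser (an assumption, not cql-java code):
        // trims the query and collapses runs of whitespace.
        UnaryOperator<String> collapse = q -> q.trim().replaceAll("\\s+", " ");

        String[] queries = { "cat or dog", "title  any fish" };
        for (String q : queries) {
            String verdict = survivesRoundTrip(q, collapse)
                ? "ok"
                : "ERROR: canonicalised query differs from original";
            System.out.println(q + " -- " + verdict);
        }
    }
}
```

The second query contains a doubled space, so the stand-in canonicaliser changes it and the sketch reports a difference, which is the failure the script is designed to surface.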
+++ /dev/null
-# Simple
-
-cat
-"cat"
-comp.os.linux
-xml:element
-"<xml:element>"
-"="
-"prox/word/>=/5"
-("cat")
-((dog))
-
-# index relation term
-
-title = "fish"
-title exact fish
-title any fish
-title all fish
-title > 9
-title >= 23
-dc.title any "fish chips"
-dc.title any/stem fish
-dc.fish all/stem/fuzzy "fish chips"
-(title any frog)
-((dc.title any/stem "frog pond"))
-
-# Simple Boolean
-
-cat or dog
-cat and fish
-cat not frog
-(cat not frog)
-"cat" not "fish food"
-xml and "prox/word/"
-a or b and c not d
-
-# I/R/T plus Boolean
-
-bath.author any fish and dc.title all "cat dog"
-(title any/stem "fish dog" or "and")
-
-# Prox
-
-cat prox hat
-cat prox/word/=/3/ordered hat
-cat prox///3 hat
-"fish food" prox/sentence "and"
-title all "chips frog" prox/word//5 "any"
-(dc.author exact "jones" prox///5 title >= "smith")
-((cat prox hat))
-
-# Special characters
-(cat^)
-"cat"
-"^cat says \"fish\""
-"cat*fish"
-cat?dog
-(("^cat*fishdog\"horse?"))
-
-# Nesting Parens
-
-(((cat or dog) or horse) and frog)
-(cat and dog) or (horse and frog)
-(cat and (horse or frog)) and chips
-
-# Lame searches
-
-any or all:stem and all exact any prox/word prox=fuzzy
-(((((((((any)))))))))
-
-
-# Invalid searches [should error]
-
-^
->
-===
-cat or
-index any
-index any/wrong term
-a prox/wrong b
-()
-(a
-index any fish)
-(cat any dog or ())
-sorry = (mike)
-fred and any
-((fred or all))