From: mike <mike>
Date: Thu, 14 Nov 2002 22:04:16 +0000 (+0000)
Subject: 	- Allow keywords to be used unquoted as search terms.
X-Git-Tag: v1.5~219
X-Git-Url: http://sru.miketaylor.org.uk/cgi-bin?a=commitdiff_plain;h=64483d3d31c182adfb96fd75f8be10ff9f374d34;p=cql-java-moved-to-github.git

	- Allow keywords to be used unquoted as search terms.
	- Add support for serverChoiceRelation (scr).
	- Add support for prefix-mapping, as in
		>dc="http://dublincore.org/" dc.title=fish
	  and
		>"http://dublincore.org/" title=fish
	  ### But the XCQL output may need to be changed depending on
	      the result of the ZNG list's deliberations.
	- Move the README file's old "THINGS TO DO" section to the end
	  of this file, the new "Still to do" section.
---
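Note on the prefix-mapping syntax above: the parser decides between
the named and the anonymous form by looking for an "=" straight after
the first symbol that follows ">".  The following is only a minimal
sketch of that decision, assuming the query has already been split
into tokens.  The class PrefixSketch and its list-of-strings input
are hypothetical illustrations and are not part of cql-java, whose
parsePrefix() works on the CQLLexer token stream instead.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-alone sketch; not part of cql-java.
final class PrefixSketch {
    /**
     * Given the tokens of a query that starts with ">", returns
     * { name, identifier }.  name is null for an anonymous mapping.
     */
    static String[] parsePrefix(List<String> tokens) {
        if (tokens.size() < 2 || !tokens.get(0).equals(">"))
            throw new IllegalArgumentException("expected '>' and a symbol");

        String name = null;
        String identifier = tokens.get(1);
        // An "=" straight after the first symbol means that symbol was
        // the short name, and the identifier follows the "=".
        if (tokens.size() > 2 && tokens.get(2).equals("=")) {
            if (tokens.size() < 4)
                throw new IllegalArgumentException("expected prefix-identifier");
            name = identifier;
            identifier = tokens.get(3);
        }
        return new String[] { name, identifier };
    }

    public static void main(String[] args) {
        String[] named = parsePrefix(Arrays.asList(
            ">", "dc", "=", "http://dublincore.org/", "dc.title", "=", "fish"));
        System.out.println(named[0] + " -> " + named[1]);

        String[] anonymous = parsePrefix(Arrays.asList(
            ">", "http://dublincore.org/", "title", "=", "fish"));
        System.out.println(anonymous[0] + " -> " + anonymous[1]);
    }
}
```

In the anonymous case the name stays null, so any code that renders
the mapping back to text has to allow for a missing name.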

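Note on the "9x" item in the to-do list that this commit moves into
the Changes file: the behaviour can be reproduced with a plain
java.io.StreamTokenizer in its default configuration.  This is a
small sketch under that assumption; CQLLexer may configure its
tokenizer differently, and the class name NineXDemo is hypothetical.

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class; not part of cql-java.
final class NineXDemo {
    /** Lists the type of each token a default StreamTokenizer yields. */
    static List<String> tokenTypes(String input) {
        StreamTokenizer st = new StreamTokenizer(new StringReader(input));
        List<String> types = new ArrayList<>();
        try {
            while (st.nextToken() != StreamTokenizer.TT_EOF) {
                if (st.ttype == StreamTokenizer.TT_NUMBER)
                    types.add("TT_NUMBER");
                else if (st.ttype == StreamTokenizer.TT_WORD)
                    types.add("TT_WORD");
                else
                    types.add("other");
            }
        } catch (IOException e) {
            // A StringReader should not fail, but nextToken() declares it.
            throw new UncheckedIOException(e);
        }
        return types;
    }

    public static void main(String[] args) {
        // Number parsing is on by default, so the leading digit is
        // consumed as a number before the letter starts a word.
        System.out.println(tokenTypes("9x"));  // prints [TT_NUMBER, TT_WORD]
    }
}
```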
diff --git a/Changes b/Changes
index 74d7c6e..6d7f3fa 100644
--- a/Changes
+++ b/Changes
@@ -1,21 +1,29 @@
-$Id: Changes,v 1.7 2002-11-12 22:37:48 mike Exp $
+$Id: Changes,v 1.8 2002-11-14 22:04:16 mike Exp $
 
 Revision history for "cql-java"
+See the bottom of this file for a list of things still to do.
 
 0.3  (IN PROGRESS)
+	- Allow keywords to be used unquoted as search terms.
+	- Add support for serverChoiceRelation (scr).
+	- Add support for prefix-mapping, as in
+		>dc="http://dublincore.org/" dc.title=fish
+	  and
+		>"http://dublincore.org/" title=fish
+	  ### But the XCQL output may need to be changed depending on
+	      the result of the ZNG list's deliberations.
+	- Fix the parser to normalise relation modifiers to lower case.
 	- Fix the CQLParser test harness not to emit an extraneous
 	  blank line at end of XCQL output.
-	- Fix the parser to normalise relation modifiers to lower case.
 	- Fix CQLNode documentation to contain a link to YAZ's
 	  documentation of Prefix Query Format (PQF) rather than
 	  containing a rather unhelpful chunk of BNF.
-	- Change the source directory's Makefile so that it specifies
-	  the appropriate -classpath by default.
-	  ### undo this change!
 	- Change the test/regression Makefile so that "make clean" now
 	  does what "make distclean" used to do - the distinction
 	  between them is pointless.
 	- Fix a few typos in the documentation.
+	- Move the README file's old "THINGS TO DO" section to the end
+	  of this file, the new "Still to do" section.
 
 0.2  Wed Nov  6 23:05:54 2002
 	- Fix the order of proximity parameters in accordance with the
@@ -45,3 +53,31 @@ Revision history for "cql-java"
 0.1  Sun Nov  3 20:58:27 2002
 	- First public release.
 
+--
+
+### Still to do
+	- Fix the bug where "9x" is parsed as two tokens, a TT_NUMBER
+	  followed by a TT_WORD.  The problem here is that I don't
+	  think it's actually possible to fix this without throwing
+	  out StreamTokenizer and rolling our own, which we absolutely
+	  _don't_ want to do.
+	- Write javadoc comments for CQLRelation and ModifierSet.
+	- Write "overview" file for the javadoc documentation.
+	- Some niceties for the cql-decompiling back-end:
+	  * Don't emit redundant parentheses.
+	  * Don't put spaces around relations that don't need them.
+	- Consider the utility of yet another back-end that translates
+	  a CQLNode tree into JZKit's representation of a Type-1 query
+	  tree.  That would be nice so that CQL could become a JZKit
+	  query-type; but you could achieve the same effect by
+	  generating PQF, and running that through JZKit's existing
+	  PQF-to-Type-1 compiler.
+	- Many refinements to the random query generator:
+	  * Generate relation modifiers
+	  * Proximity support
+	  * Don't always generate qualifier/relation for terms
+	  * Better selection of qualifier (configurable?)
+	  * Better selection of terms (from a dictionary file?)
+	  * Introduce wildcard characters into generated terms
+	  * Generate multi-word terms
+
diff --git a/README b/README
index 913a781..08a76a5 100644
--- a/README
+++ b/README
@@ -1,4 +1,4 @@
-$Id: README,v 1.17 2002-11-08 13:49:48 mike Exp $
+$Id: README,v 1.18 2002-11-14 22:04:16 mike Exp $
 
 cql-java - a free CQL compiler, and other CQL tools, for Java
 
@@ -114,35 +114,5 @@ All the other free CQL compilers everyone's going to write  :-)
 THINGS TO DO
 ------------
 
-* ### Fix bug where "9x" is parsed as two tokens, a TT_NUMBER followed
-  by a TT_WORD.  The problem here is that I don't think it's actually
-  possible to fix this without throwing out StreamTokenizer and
-  rolling our own, which we absolutely _don't_ want to do.
-
-* Allow keywords to be used unquoted as search terms.
-
-* Add support for serverChoiceRelation (scr).
-
-* Write javadoc comments for CQLRelation and ModifierSet.
-
-* Write "overview" file for the javadoc documentation.
-
-* Some niceties for the cql-decompiling back-end:
-	* don't emit redundant parentheses.
-	* don't put spaces around relations that don't need them.
-
-* Consider the utility of yet another back-end that translates a
-  CQLNode tree into a Type-1 query tree using the JZKit data
-  structures.  That would be nice so that CQL could become a JZKit
-  query-type; but you could achieve the same effect by generating PQN,
-  and running that through JZKit's existing PQN-to-Type-1 compiler.
-
-* Many refinements to the random query generator:
-	* Generate relation modifiers
-	* Proximity support
-	* Don't always generate qualifier/relation for terms
-	* Better selection of qualifier (configurable?)
-	* Better selection of terms (from a dictionary file?)
-	* Introduce wildcard characters into generated terms
-	* Generate multi-word terms
+[See the final "Still to do" section of the "Changes" file.]
 
diff --git a/src/org/z3950/zing/cql/CQLLexer.java b/src/org/z3950/zing/cql/CQLLexer.java
index 8d054d9..8ae5085 100644
--- a/src/org/z3950/zing/cql/CQLLexer.java
+++ b/src/org/z3950/zing/cql/CQLLexer.java
@@ -1,4 +1,4 @@
-// $Id: CQLLexer.java,v 1.4 2002-11-02 01:21:35 mike Exp $
+// $Id: CQLLexer.java,v 1.5 2002-11-14 22:04:16 mike Exp $
 
 package org.z3950.zing.cql;
 import java.io.StreamTokenizer;
@@ -35,6 +35,7 @@ class CQLLexer extends StreamTokenizer {
     static int TT_RELEVANT  = 1016;	// The "relevant" relation modifier
     static int TT_FUZZY     = 1017;	// The "fuzzy" relation modifier
     static int TT_STEM      = 1018;	// The "stem" relation modifier
+    static int TT_SCR       = 1019;	// The server choice relation
 
     // Support for keywords.  It would be nice to compile this linear
     // list into a Hashtable, but it's hard to store ints as hash
@@ -67,6 +68,7 @@ class CQLLexer extends StreamTokenizer {
 	new Keyword(TT_RELEVANT, "relevant"),
 	new Keyword(TT_FUZZY, "fuzzy"),
 	new Keyword(TT_STEM, "stem"),
+	new Keyword(TT_SCR, "scr"),
     };
 
     // For halfDecentPushBack() and the code at the top of nextToken()
diff --git a/src/org/z3950/zing/cql/CQLParser.java b/src/org/z3950/zing/cql/CQLParser.java
index 6329146..eadedef 100644
--- a/src/org/z3950/zing/cql/CQLParser.java
+++ b/src/org/z3950/zing/cql/CQLParser.java
@@ -1,4 +1,4 @@
-// $Id: CQLParser.java,v 1.19 2002-11-08 16:38:47 mike Exp $
+// $Id: CQLParser.java,v 1.20 2002-11-14 22:04:16 mike Exp $
 
 package org.z3950.zing.cql;
 import java.io.IOException;
@@ -12,7 +12,7 @@ import java.io.FileNotFoundException;
 /**
  * Compiles CQL strings into parse trees of CQLNode subtypes.
  *
- * @version	$Id: CQLParser.java,v 1.19 2002-11-08 16:38:47 mike Exp $
+ * @version	$Id: CQLParser.java,v 1.20 2002-11-14 22:04:16 mike Exp $
  * @see		<A href="http://zing.z3950.org/cql/index.html"
  *		        >http://zing.z3950.org/cql/index.html</A>
  */
@@ -45,39 +45,38 @@ public class CQLParser {
 	lexer = new CQLLexer(cql, LEXDEBUG);
 
 	lexer.nextToken();
-	debug("about to parse_query()");
-	CQLNode root = parse_query("srw.serverChoice", new CQLRelation("scr"));
-	// ### "scr" above should arguably be "="
+	debug("about to parseQuery()");
+	CQLNode root = parseQuery("srw.serverChoice", new CQLRelation("scr"));
 	if (lexer.ttype != lexer.TT_EOF)
 	    throw new CQLParseException("junk after end: " + lexer.render());
 
 	return root;
     }
 
-    private CQLNode parse_query(String qualifier, CQLRelation relation)
+    private CQLNode parseQuery(String qualifier, CQLRelation relation)
 	throws CQLParseException, IOException {
-	debug("in parse_query()");
+	debug("in parseQuery()");
 
-	CQLNode term = parse_term(qualifier, relation);
+	CQLNode term = parseTerm(qualifier, relation);
 	while (lexer.ttype != lexer.TT_EOF &&
 	       lexer.ttype != ')') {
 	    if (lexer.ttype == lexer.TT_AND) {
 		match(lexer.TT_AND);
-		CQLNode term2 = parse_term(qualifier, relation);
+		CQLNode term2 = parseTerm(qualifier, relation);
 		term = new CQLAndNode(term, term2);
 	    } else if (lexer.ttype == lexer.TT_OR) {
 		match(lexer.TT_OR);
-		CQLNode term2 = parse_term(qualifier, relation);
+		CQLNode term2 = parseTerm(qualifier, relation);
 		term = new CQLOrNode(term, term2);
 	    } else if (lexer.ttype == lexer.TT_NOT) {
 		match(lexer.TT_NOT);
-		CQLNode term2 = parse_term(qualifier, relation);
+		CQLNode term2 = parseTerm(qualifier, relation);
 		term = new CQLNotNode(term, term2);
 	    } else if (lexer.ttype == lexer.TT_PROX) {
 		match(lexer.TT_PROX);
 		CQLProxNode proxnode = new CQLProxNode(term);
 		gatherProxParameters(proxnode);
-		CQLNode term2 = parse_term(qualifier, relation);
+		CQLNode term2 = parseTerm(qualifier, relation);
 		proxnode.addSecondSubterm(term2);
 		term = (CQLNode) proxnode;
 	    } else {
@@ -90,32 +89,25 @@ public class CQLParser {
 	return term;
     }
 
-    private CQLNode parse_term(String qualifier, CQLRelation relation)
+    private CQLNode parseTerm(String qualifier, CQLRelation relation)
 	throws CQLParseException, IOException {
-	debug("in parse_term()");
+	debug("in parseTerm()");
 
 	String word;
 	while (true) {
 	    if (lexer.ttype == '(') {
 		debug("parenthesised term");
 		match('(');
-		CQLNode expr = parse_query(qualifier, relation);
+		CQLNode expr = parseQuery(qualifier, relation);
 		match(')');
 		return expr;
-	    } else if (lexer.ttype != lexer.TT_WORD &&
-		       lexer.ttype != lexer.TT_NUMBER &&
-		       lexer.ttype != '"') {
-		throw new CQLParseException("expected qualifier or term, " +
-					    "got " + lexer.render());
+	    } else if (lexer.ttype == '>') {
+		match('>');
+		return parsePrefix(qualifier, relation);
 	    }
 
 	    debug("non-parenthesised term");
-	    if (lexer.ttype == lexer.TT_NUMBER) {
-		word = lexer.render();
-	    } else {
-		word = lexer.sval;
-	    }
-	    match(lexer.ttype);
+	    word = matchSymbol("qualifier or term");
 	    if (!isBaseRelation())
 		break;
 
@@ -143,6 +135,21 @@ public class CQLParser {
 	return node;
     }
 
+    private CQLNode parsePrefix(String qualifier, CQLRelation relation)
+	throws CQLParseException, IOException {
+	debug("prefix mapping");
+
+	String name = null;
+	String identifier = matchSymbol("prefix-name");
+	if (lexer.ttype == '=') {
+	    match('=');
+	    name = identifier;
+	    identifier = matchSymbol("prefix-identifier");
+	}
+	CQLNode term = parseTerm(qualifier, relation);
+	return new CQLPrefixNode(name, identifier, term);
+    }
+
     private void gatherProxParameters(CQLProxNode node)
 	throws CQLParseException, IOException {
 	for (int i = 0; i < 4; i++) {
@@ -212,7 +219,8 @@ public class CQLParser {
 	return (isProxRelation() ||
 		lexer.ttype == lexer.TT_ANY ||
 		lexer.ttype == lexer.TT_ALL ||
-		lexer.ttype == lexer.TT_EXACT);
+		lexer.ttype == lexer.TT_EXACT ||
+		lexer.ttype == lexer.TT_SCR);
     }
 
     private boolean isProxRelation() {
@@ -239,6 +247,43 @@ public class CQLParser {
 	      " (tmp=" + tmp + ")");
     }
 
+    private String matchSymbol(String expected)
+	throws CQLParseException, IOException {
+
+	debug("in matchSymbol()");
+	if (lexer.ttype == lexer.TT_WORD ||
+	    lexer.ttype == lexer.TT_NUMBER ||
+	    lexer.ttype == '"' ||
+	    // The following is a complete list of keywords.  Because
+	    // they're listed here, they can be used unquoted as
+	    // qualifiers, terms, prefix names and prefix identifiers.
+	    lexer.ttype == lexer.TT_AND ||
+	    lexer.ttype == lexer.TT_OR ||
+	    lexer.ttype == lexer.TT_NOT ||
+	    lexer.ttype == lexer.TT_PROX ||
+	    lexer.ttype == lexer.TT_ANY ||
+	    lexer.ttype == lexer.TT_ALL ||
+	    lexer.ttype == lexer.TT_EXACT ||
+	    lexer.ttype == lexer.TT_pWORD ||
+	    lexer.ttype == lexer.TT_SENTENCE ||
+	    lexer.ttype == lexer.TT_PARAGRAPH ||
+	    lexer.ttype == lexer.TT_ELEMENT ||
+	    lexer.ttype == lexer.TT_ORDERED ||
+	    lexer.ttype == lexer.TT_UNORDERED ||
+	    lexer.ttype == lexer.TT_RELEVANT ||
+	    lexer.ttype == lexer.TT_FUZZY ||
+	    lexer.ttype == lexer.TT_STEM ||
+	    lexer.ttype == lexer.TT_SCR) {
+	    String symbol = (lexer.ttype == lexer.TT_NUMBER) ?
+		lexer.render() : lexer.sval;
+	    match(lexer.ttype);
+	    return symbol;
+	}
+
+	throw new CQLParseException("expected " + expected + ", " +
+				    "got " + lexer.render());
+    }
+
 
     /**
      * Simple test-harness for the CQLParser class.
diff --git a/src/org/z3950/zing/cql/CQLPrefix.java b/src/org/z3950/zing/cql/CQLPrefix.java
new file mode 100644
index 0000000..42edfc1
--- /dev/null
+++ b/src/org/z3950/zing/cql/CQLPrefix.java
@@ -0,0 +1,34 @@
+// $Id: CQLPrefix.java,v 1.1 2002-11-14 22:04:16 mike Exp $
+
+package org.z3950.zing.cql;
+import java.lang.String;
+
+/**
+ * Represents a CQL prefix mapping from short name to long identifier.
+ *
+ * @version	$Id: CQLPrefix.java,v 1.1 2002-11-14 22:04:16 mike Exp $
+ */
+public class CQLPrefix {
+    /**
+     * The short name of the prefix mapping - that is, the prefix
+     * itself, such as <TT>dc</TT>, as it might be used in a qualifier
+     * like <TT>dc.title</TT>.
+     */
+    String name;
+
+    /**
+     * The full identifier of the prefix mapping - that is, the long
+     * identifier that the short name stands for, such as
+     * <TT>http://dublincore.org/</TT>.
+     */
+    String identifier;
+
+    /**
+     * Creates a new CQLPrefix mapping, which maps the specified name
+     * to the specified identifier.
+     */
+    CQLPrefix(String name, String identifier) {
+	this.name = name;
+	this.identifier = identifier;
+    }
+}
diff --git a/src/org/z3950/zing/cql/CQLPrefixNode.java b/src/org/z3950/zing/cql/CQLPrefixNode.java
new file mode 100644
index 0000000..43a526c
--- /dev/null
+++ b/src/org/z3950/zing/cql/CQLPrefixNode.java
@@ -0,0 +1,60 @@
+// $Id: CQLPrefixNode.java,v 1.1 2002-11-14 22:04:16 mike Exp $
+
+package org.z3950.zing.cql;
+import java.lang.String;
+import java.util.Properties;
+
+
+/**
+ * Represents a prefix node in a CQL parse-tree.
+ *
+ * @version	$Id: CQLPrefixNode.java,v 1.1 2002-11-14 22:04:16 mike Exp $
+ */
+public class CQLPrefixNode extends CQLNode {
+    /**
+     * The prefix definition that governs the subtree.
+     */
+    public CQLPrefix prefix;
+
+    /**
+     * The root of a parse-tree representing the part of the query
+     * that is governed by this prefix definition.
+     */ 
+    public CQLNode subtree;
+
+    /**
+     * Creates a new CQLPrefixNode inducing a mapping from the
+     * specified qualifier-set name to the specified identifier across
+     * the specified subtree.
+     */
+    public CQLPrefixNode(String name, String identifier, CQLNode subtree) {
+	this.prefix = new CQLPrefix(name, identifier);
+	this.subtree = subtree;
+    }
+
+    public String toXCQL(int level) {
+	String maybeName = "";
+	if (prefix.name != null)
+	    maybeName = indent(level+1) + "<name>" + prefix.name + "</name>\n";
+
+	return (indent(level) + "<prefix>\n" + maybeName +
+		indent(level+1) +
+		    "<identifier>" + prefix.identifier + "</identifier>\n" +
+		subtree.toXCQL(level+1) +
+		indent(level) + "</prefix>\n");
+    }
+
+    public String toCQL() {
+	// ### We don't always need parens around the operand
+	return ">" + (prefix.name == null ? "" : prefix.name + "=") +
+	    "\"" + prefix.identifier + "\" (" + subtree.toCQL() + ")";
+    }
+
+    public String toPQF(Properties config) throws PQFTranslationException {
+	// Prefixes and their identifiers don't actually play any role
+	// in PQF translation, since the meanings of the qualifiers,
+	// including their prefixes if any, are instead wired into
+	// `config'.
+	return subtree.toPQF(config);
+    }
+}
diff --git a/src/org/z3950/zing/cql/Makefile b/src/org/z3950/zing/cql/Makefile
index 4bc961e..029ca04 100644
--- a/src/org/z3950/zing/cql/Makefile
+++ b/src/org/z3950/zing/cql/Makefile
@@ -1,13 +1,20 @@
-# $Id: Makefile,v 1.10 2002-11-12 22:38:35 mike Exp $
+# $Id: Makefile,v 1.11 2002-11-14 22:04:16 mike Exp $
+#
+# Your Java compiler, and javadoc, will require that this source
+# directory is on the classpath.  The best way to do that is just to
+# add the cql-java distribution's "src" subdirectory to your CLASSPATH
+# environment variable, like this:
+#	CLASSPATH=$CLASSPATH:/where/ever/you/unpacked/it/cql-java-VERSION/src
 
 DOCDIR = ../../../../../docs
 
 OBJ = Utils.class \
 	CQLNode.class CQLTermNode.class CQLBooleanNode.class \
 	CQLAndNode.class CQLOrNode.class CQLNotNode.class \
-	CQLRelation.class CQLProxNode.class ModifierSet.class \
-	CQLParser.class CQLLexer.class CQLParseException.class \
-	CQLGenerator.class MissingParameterException.class \
+	CQLProxNode.class CQLPrefixNode.class CQLPrefix.class \
+	CQLRelation.class ModifierSet.class \
+	CQLParser.class CQLLexer.class CQLGenerator.class \
+	CQLParseException.class MissingParameterException.class \
 	PQFTranslationException.class \
 	UnknownQualifierException.class UnknownRelationException.class \
 	UnknownRelationModifierException.class UnknownPositionException.class
@@ -15,15 +22,6 @@ OBJ = Utils.class \
 ../../../../../lib/cql-java.jar: $(OBJ)
 	cd ../../../..; jar cf ../lib/cql-java.jar org/z3950/zing/cql/*.class
 
-# ### FIX THIS COMMENT!
-# Your Java compiler will require that this source directory is on the
-# classpath.  Generally, you can use the rules below, which set the
-# classpath suitably.  But that will break if you need other elements
-# in the CLASSPATH too.  If that's the situation you're in, take the
-# "-classpath ../../../.." flag out of the rules below, and set your
-# CLASSPATH environment variable to include
-#	/where/ever/you/unpacked/it/cql-java-VERSION/src
-#
 %.class: %.java
 	javac $<
 
diff --git a/test/regression/queries.raw b/test/regression/queries.raw
index 5366fb9..67daa48 100644
--- a/test/regression/queries.raw
+++ b/test/regression/queries.raw
@@ -1,4 +1,5 @@
-# Simple
+
+# Simple 
 
 cat
 "cat"
@@ -9,6 +10,8 @@ xml:element
 "prox/>=/5/word"
 ("cat")
 ((dog))
+all
+prox
 
 # index relation term
 
@@ -23,6 +26,7 @@ dc.title any/stem fish
 dc.fish all/stem/fuzzy "fish chips"
 (title any frog)
 ((dc.title any/stem "frog pond"))
+dc.title scr "fish frog chicken"
 
 # Simple Boolean
 
@@ -31,22 +35,24 @@ cat and fish
 cat not frog
 (cat not frog)
 "cat" not "fish food"
-xml and "prox///word/"
+xml and "prox///"
+fred and any
+((fred or all))
 a or b and c not d
 
 # I/R/T plus Boolean
 
 bath.author any fish and dc.title all "cat dog"
-(title any/stem "fish dog" or "and")
+(title any/stem "fish dog" or and)
 
 # Prox
 
 cat prox hat
 cat prox/=/3/word/ordered hat
 cat prox//3 hat
-"fish food" prox///sentence "and"
-title all "chips frog" prox//5/word "any"
-(dc.author exact "jones" prox//5 title >= "smith")
+"fish food" prox///sentence and
+title all "chips frog" prox/>=/5 exact
+(dc.author exact "jones" prox/</5/element title >= "smith")
 ((cat prox hat))
 
 # Special characters
@@ -65,22 +71,21 @@ cat?dog
 
 # Lame searches
 
-"any" or "all:stem" and "all" exact "any" prox///word "prox"="fuzzy"
-((((((((("any")))))))))
-
+any or all:stem and all exact any prox prox=fuzzy
+(((((((((any)))))))))
+("")
 
 # Invalid searches [should error]
 
 >
 ===
 cat or
-index any
+index any 
 index any/wrong term
 a prox/wrong b
 ()
 (a
 index any fish)
 (cat any dog or ())
-fred and any
-((fred or all))
-sorry = (mike)
+title = ("illegal parentheses")
+"quoted" any "illegal quotes"