From: mike
Date: Thu, 14 Nov 2002 22:04:16 +0000 (+0000)
Subject: - Allow keywords to be used unquoted as search terms.
X-Git-Tag: v1.5~219
X-Git-Url: http://sru.miketaylor.org.uk/cgi-bin?a=commitdiff_plain;h=64483d3d31c182adfb96fd75f8be10ff9f374d34;p=cql-java-moved-to-github.git

- Allow keywords to be used unquoted as search terms.
- Add support for serverChoiceRelation (scr).
- Add support for prefix-mapping, as in
      >dc="http://dublincore.org/ dc.title=fish
  and
      >"http://dublincore.org/ title=fish
  ### But the XCQL output may need to be changed depending on
  the result of the ZNG list's deliberations.
- Move the README file's old "THINGS TO DO" section to the end
  of this file, the new "Still to do" section.
---

diff --git a/Changes b/Changes
index 74d7c6e..6d7f3fa 100644
--- a/Changes
+++ b/Changes
@@ -1,21 +1,29 @@
-$Id: Changes,v 1.7 2002-11-12 22:37:48 mike Exp $
+$Id: Changes,v 1.8 2002-11-14 22:04:16 mike Exp $

 Revision history for "cql-java"

+See the bottom of this file for a list of things still to do.
+
 0.3 (IN PROGRESS)
+    - Allow keywords to be used unquoted as search terms.
+    - Add support for serverChoiceRelation (scr).
+    - Add support for prefix-mapping, as in
+          >dc="http://dublincore.org/ dc.title=fish
+      and
+          >"http://dublincore.org/ title=fish
+      ### But the XCQL output may need to be changed depending on
+      the result of the ZNG list's deliberations.
+    - Fix the parser to normalise relation modifiers to lower case.
     - Fix the CQLParser test harness not to emit an extraneous
       blank line at end of XCQL output.
-    - Fix the parser to normalise relation modifiers to lower case.
     - Fix CQLNode documentation to contain a link to YAZ's
       documentation of Prefix Query Format (PQF) rather than
       containing a rather unhelpful chunk of BNF.
-    - Change the source directory's Makefile so that it specifies
-      the appropriate -classpath by default.
-      ### undo this change!
     - Change the test/regression Makefile so that "make clean" now
       does what "make distclean" used to do - the distinction
       between them is pointless.
     - Fix a few typos in the documentation.
+    - Move the README file's old "THINGS TO DO" section to the end
+      of this file, the new "Still to do" section.

 0.2 Wed Nov 6 23:05:54 2002
     - Fix the order of proximity parameters in accordance with the
@@ -45,3 +53,31 @@ Revision history for "cql-java"
 0.1 Sun Nov 3 20:58:27 2002
     - First public release.

+--
+
+### Still to do
+    - Fix the bug where "9x" is parsed as two tokens, a TT_NUMBER
+      followed by a TT_WORD. The problem here is that I don't
+      think it's actually possible to fix this without throwing
+      out StreamTokenizer and rolling our own, which we absolutely
+      _don't_ want to do.
+    - Write javadoc comments for CQLRelation and ModifierSet.
+    - Write "overview" file for the javadoc documentation.
+    - Some niceties for the cql-decompiling back-end:
+        * Don't emit redundant parentheses.
+        * Don't put spaces around relations that don't need them.
+    - Consider the utility of yet another back-end that translates
+      a CQLNode tree into JZKit's representation of a Type-1 query
+      tree. That would be nice so that CQL could become a JZKit
+      query-type; but you could achieve the same effect by
+      generating PQF, and running that through JZKit's existing
+      PQN-to-Type-1 compiler.
+    - Many refinements to the random query generator:
+        * Generate relation modifiers
+        * Proximity support
+        * Don't always generate qualifier/relation for terms
+        * Better selection of qualifier (configurable?)
+        * Better selection of terms (from a dictionary file?)
+        * Introduce wildcard characters into generated terms
+        * Generate multi-word terms
+
diff --git a/README b/README
index 913a781..08a76a5 100644
--- a/README
+++ b/README
@@ -1,4 +1,4 @@
-$Id: README,v 1.17 2002-11-08 13:49:48 mike Exp $
+$Id: README,v 1.18 2002-11-14 22:04:16 mike Exp $

 cql-java - a free CQL compiler, and other CQL tools, for Java

@@ -114,35 +114,5 @@ All the other free CQL compilers everyone's going to write :-)
 THINGS TO DO
 ------------

-* ### Fix bug where "9x" is parsed as two tokens, a TT_NUMBER followed
-  by a TT_WORD. The problem here is that I don't think it's actually
-  possible to fix this without throwing out StreamTokenizer and
-  rolling our own, which we absolutely _don't_ want to do.
-
-* Allow keywords to be used unquoted as search terms.
-
-* Add support for serverChoiceRelation (scr).
-
-* Write javadoc comments for CQLRelation and ModifierSet.
-
-* Write "overview" file for the javadoc documentation.
-
-* Some niceties for the cql-decompiling back-end:
-    * don't emit redundant parentheses.
-    * don't put spaces around relations that don't need them.
-
-* Consider the utility of yet another back-end that translates a
-  CQLNode tree into a Type-1 query tree using the JZKit data
-  structures. That would be nice so that CQL could become a JZKit
-  query-type; but you could achieve the same effect by generating PQN,
-  and running that through JZKit's existing PQN-to-Type-1 compiler.
-
-* Many refinements to the random query generator:
-    * Generate relation modifiers
-    * Proximity support
-    * Don't always generate qualifier/relation for terms
-    * Better selection of qualifier (configurable?)
-    * Better selection of terms (from a dictionary file?)
-    * Introduce wildcard characters into generated terms
-    * Generate multi-word terms
+[See the final "Still to do" section of the "Changes" file.]
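
The "9x" item in the "Still to do" list can be reproduced outside the parser. The following is a minimal sketch, assuming a default-configured java.io.StreamTokenizer (CQLLexer may configure its superclass differently); the class name NineXDemo and the helper tokens() are hypothetical and not part of cql-java.

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class, not part of cql-java.
public class NineXDemo {
    // Tokenises the input with a default-configured StreamTokenizer
    // and returns a readable description of each token.
    static List<String> tokens(String input) {
        StreamTokenizer st = new StreamTokenizer(new StringReader(input));
        List<String> out = new ArrayList<>();
        try {
            while (st.nextToken() != StreamTokenizer.TT_EOF) {
                if (st.ttype == StreamTokenizer.TT_NUMBER)
                    out.add("TT_NUMBER(" + st.nval + ")");
                else if (st.ttype == StreamTokenizer.TT_WORD)
                    out.add("TT_WORD(" + st.sval + ")");
                else
                    out.add("char(" + (char) st.ttype + ")");
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out;
    }

    public static void main(String[] args) {
        // Number parsing is on by default, so the leading digit is
        // consumed as a number before the word "x" is seen.
        System.out.println(tokens("9x")); // prints [TT_NUMBER(9.0), TT_WORD(x)]
    }
}
```

Number parsing is enabled by default in StreamTokenizer, which is why the digit is split off before the letter is read.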
diff --git a/src/org/z3950/zing/cql/CQLLexer.java b/src/org/z3950/zing/cql/CQLLexer.java
index 8d054d9..8ae5085 100644
--- a/src/org/z3950/zing/cql/CQLLexer.java
+++ b/src/org/z3950/zing/cql/CQLLexer.java
@@ -1,4 +1,4 @@
-// $Id: CQLLexer.java,v 1.4 2002-11-02 01:21:35 mike Exp $
+// $Id: CQLLexer.java,v 1.5 2002-11-14 22:04:16 mike Exp $

 package org.z3950.zing.cql;
 import java.io.StreamTokenizer;
@@ -35,6 +35,7 @@ class CQLLexer extends StreamTokenizer {
     static int TT_RELEVANT = 1016;  // The "relevant" relation modifier
     static int TT_FUZZY = 1017;     // The "fuzzy" relation modifier
     static int TT_STEM = 1018;      // The "stem" relation modifier
+    static int TT_SCR = 1019;       // The server choice relation

     // Support for keywords. It would be nice to compile this linear
     // list into a Hashtable, but it's hard to store ints as hash
@@ -67,6 +68,7 @@ class CQLLexer extends StreamTokenizer {
         new Keyword(TT_RELEVANT, "relevant"),
         new Keyword(TT_FUZZY, "fuzzy"),
         new Keyword(TT_STEM, "stem"),
+        new Keyword(TT_SCR, "scr"),
     };

     // For halfDecentPushBack() and the code at the top of nextToken()
diff --git a/src/org/z3950/zing/cql/CQLParser.java b/src/org/z3950/zing/cql/CQLParser.java
index 6329146..eadedef 100644
--- a/src/org/z3950/zing/cql/CQLParser.java
+++ b/src/org/z3950/zing/cql/CQLParser.java
@@ -1,4 +1,4 @@
-// $Id: CQLParser.java,v 1.19 2002-11-08 16:38:47 mike Exp $
+// $Id: CQLParser.java,v 1.20 2002-11-14 22:04:16 mike Exp $

 package org.z3950.zing.cql;
 import java.io.IOException;
@@ -12,7 +12,7 @@ import java.io.FileNotFoundException;
 /**
  * Compiles CQL strings into parse trees of CQLNode subtypes.
  *
- * @version $Id: CQLParser.java,v 1.19 2002-11-08 16:38:47 mike Exp $
+ * @version $Id: CQLParser.java,v 1.20 2002-11-14 22:04:16 mike Exp $
  * @see http://zing.z3950.org/cql/index.html
  */

@@ -45,39 +45,38 @@ public class CQLParser {
         lexer = new CQLLexer(cql, LEXDEBUG);

         lexer.nextToken();
-        debug("about to parse_query()");
-        CQLNode root = parse_query("srw.serverChoice", new CQLRelation("scr"));
-        // ### "scr" above should arguably be "="
+        debug("about to parseQuery()");
+        CQLNode root = parseQuery("srw.serverChoice", new CQLRelation("scr"));
         if (lexer.ttype != lexer.TT_EOF)
             throw new CQLParseException("junk after end: " + lexer.render());

         return root;
     }

-    private CQLNode parse_query(String qualifier, CQLRelation relation)
+    private CQLNode parseQuery(String qualifier, CQLRelation relation)
         throws CQLParseException, IOException {
-        debug("in parse_query()");
+        debug("in parseQuery()");

-        CQLNode term = parse_term(qualifier, relation);
+        CQLNode term = parseTerm(qualifier, relation);
         while (lexer.ttype != lexer.TT_EOF &&
                lexer.ttype != ')') {
             if (lexer.ttype == lexer.TT_AND) {
                 match(lexer.TT_AND);
-                CQLNode term2 = parse_term(qualifier, relation);
+                CQLNode term2 = parseTerm(qualifier, relation);
                 term = new CQLAndNode(term, term2);
             } else if (lexer.ttype == lexer.TT_OR) {
                 match(lexer.TT_OR);
-                CQLNode term2 = parse_term(qualifier, relation);
+                CQLNode term2 = parseTerm(qualifier, relation);
                 term = new CQLOrNode(term, term2);
             } else if (lexer.ttype == lexer.TT_NOT) {
                 match(lexer.TT_NOT);
-                CQLNode term2 = parse_term(qualifier, relation);
+                CQLNode term2 = parseTerm(qualifier, relation);
                 term = new CQLNotNode(term, term2);
             } else if (lexer.ttype == lexer.TT_PROX) {
                 match(lexer.TT_PROX);
                 CQLProxNode proxnode = new CQLProxNode(term);
                 gatherProxParameters(proxnode);
-                CQLNode term2 = parse_term(qualifier, relation);
+                CQLNode term2 = parseTerm(qualifier, relation);
                 proxnode.addSecondSubterm(term2);
                 term = (CQLNode) proxnode;
             } else {
@@ -90,32 +89,25 @@ public class
CQLParser {
         return term;
     }

-    private CQLNode parse_term(String qualifier, CQLRelation relation)
+    private CQLNode parseTerm(String qualifier, CQLRelation relation)
         throws CQLParseException, IOException {
-        debug("in parse_term()");
+        debug("in parseTerm()");

         String word;
         while (true) {
             if (lexer.ttype == '(') {
                 debug("parenthesised term");
                 match('(');
-                CQLNode expr = parse_query(qualifier, relation);
+                CQLNode expr = parseQuery(qualifier, relation);
                 match(')');
                 return expr;
-            } else if (lexer.ttype != lexer.TT_WORD &&
-                       lexer.ttype != lexer.TT_NUMBER &&
-                       lexer.ttype != '"') {
-                throw new CQLParseException("expected qualifier or term, " +
-                                            "got " + lexer.render());
+            } else if (lexer.ttype == '>') {
+                match('>');
+                return parsePrefix(qualifier, relation);
             }

             debug("non-parenthesised term");
-            if (lexer.ttype == lexer.TT_NUMBER) {
-                word = lexer.render();
-            } else {
-                word = lexer.sval;
-            }
-            match(lexer.ttype);
+            word = matchSymbol("qualifier or term");
             if (!isBaseRelation())
                 break;

@@ -143,6 +135,21 @@ public class CQLParser {
         return node;
     }

+    private CQLNode parsePrefix(String qualifier, CQLRelation relation)
+        throws CQLParseException, IOException {
+        debug("prefix mapping");
+
+        String name = null;
+        String identifier = matchSymbol("prefix-name");
+        if (lexer.ttype == '=') {
+            match('=');
+            name = identifier;
+            identifier = matchSymbol("prefix-identifier");
+        }
+        CQLNode term = parseTerm(qualifier, relation);
+        return new CQLPrefixNode(name, identifier, term);
+    }
+
     private void gatherProxParameters(CQLProxNode node)
         throws CQLParseException, IOException {
         for (int i = 0; i < 4; i++) {
@@ -212,7 +219,8 @@ public class CQLParser {
         return (isProxRelation() ||
                 lexer.ttype == lexer.TT_ANY ||
                 lexer.ttype == lexer.TT_ALL ||
-                lexer.ttype == lexer.TT_EXACT);
+                lexer.ttype == lexer.TT_EXACT ||
+                lexer.ttype == lexer.TT_SCR);
     }

     private boolean isProxRelation() {
@@ -239,6 +247,43 @@ public class CQLParser {
               " (tmp=" + tmp + ")");
     }

+    private String matchSymbol(String
+                               expected)
+        throws CQLParseException, IOException {
+
+        debug("in matchSymbol()");
+        if (lexer.ttype == lexer.TT_WORD ||
+            lexer.ttype == lexer.TT_NUMBER ||
+            lexer.ttype == '"' ||
+            // The following is a complete list of keywords. Because
+            // they're listed here, they can be used unquoted as
+            // qualifiers, terms, prefix names and prefix identifiers.
+            lexer.ttype == lexer.TT_AND ||
+            lexer.ttype == lexer.TT_OR ||
+            lexer.ttype == lexer.TT_NOT ||
+            lexer.ttype == lexer.TT_PROX ||
+            lexer.ttype == lexer.TT_ANY ||
+            lexer.ttype == lexer.TT_ALL ||
+            lexer.ttype == lexer.TT_EXACT ||
+            lexer.ttype == lexer.TT_pWORD ||
+            lexer.ttype == lexer.TT_SENTENCE ||
+            lexer.ttype == lexer.TT_PARAGRAPH ||
+            lexer.ttype == lexer.TT_ELEMENT ||
+            lexer.ttype == lexer.TT_ORDERED ||
+            lexer.ttype == lexer.TT_UNORDERED ||
+            lexer.ttype == lexer.TT_RELEVANT ||
+            lexer.ttype == lexer.TT_FUZZY ||
+            lexer.ttype == lexer.TT_STEM ||
+            lexer.ttype == lexer.TT_SCR) {
+            String symbol = (lexer.ttype == lexer.TT_NUMBER) ?
+                lexer.render() : lexer.sval;
+            match(lexer.ttype);
+            return symbol;
+        }
+
+        throw new CQLParseException("expected " + expected + ", " +
+                                    "got " + lexer.render());
+    }
+

     /**
      * Simple test-harness for the CQLParser class.
diff --git a/src/org/z3950/zing/cql/CQLPrefix.java b/src/org/z3950/zing/cql/CQLPrefix.java
new file mode 100644
index 0000000..42edfc1
--- /dev/null
+++ b/src/org/z3950/zing/cql/CQLPrefix.java
@@ -0,0 +1,34 @@
+// $Id: CQLPrefix.java,v 1.1 2002-11-14 22:04:16 mike Exp $
+
+package org.z3950.zing.cql;
+import java.lang.String;
+
+/**
+ * Represents a CQL prefix mapping from short name to long identifier.
+ *
+ * @version $Id: CQLPrefix.java,v 1.1 2002-11-14 22:04:16 mike Exp $
+ */
+public class CQLPrefix {
+    /**
+     * The short name of the prefix mapping - that is, the prefix
+     * itself, such as dc, as it might be used in a qualifier
+     * like dc.title.
+     */
+    String name;
+
+    /**
+     * The full identifier of the prefix mapping - that is, the long
+     * name, such as http://dublincore.org/, for which the short
+     * prefix name stands.
+     */
+    String identifier;
+
+    /**
+     * Creates a new CQLPrefix mapping, which maps the specified name
+     * to the specified identifier.
+     */
+    CQLPrefix(String name, String identifier) {
+        this.name = name;
+        this.identifier = identifier;
+    }
+}
diff --git a/src/org/z3950/zing/cql/CQLPrefixNode.java b/src/org/z3950/zing/cql/CQLPrefixNode.java
new file mode 100644
index 0000000..43a526c
--- /dev/null
+++ b/src/org/z3950/zing/cql/CQLPrefixNode.java
@@ -0,0 +1,60 @@
+// $Id: CQLPrefixNode.java,v 1.1 2002-11-14 22:04:16 mike Exp $
+
+package org.z3950.zing.cql;
+import java.lang.String;
+import java.util.Properties;
+
+
+/**
+ * Represents a prefix node in a CQL parse-tree.
+ *
+ * @version $Id: CQLPrefixNode.java,v 1.1 2002-11-14 22:04:16 mike Exp $
+ */
+public class CQLPrefixNode extends CQLNode {
+    /**
+     * The prefix definition that governs the subtree.
+     */
+    public CQLPrefix prefix;
+
+    /**
+     * The root of a parse-tree representing the part of the query
+     * that is governed by this prefix definition.
+     */
+    public CQLNode subtree;
+
+    /**
+     * Creates a new CQLPrefixNode inducing a mapping from the
+     * specified qualifier-set name to the specified identifier across
+     * the specified subtree.
+     */
+    public CQLPrefixNode(String name, String identifier, CQLNode subtree) {
+        this.prefix = new CQLPrefix(name, identifier);
+        this.subtree = subtree;
+    }
+
+    public String toXCQL(int level) {
+        String maybeName = "";
+        if (prefix.name != null)
+            maybeName = indent(level+1) + "" + prefix.name + "\n";
+
+        return (indent(level) + "\n" + maybeName +
+                indent(level+1) +
+                "" + prefix.identifier + "\n" +
+                subtree.toXCQL(level+1) +
+                indent(level) + "\n");
+    }
+
+    public String toCQL() {
+        // ### We don't always need parens around the operand
+        return ">" + (prefix.name != null ? prefix.name + "=" : "") +
+            "\"" + prefix.identifier + "\" (" + subtree.toCQL() + ")";
+    }
+
+    public String toPQF(Properties config) throws PQFTranslationException {
+        // Prefixes and their identifiers don't actually play any role
+        // in PQF translation, since the meanings of the qualifiers,
+        // including their prefixes if any, are instead wired into
+        // `config'.
+        return subtree.toPQF(config);
+    }
+}
diff --git a/src/org/z3950/zing/cql/Makefile b/src/org/z3950/zing/cql/Makefile
index 4bc961e..029ca04 100644
--- a/src/org/z3950/zing/cql/Makefile
+++ b/src/org/z3950/zing/cql/Makefile
@@ -1,13 +1,20 @@
-# $Id: Makefile,v 1.10 2002-11-12 22:38:35 mike Exp $
+# $Id: Makefile,v 1.11 2002-11-14 22:04:16 mike Exp $
+#
+# Your Java compiler, and javadoc, will require that this source
+# directory is on the classpath.
+# The best way to do that is just to
+# add the cql-java distribution's "src" subdirectory to your CLASSPATH
+# environment variable, like this:
+#     CLASSPATH=$CLASSPATH:/where/ever/you/unpacked/it/cql-java-VERSION/src

 DOCDIR = ../../../../../docs

 OBJ = Utils.class \
     CQLNode.class CQLTermNode.class CQLBooleanNode.class \
     CQLAndNode.class CQLOrNode.class CQLNotNode.class \
-    CQLRelation.class CQLProxNode.class ModifierSet.class \
-    CQLParser.class CQLLexer.class CQLParseException.class \
-    CQLGenerator.class MissingParameterException.class \
+    CQLProxNode.class CQLPrefixNode.class CQLPrefix.class \
+    CQLRelation.class ModifierSet.class \
+    CQLParser.class CQLLexer.class CQLGenerator.class \
+    CQLParseException.class MissingParameterException.class \
     PQFTranslationException.class \
     UnknownQualifierException.class UnknownRelationException.class \
     UnknownRelationModifierException.class UnknownPositionException.class
@@ -15,15 +22,6 @@ OBJ = Utils.class \
 ../../../../../lib/cql-java.jar: $(OBJ)
     cd ../../../..; jar cf ../lib/cql-java.jar org/z3950/zing/cql/*.class

-# ### FIX THIS COMMENT!
-# Your Java compiler will require that this source directory is on the
-# classpath. Generally, you can use the rules below, which set the
-# classpath suitably. But that will break if you need other elements
-# in the CLASSPATH too. If that's the situation you're in, take the
-# "-classpath ../../../.."
-# flag out of the rules below, and set your
-# CLASSPATH environment variable to include
-#     /where/ever/you/unpacked/it/cql-java-VERSION/src
-#
 %.class: %.java
     javac $<

diff --git a/test/regression/queries.raw b/test/regression/queries.raw
index 5366fb9..67daa48 100644
--- a/test/regression/queries.raw
+++ b/test/regression/queries.raw
@@ -1,4 +1,5 @@
-# Simple
+
+# Simple

 cat
 "cat"
@@ -9,6 +10,8 @@ xml:element
 "prox/>=/5/word"
 ("cat")
 ((dog))
+all
+prox

 # index relation term

@@ -23,6 +26,7 @@ dc.title any/stem fish
 dc.fish all/stem/fuzzy "fish chips"
 (title any frog)
 ((dc.title any/stem "frog pond"))
+dc.title scr "fish frog chicken"

 # Simple Boolean

@@ -31,22 +35,24 @@ cat and fish
 cat not frog
 (cat not frog)
 "cat" not "fish food"
-xml and "prox///word/"
+xml and "prox///"
+fred and any
+((fred or all))
 a or b and c not d

 # I/R/T plus Boolean

 bath.author any fish and dc.title all "cat dog"
-(title any/stem "fish dog" or "and")
+(title any/stem "fish dog" or and)

 # Prox

 cat prox hat
 cat prox/=/3/word/ordered hat
 cat prox//3 hat
-"fish food" prox///sentence "and"
-title all "chips frog" prox//5/word "any"
-(dc.author exact "jones" prox//5 title >= "smith")
+"fish food" prox///sentence and
+title all "chips frog" prox/>=/5 exact
+(dc.author exact "jones" prox/= "smith")
 ((cat prox hat))

 # Special characters
@@ -65,22 +71,21 @@ cat?dog

 # Lame searches

-"any" or "all:stem" and "all" exact "any" prox///word "prox"="fuzzy"
-((((((((("any")))))))))
-
+any or all:stem and all exact any prox prox=fuzzy
+(((((((((any)))))))))
+("")

 # Invalid searches [should error]

 >
 ===
 cat or
-index any
+index any
 index any/wrong term
 a prox/wrong b
 ()
 (a index any fish)
 (cat any dog or ())
-fred and any
-((fred or all))
-sorry = (mike)
+title = ("illegal parentheses")
+"quoted" any "illegal quotes"
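
The prefix-mapping logic that this commit adds in CQLParser.parsePrefix() can be tried in isolation. The following is a rough sketch only: it uses a plain java.io.StreamTokenizer in place of CQLLexer, accepts only bare words and double-quoted strings (not numbers or keywords), and assumes the identifier is written with a closing quote. The class PrefixSketch and its methods are hypothetical and not part of cql-java.

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.io.UncheckedIOException;

// Hypothetical stand-alone sketch, not part of cql-java.
public class PrefixSketch {
    // Parses the leading prefix declaration of a query and returns
    // {name, identifier}; name is null for the anonymous form.
    static String[] parsePrefix(String cql) {
        try {
            StreamTokenizer st = new StreamTokenizer(new StringReader(cql));
            st.ordinaryChar('/');   // '/' is a comment character by default
            if (st.nextToken() != '>')
                throw new IllegalArgumentException("expected '>'");
            st.nextToken();
            String name = null;
            String identifier = symbol(st, "prefix-name");
            if (st.ttype == '=') {
                // What we read first was the short name after all.
                st.nextToken();
                name = identifier;
                identifier = symbol(st, "prefix-identifier");
            }
            return new String[] { name, identifier };
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Accepts a bare word or a double-quoted string, then advances
    // the tokenizer to the following token.
    static String symbol(StreamTokenizer st, String expected)
            throws IOException {
        if (st.ttype != StreamTokenizer.TT_WORD && st.ttype != '"')
            throw new IllegalArgumentException("expected " + expected);
        String s = st.sval;
        st.nextToken();
        return s;
    }

    public static void main(String[] args) {
        String[] named = parsePrefix(">dc=\"http://dublincore.org/\" dc.title=fish");
        System.out.println(named[0] + " -> " + named[1]);
        String[] anon = parsePrefix(">\"http://dublincore.org/\" title=fish");
        System.out.println(anon[0] + " -> " + anon[1]);
    }
}
```

As in parsePrefix(), the first symbol is provisionally treated as the identifier and is reinterpreted as the short name only if an "=" follows it, so both the named and the anonymous forms need just one token of lookahead.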