Index: contrib/queries/src/java/org/apache/lucene/search/trie/package.html =================================================================== --- contrib/queries/src/java/org/apache/lucene/search/trie/package.html (revision 0) +++ contrib/queries/src/java/org/apache/lucene/search/trie/package.html (revision 0) @@ -0,0 +1,93 @@ + + +

This package provides fast numeric range queries/filters on long, double or Date +fields based on trie structures.

+ +

How it works

+

See the publication about panFMP, where this algorithm was described: + +

Schindler, U, Diepenbroek, M, 2008. Generic XML-based Framework for Metadata Portals. +Computers & Geosciences 34 (12), 1947-1955. +doi:10.1016/j.cageo.2008.02.023
+ +

A quote from this paper: Because Apache Lucene is a full-text search engine and not a conventional database, +it cannot handle numerical ranges (e.g., field value is inside user defined bounds, even dates are numerical values). +We have developed an extension to Apache Lucene that stores +the numerical values in a special string-encoded format with variable precision +(all numerical values like doubles, longs, and timestamps are converted to lexicographic sortable string representations +and stored with different precisions from one byte to the full 8 bytes - depending on the variant used). +For a more detailed description, how the values are stored, see {@link org.apache.lucene.search.trie.TrieUtils}. +A range is then divided recursively into multiple intervals for searching: +The center of the range is searched only with the lowest possible precision in the trie, the boundaries are matched +more exactly. This reduces the number of terms dramatically.
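As a rough illustration of the "lexicographic sortable string representations" mentioned in the quote, the following standalone sketch mirrors the bit transformations that the TrieUtils class in this patch applies, restricted to the 8-bit variant. The names `SortableEncodingSketch`, `encodeLong`, `encodeDouble` and `encodeBits` are made up for this example and are not part of the patch; only the sign-bit handling and the 0x100 symbol offset are taken from the patch itself.

```java
// Hypothetical standalone sketch; not part of the patch.
final class SortableEncodingSketch {

    // Same offset as TRIE_CODED_SYMBOL_MIN in the patch.
    static final char SYMBOL_MIN = (char) 0x100;

    /** Encodes a signed long as 8 chars whose String order equals the numeric order. */
    static String encodeLong(long value) {
        // Flipping the sign bit maps signed order onto unsigned order.
        return encodeBits(value ^ 0x8000000000000000L);
    }

    /** Encodes a double as 8 chars whose String order equals the numeric order. */
    static String encodeDouble(double value) {
        long bits = Double.doubleToLongBits(value);
        if ((bits & 0x8000000000000000L) == 0L) {
            bits |= 0x8000000000000000L; // non-negative: set the sign bit
        } else {
            bits = ~bits;                // negative: invert all bits
        }
        return encodeBits(bits);
    }

    /** Writes the 64 bits as 8 symbols of 8 bits each, most significant first. */
    private static String encodeBits(long bits) {
        final char[] buf = new char[8];
        for (int i = 7; i >= 0; i--) {
            buf[i] = (char) (SYMBOL_MIN + (bits & 0xFFL));
            bits >>>= 8;
        }
        return new String(buf);
    }

    public static void main(String[] args) {
        // Numeric order is preserved by plain String comparison.
        System.out.println(encodeLong(-5L).compareTo(encodeLong(3L)) < 0);
        System.out.println(encodeDouble(-2.0).compareTo(encodeDouble(-1.0)) < 0);
        // Values that differ only in the last byte share a 7-symbol prefix,
        // which is what a lower-precision term stores.
        System.out.println(encodeLong(0x1234L).substring(0, 7)
                .equals(encodeLong(0x12FFL).substring(0, 7)));
    }
}
```

A prefix of the encoded string is a lower-precision version of the value, so one prefix term can stand in for every value that shares it.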

+ +

For the variant that uses a lowest precision of one byte, the index +contains at most 256 distinct values in the lowest precision. +Overall, a range could consist of a theoretical maximum of +7*255*2 + 255 = 3825 distinct terms (when there is a term for every distinct value of an +8-byte number in the index and the range covers all of them; a maximum of 255 distinct values is used +because it would always be possible to reduce the full 256 values to one term with degraded precision). +In practice, we have seen up to 300 terms in most cases (index with 500,000 metadata records +and a homogeneous dispersion of values).

+ +

There are two other encoding variants: 4bit and 2bit. Each of them stores more precision levels +of the longs and therefore needs more storage space, because it generates more and longer terms +(4bit: two times the length and number of terms; 2bit: four times the length and number of terms). +On the other hand, the maximum number of distinct terms used for range queries drops to +15*15*2 + 15 = 465 for the 4bit variant, and +31*3*2 + 3 = 189 for the 2bit variant.
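The three upper bounds quoted above follow one pattern, which the small sketch below makes explicit. It reads the formula as follows (an interpretation of the text, not a statement taken from the patch): every precision level except the lowest contributes up to 2^bits - 1 terms at each of the two range boundaries, and the lowest precision contributes up to 2^bits - 1 terms for the center. `TrieTermBound` and `maxTerms` are hypothetical names used only here.

```java
// Hypothetical helper; reproduces the arithmetic from the package description.
final class TrieTermBound {

    /** Theoretical maximum number of distinct terms for a 64-bit value. */
    static int maxTerms(int bitsPerSymbol) {
        final int levels = 64 / bitsPerSymbol;            // number of precision levels
        final int perLevel = (1 << bitsPerSymbol) - 1;    // 2^bits - 1 terms per boundary
        return (levels - 1) * perLevel * 2 + perLevel;
    }

    public static void main(String[] args) {
        System.out.println(maxTerms(8)); // 7*255*2 + 255 = 3825
        System.out.println(maxTerms(4)); // 15*15*2 + 15  = 465
        System.out.println(maxTerms(2)); // 31*3*2 + 3    = 189
    }
}
```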

+ +

This dramatically improves the performance of Apache Lucene with range queries, which +is no longer dependent on the index size and number of distinct values because there is +an upper limit not related to any of these properties.
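The recursive division described in this section (boundaries at high precision, center at low precision) can be sketched on plain numbers. Note that this is not the string-based `splitRange()` of TrieRangeFilter in this patch: it is a simplified arithmetic analogue for non-negative longs, and the names `RangeSplitSketch`, `Part` and `split` are invented for the illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: an arithmetic analogue of the range splitting,
// not the code used by TrieRangeFilter.
final class RangeSplitSketch {

    /** Values lo..hi (inclusive), matched by terms that ignore the lowest `shift` bits. */
    static final class Part {
        final int shift;
        final long lo, hi;
        Part(int shift, long lo, long hi) { this.shift = shift; this.lo = lo; this.hi = hi; }
    }

    /** Splits [min, max] (assumed 0 <= min <= max) using `bits` bits per precision level. */
    static List<Part> split(long min, long max, int bits) {
        if (min < 0L || max < min) throw new IllegalArgumentException("need 0 <= min <= max");
        final List<Part> parts = new ArrayList<Part>();
        for (int shift = 0; ; shift += bits) {
            final long low = (1L << shift) - 1L;             // bits already dropped at this level
            final long mask = ((1L << bits) - 1L) << shift;  // bits of the current level
            final long diff = 1L << (shift + bits);          // one step at the next coarser level
            final boolean hasLower = (min & mask) != 0L;     // min not aligned to the coarser level
            final boolean hasUpper = (max & mask) != mask;   // max not aligned to the coarser level
            final long nextMin = (hasLower ? min + diff : min) & ~mask;
            final long nextMax = (hasUpper ? max - diff : max) & ~mask;
            final boolean wrapped = nextMin < min || nextMax > max;
            if (shift + bits >= 64 || nextMin > nextMax || wrapped) {
                // No coarser level can help: match the rest at the current precision.
                parts.add(new Part(shift, min, max | low));
                break;
            }
            if (hasLower) parts.add(new Part(shift, min, min | mask | low));
            if (hasUpper) parts.add(new Part(shift, max & ~mask, max | low));
            min = nextMin;
            max = nextMax;
        }
        return parts;
    }

    public static void main(String[] args) {
        for (Part p : split(1000L, 100000L, 8)) {
            System.out.println("shift=" + p.shift + " lo=" + p.lo + " hi=" + p.hi);
        }
    }
}
```

Under these assumptions, the range 1000 to 100000 is covered by two full-precision pieces at the edges and one piece that ignores the lowest 8 bits in the middle, and an aligned range such as 0 to 255 collapses into a single lower-precision piece.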

+ +

Usage

+

To use the new query types, the numerical values (which may be long, double or Date) +must be stored in the index in a special format during indexing (using {@link org.apache.lucene.search.trie.TrieUtils}). +This can be done like this:

+ +
+	Document doc=new Document();
+	// add some standard fields:
+	String svalue="anything to index";
+	doc.add(new Field("exampleString",
+		svalue, Field.Store.YES, Field.Index.ANALYZED));
+	// add some numerical fields:
+	double fvalue=1.057E17;
+	TrieUtils.VARIANT_8BIT.addDoubleTrieCodedDocumentField(doc, "exampleDouble", 
+		fvalue, true /* index the field */, Field.Store.YES);
+	long lvalue=121345L;
+	TrieUtils.VARIANT_8BIT.addLongTrieCodedDocumentField(doc, "exampleLong",
+		lvalue, true /* index the field */, Field.Store.YES);
+	Date dvalue=new Date(); // current time
+	TrieUtils.VARIANT_8BIT.addDateTrieCodedDocumentField(doc, "exampleDate", 
+		dvalue, true /* index the field */, Field.Store.YES);
+	// add document to IndexWriter
+
+ +

The numeric index fields you prepared in this way can be searched by {@link org.apache.lucene.search.trie.TrieRangeQuery}:

+ +
+	Query q=new TrieRangeQuery("exampleDouble", new Double(1.0E17), new Double(2.0E17), TrieUtils.VARIANT_8BIT);
+	TopDocs docs=searcher.search(q, 10);
+	for (int i=0; i<docs.scoreDocs.length; i++) {
+		Document doc=searcher.doc(docs.scoreDocs[i].doc);
+		System.out.println(doc.get("exampleString"));
+		// decode the stored numerical value (important!!!):
+		System.out.println( TrieUtils.VARIANT_8BIT.trieCodedToDouble(doc.get("exampleDouble")) );
+	}
+
+ +

Performance

+ +

Comparisons of the different types of RangeQueries on an index with about 500,000 docs showed +that the old {@link org.apache.lucene.search.RangeQuery} (with raised +{@link org.apache.lucene.search.BooleanQuery} clause count) took about 30-40 secs to complete, +{@link org.apache.lucene.search.ConstantScoreRangeQuery} took 5 secs and +{@link org.apache.lucene.search.trie.TrieRangeQuery} took less than 100 ms to +complete (on an Opteron64 machine, Java 1.5). +This query type was developed for a geographic portal, where the performance of +e.g. bounding boxes or exact date/time stamps is important.

+ + + \ No newline at end of file Index: contrib/queries/src/java/org/apache/lucene/search/trie/TrieRangeFilter.java =================================================================== --- contrib/queries/src/java/org/apache/lucene/search/trie/TrieRangeFilter.java (revision 0) +++ contrib/queries/src/java/org/apache/lucene/search/trie/TrieRangeFilter.java (revision 0) @@ -0,0 +1,276 @@ +package org.apache.lucene.search.trie; + +/** + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +import java.io.IOException; +import java.util.Date; + +import org.apache.lucene.search.Filter; +import org.apache.lucene.search.DocIdSet; +import org.apache.lucene.index.IndexReader; +import org.apache.lucene.index.TermDocs; +import org.apache.lucene.index.TermEnum; +import org.apache.lucene.index.Term; +import org.apache.lucene.util.OpenBitSet; + +/** + * Implementation of a Lucene {@link Filter} that implements trie-based range filtering. + * This filter depends on a specific structure of terms in the index that can only be created + * by {@link TrieUtils} methods. + * For more information, how the algorithm works, see the package description {@link org.apache.lucene.search.trie}. 
+ * @author Uwe Schindler (panFMP developer) + */ +public final class TrieRangeFilter extends Filter { + + /** Generic constructor (internal use only): Uses already trie-converted min/max values */ + public TrieRangeFilter(final String field, final String min, final String max, final TrieUtils variant) { + if (min==null && max==null) throw new IllegalArgumentException("The min and max values cannot be both null."); + this.trieVariant=variant; + this.minUnconverted=min; + this.maxUnconverted=max; + this.min=(min==null) ? trieVariant.TRIE_CODED_NUMERIC_MIN : min; + this.max=(max==null) ? trieVariant.TRIE_CODED_NUMERIC_MAX : max; + this.field=field.intern(); + } + + /** Generic constructor (internal use only): Uses already trie-converted min/max values */ + public TrieRangeFilter(final String field, final String min, final String max) { + this(field,min,max,getDefaultTrieVariant()); + } + + /** + * Generates a trie query using the supplied field with range bounds in numeric form (double). + * You can set min or max (but not both) to null to leave one bound open. + */ + public TrieRangeFilter(final String field, final Double min, final Double max, final TrieUtils variant) { + this( + field, + (min==null) ? null : variant.doubleToTrieCoded(min.doubleValue()), + (max==null) ? null : variant.doubleToTrieCoded(max.doubleValue()), + variant + ); + this.minUnconverted=min; + this.maxUnconverted=max; + } + + /** + * Generates a trie query using the supplied field with range bounds in numeric form (double). + * You can set min or max (but not both) to null to leave one bound open. + */ + public TrieRangeFilter(final String field, final Double min, final Double max) { + this(field,min,max,getDefaultTrieVariant()); + } + + /** + * Generates a trie query using the supplied field with range bounds in date/time form. + * You can set min or max (but not both) to null to leave one bound open. 
+ */ + public TrieRangeFilter(final String field, final Date min, final Date max, final TrieUtils variant) { + this( + field, + (min==null) ? null : variant.dateToTrieCoded(min), + (max==null) ? null : variant.dateToTrieCoded(max), + variant + ); + this.minUnconverted=min; + this.maxUnconverted=max; + } + + /** + * Generates a trie query using the supplied field with range bounds in date/time form. + * You can set min or max (but not both) to null to leave one bound open. + */ + public TrieRangeFilter(final String field, final Date min, final Date max) { + this(field,min,max,getDefaultTrieVariant()); + } + + /** + * Generates a trie query using the supplied field with range bounds in integer form (long). + * You can set min or max (but not both) to null to leave one bound open. + */ + public TrieRangeFilter(final String field, final Long min, final Long max, final TrieUtils variant) { + this( + field, + (min==null) ? null : variant.longToTrieCoded(min.longValue()), + (max==null) ? null : variant.longToTrieCoded(max.longValue()), + variant + ); + this.minUnconverted=min; + this.maxUnconverted=max; + } + + /** + * Generates a trie query using the supplied field with range bounds in integer form (long). + * You can set min or max (but not both) to null to leave one bound open. 
+ */ + public TrieRangeFilter(final String field, final Long min, final Long max) { + this(field,min,max,getDefaultTrieVariant()); + } + + //@Override + public String toString() { + return toString(null); + } + + public String toString(final String field) { + final StringBuffer sb=new StringBuffer(); + if (!this.field.equals(field)) sb.append(this.field).append(':'); + return sb.append('[').append(minUnconverted).append(" TO ").append(maxUnconverted).append(']').toString(); + } + + //@Override + public final boolean equals(final Object o) { + if (o instanceof TrieRangeFilter) { + TrieRangeFilter q=(TrieRangeFilter)o; + // trieVariants are singleton per type, so no equals needed + return (field==q.field && min.equals(q.min) && max.equals(q.max) && trieVariant==q.trieVariant); + } else return false; + } + + //@Override + public final int hashCode() { + // trieVariant's default hashCode is enough, because singleton per type + return field.hashCode()+(min.hashCode()^0x14fa55fb)+(max.hashCode()^0x733fa5fe)+(trieVariant.hashCode()); + } + + /** Marks documents in a specific range. 
Code borrowed from original RangeFilter and simplified (and returns number of terms) */ + private int setBits(final IndexReader reader, final TermDocs termDocs, final OpenBitSet bits, String lowerTerm, String upperTerm) throws IOException { + //System.out.println(lowerTerm+" TO "+upperTerm); + int count=0,len=lowerTerm.length(); + final String field; + if (len0) break; + // we have a good term, find the docs + count++; + termDocs.seek(enumerator); + while (termDocs.next()) bits.set(termDocs.doc()); + } else break; + } while (enumerator.next()); + } finally { + enumerator.close(); + } + return count; + } + + /** Splits range recursively (and returns number of terms) */ + private int splitRange( + final IndexReader reader, final TermDocs termDocs, final OpenBitSet bits, + final String min, final boolean lowerBoundOpen, final String max, final boolean upperBoundOpen + ) throws IOException { + int count=0; + final int length=min.length(); + final String minShort=lowerBoundOpen ? min.substring(0,length-1) : trieVariant.incrementTrieCoded(min.substring(0,length-1)); + final String maxShort=upperBoundOpen ? max.substring(0,length-1) : trieVariant.decrementTrieCoded(max.substring(0,length-1)); + + if (length==1 || minShort.compareTo(maxShort)>=0) { + // we are in the lowest precision or the current precision is not existent + count+=setBits(reader, termDocs, bits, min, max); + } else { + // Avoid too much seeking: first go deeper into lower precision + // (in IndexReader's TermEnum these terms are earlier). + // Do this only, if the current length is not trieVariant.TRIE_CODED_LENGTH (not full precision), + // because terms from the highest prec come before all lower prec terms + // (because the field name is ordered before the suffixed one). + if (length!=trieVariant.TRIE_CODED_LENGTH) count+=splitRange( + reader,termDocs,bits, + minShort,lowerBoundOpen, + maxShort,upperBoundOpen + ); + // Avoid too much seeking: set bits for lower part of current (higher) precision. 
+ // These terms come later in IndexReader's TermEnum. + if (!lowerBoundOpen) { + count+=setBits(reader, termDocs, bits, min, trieVariant.decrementTrieCoded(minShort+trieVariant.TRIE_CODED_SYMBOL_MIN)); + } + // Avoid too much seeking: set bits for upper part of current precision. + // These terms come later in IndexReader's TermEnum. + if (!upperBoundOpen) { + count+=setBits(reader, termDocs, bits, trieVariant.incrementTrieCoded(maxShort+trieVariant.TRIE_CODED_SYMBOL_MAX), max); + } + // If the first step (see above) was not done (because length==trieVariant.TRIE_CODED_LENGTH) we do it now. + if (length==trieVariant.TRIE_CODED_LENGTH) count+=splitRange( + reader,termDocs,bits, + minShort,lowerBoundOpen, + maxShort,upperBoundOpen + ); + } + return count; + } + + /** + * Returns a DocIdSet that provides the documents which should be permitted or prohibited in search results. + */ + //@Override + public DocIdSet getDocIdSet(IndexReader reader) throws IOException { + final OpenBitSet bits = new OpenBitSet(reader.maxDoc()); + final TermDocs termDocs=reader.termDocs(); + try { + final int count=splitRange( + reader,termDocs,bits, + min,trieVariant.TRIE_CODED_NUMERIC_MIN.equals(min), + max,trieVariant.TRIE_CODED_NUMERIC_MAX.equals(max) + ); + // count is not used yet, we can make statistics on it or debug it like so: + //System.out.println("Found "+count+" distinct terms in filtered range for field '"+field+"'."); + } finally { + termDocs.close(); + } + return bits; + } + + /** + * Sets the default variant of {@link TrieUtils} used for generating trie values and ranges. + * It is used by the constructors without TrieUtils parameter. + */ + public synchronized static void setDefaultTrieVariant(final TrieUtils variant) { + defaultTrieVariant=variant; + } + + /** + * Returns the default variant of {@link TrieUtils} used for generating trie values and ranges. + * It is used by the constructors without TrieUtils parameter. 
+ */ + public synchronized static TrieUtils getDefaultTrieVariant() { + return defaultTrieVariant; + } + + // members + private final String field,min,max; + private final TrieUtils trieVariant; + private Object minUnconverted,maxUnconverted; + + // static members + private static TrieUtils defaultTrieVariant=TrieUtils.VARIANT_8BIT; +} \ No newline at end of file Index: contrib/queries/src/java/org/apache/lucene/search/trie/TrieRangeQuery.java =================================================================== --- contrib/queries/src/java/org/apache/lucene/search/trie/TrieRangeQuery.java (revision 0) +++ contrib/queries/src/java/org/apache/lucene/search/trie/TrieRangeQuery.java (revision 0) @@ -0,0 +1,129 @@ +package org.apache.lucene.search.trie; + +/** + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +import java.util.Date; +import java.io.IOException; + +import org.apache.lucene.search.Query; +import org.apache.lucene.search.Filter; +import org.apache.lucene.search.ConstantScoreQuery; +import org.apache.lucene.util.ToStringUtils; +import org.apache.lucene.index.IndexReader; + +/** + * Implementation of a Lucene {@link Query} that implements a trie-based range query. 
+ * This query depends on a specific structure of terms in the index that can only be created + * by {@link TrieUtils} methods. + *

This class wraps a {@link TrieRangeFilter} using a {@link ConstantScoreQuery}. + * @see TrieRangeFilter + * @author Uwe Schindler (panFMP developer) + */ +public final class TrieRangeQuery extends Query { + + /** Generic constructor (internal use only): Uses already trie-converted min/max values */ + public TrieRangeQuery(final String field, final String min, final String max) { + filter=new TrieRangeFilter(field,min,max); + } + + /** + * Generates a trie query using the supplied field with range bounds in numeric form (double). + * You can set min or max (but not both) to null to leave one bound open. + */ + public TrieRangeQuery(final String field, final Double min, final Double max) { + filter=new TrieRangeFilter(field,min,max); + } + + /** + * Generates a trie query using the supplied field with range bounds in date/time form. + * You can set min or max (but not both) to null to leave one bound open. + */ + public TrieRangeQuery(final String field, final Date min, final Date max) { + filter=new TrieRangeFilter(field,min,max); + } + + /** + * Generates a trie query using the supplied field with range bounds in integer form (long). + * You can set min or max (but not both) to null to leave one bound open. + */ + public TrieRangeQuery(final String field, final Long min, final Long max) { + filter=new TrieRangeFilter(field,min,max); + } + + /** Generic constructor (internal use only): Uses already trie-converted min/max values */ + public TrieRangeQuery(final String field, final String min, final String max, final TrieUtils variant) { + filter=new TrieRangeFilter(field,min,max,variant); + } + + /** + * Generates a trie query using the supplied field with range bounds in numeric form (double). + * You can set min or max (but not both) to null to leave one bound open. 
+ */ + public TrieRangeQuery(final String field, final Double min, final Double max, final TrieUtils variant) { + filter=new TrieRangeFilter(field,min,max,variant); + } + + /** + * Generates a trie query using the supplied field with range bounds in date/time form. + * You can set min or max (but not both) to null to leave one bound open. + */ + public TrieRangeQuery(final String field, final Date min, final Date max, final TrieUtils variant) { + filter=new TrieRangeFilter(field,min,max,variant); + } + + /** + * Generates a trie query using the supplied field with range bounds in integer form (long). + * You can set min or max (but not both) to null to leave one bound open. + */ + public TrieRangeQuery(final String field, final Long min, final Long max, final TrieUtils variant) { + filter=new TrieRangeFilter(field,min,max,variant); + } + + //@Override + public String toString(final String field) { + return filter.toString(field)+ToStringUtils.boost(getBoost()); + } + + //@Override + public final boolean equals(final Object o) { + if (o instanceof TrieRangeQuery) { + TrieRangeQuery q=(TrieRangeQuery)o; + return (filter.equals(q.filter) && getBoost()==q.getBoost()); + } else return false; + } + + //@Override + public final int hashCode() { + return filter.hashCode()^0x1756fa55+Float.floatToIntBits(getBoost()); + } + + /** + * Rewrites the query to native Lucene {@link Query}'s. This implementation uses a {@link ConstantScoreQuery} with + * a {@link TrieRangeFilter} as implementation of the trie algorithm. 
+ */ + //@Override + public Query rewrite(final IndexReader reader) throws IOException { + final ConstantScoreQuery q = new ConstantScoreQuery(filter); + q.setBoost(getBoost()); + return q.rewrite(reader); + } + + // members + private final TrieRangeFilter filter; + +} \ No newline at end of file Index: contrib/queries/src/java/org/apache/lucene/search/trie/TrieUtils.java =================================================================== --- contrib/queries/src/java/org/apache/lucene/search/trie/TrieUtils.java (revision 0) +++ contrib/queries/src/java/org/apache/lucene/search/trie/TrieUtils.java (revision 0) @@ -0,0 +1,291 @@ +package org.apache.lucene.search.trie; + +/** + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +import java.util.Date; + +import org.apache.lucene.document.Document; +import org.apache.lucene.document.Field; + +/** + *This is a helper class to construct the trie-based index entries for numerical values. + *

For more information on how the algorithm works, see the package description {@link org.apache.lucene.search.trie}. The format in which the + * numerical values are stored in the index is documented here: + *

All numerical values are first converted to special unsigned longs by applying some bit-wise transformations. This means:

+ *

For each variant (you can choose between {@link #VARIANT_8BIT}, {@link #VARIANT_4BIT}, and {@link #VARIANT_2BIT}), + * the bitmap of this unsigned long is divided into parts of a number of bits (starting with the most-significant bits) + * and each part converted to characters between {@link #TRIE_CODED_SYMBOL_MIN} and {@link #TRIE_CODED_SYMBOL_MAX}. + * The resulting {@link String} is comparable like the corresponding unsigned long. + *

To store the different precisions of the long values (from one character [only the most significant one] to the full encoded length), + * each lower precision is prefixed by the length ({@link #TRIE_CODED_PADDING_START}+precision == 0x20+precision), + * in an extra "helper" field with a suffixed field name (e.g. fieldname "numeric" => lower precision's name "numeric#trie"). + * The full long is not prefixed at all and indexed and stored according to the given flags in the original field name. + * This makes it possible to get the correct enumeration of terms in correct precision + * of the term list by just jumping to the correct fieldname and/or prefix. The full precision value may also be + * stored in the document. Having the full precision value as term in a separate field with the original name, + * sorting of query results against such fields is possible using the original field name. + * @author Uwe Schindler (panFMP developer) + */ +public final class TrieUtils { + + /** Instance of TrieUtils using a trie factor of 8 bit. */ + public static final TrieUtils VARIANT_8BIT=new TrieUtils(8); + + /** Instance of TrieUtils using a trie factor of 4 bit. */ + public static final TrieUtils VARIANT_4BIT=new TrieUtils(4); + + /** Instance of TrieUtils using a trie factor of 2 bit. */ + public static final TrieUtils VARIANT_2BIT=new TrieUtils(2); + + /** Marker (PADDING) before lower-precision trie entries to signal the precision value. See class description! */ + public static final char TRIE_CODED_PADDING_START=(char)0x20; + + /** The "helper" field containing the lower precision terms is the original fieldname with this appended. 
*/ + public static final String LOWER_PRECISION_FIELD_NAME_SUFFIX="#trie"; + + /** Character used as lower end */ + public static final char TRIE_CODED_SYMBOL_MIN=(char)0x100; + + private TrieUtils(int bits) { + assert 64%bits == 0; + + this.TRIE_BITS=bits; + mask = (1L << TRIE_BITS) - 1L; + // init global "constants" + TRIE_CODED_LENGTH=64/TRIE_BITS; + TRIE_CODED_SYMBOL_MAX=(char)(TRIE_CODED_SYMBOL_MIN+mask); + TRIE_CODED_NUMERIC_MIN=longToTrieCoded(Long.MIN_VALUE); + TRIE_CODED_NUMERIC_MAX=longToTrieCoded(Long.MAX_VALUE); + } + + // internal conversion to/from strings + + private final String internalLongToTrieCoded(long l) { + final char[] buf=new char[TRIE_CODED_LENGTH]; + for (int i=TRIE_CODED_LENGTH-1; i>=0; i--) { + buf[i] = (char)( TRIE_CODED_SYMBOL_MIN + (l & mask) ); + l = l >>> TRIE_BITS; + } + return new String(buf); + } + + private final long internalTrieCodedToLong(final String s) { + if (s==null) throw new NullPointerException("Trie encoded string may not be NULL"); + final int len=s.length(); + if (len!=TRIE_CODED_LENGTH) throw new NumberFormatException( + "Invalid trie encoded numerical value representation (incompatible length, must be "+TRIE_CODED_LENGTH+")" + ); + long l=0L; + for (int i=0; i<len; i++) { + final char ch=s.charAt(i); + if (ch>=TRIE_CODED_SYMBOL_MIN && ch<=TRIE_CODED_SYMBOL_MAX) { + l = (l << TRIE_BITS) | (long)(ch-TRIE_CODED_SYMBOL_MIN); + } else { + throw new NumberFormatException( + "Invalid trie encoded numerical value representation (char "+ + Integer.toHexString((int)ch)+" at position "+i+" is invalid)" + ); + } + } + return l; + } + + // Long's + + /** Converts a long value to an encoded String. */ + public String longToTrieCoded(final long l) { + return internalLongToTrieCoded(l ^ 0x8000000000000000L); + } + + /** Converts an encoded String value back to a long. */ + public long trieCodedToLong(final String s) { + return internalTrieCodedToLong(s) ^ 0x8000000000000000L; + } + + // Double's + + /** Converts a double value to an encoded String. 
*/ + public String doubleToTrieCoded(final double d) { + long l=Double.doubleToLongBits(d); + if ((l & 0x8000000000000000L) == 0L) { + // >0 + l |= 0x8000000000000000L; + } else { + // <0 + l = ~l; + } + return internalLongToTrieCoded(l); + } + + /** Converts a encoded String value back to a double. */ + public double trieCodedToDouble(final String s) { + long l=internalTrieCodedToLong(s); + if ((l & 0x8000000000000000L) != 0L) { + // >0 + l &= 0x7fffffffffffffffL; + } else { + // <0 + l = ~l; + } + return Double.longBitsToDouble(l); + } + + // Date's + + /** Converts a Date value encoded to a String. */ + public String dateToTrieCoded(final Date d) { + return longToTrieCoded(d.getTime()); + } + + /** Converts a encoded String value back to a Date. */ + public Date trieCodedToDate(final String s) { + return new Date(trieCodedToLong(s)); + } + + // increment / decrement + + /** Increments an encoded String value by 1. Needed by {@link TrieRangeFilter}. */ + public String incrementTrieCoded(final String v) { + final int l=v.length(); + final char[] buf=new char[l]; + boolean inc=true; + for (int i=l-1; i>=0; i--) { + int b=v.charAt(i)-TRIE_CODED_SYMBOL_MIN; + if (inc) b++; + if (inc=(b>(int)mask)) b=0; + buf[i]=(char)(TRIE_CODED_SYMBOL_MIN+b); + } + return new String(buf); + } + + /** Decrements an encoded String value by 1. Needed by {@link TrieRangeFilter}. 
*/ + public String decrementTrieCoded(final String v) { + final int l=v.length(); + final char[] buf=new char[l]; + boolean dec=true; + for (int i=l-1; i>=0; i--) { + int b=v.charAt(i)-TRIE_CODED_SYMBOL_MIN; + if (dec) b--; + if (dec=(b<0)) b=(int)mask; + buf[i]=(char)(TRIE_CODED_SYMBOL_MIN+b); + } + return new String(buf); + } + + private void addConvertedTrieCodedDocumentField( + final Document ldoc, final String fieldname, final String val, + final boolean index, final Field.Store store + ) { + Field f=new Field(fieldname, val, store, index?Field.Index.NOT_ANALYZED:Field.Index.NO); + if (index) { + f.setOmitTf(true); + ldoc.add(f); + // add the lower precision values in the helper field with prefix + final StringBuffer sb=new StringBuffer(TRIE_CODED_LENGTH); + synchronized(sb) { + for (int i=TRIE_CODED_LENGTH-1; i>0; i--) { + sb.setLength(0); + f=new Field( + fieldname + LOWER_PRECISION_FIELD_NAME_SUFFIX, + sb.append( (char)(TRIE_CODED_PADDING_START+i) ).append( val.substring(0,i) ).toString(), + Field.Store.NO, Field.Index.NOT_ANALYZED + ); + f.setOmitTf(true); + ldoc.add(f); + } + } + } else { + ldoc.add(f); + } + } + + /** + * Stores a double value in trie-form in document for indexing. + *

To store the different precisions of the long values (from one byte [only the most significant one] to the full eight bytes), + * each lower precision is prefixed by the length ({@link #TRIE_CODED_PADDING_START}+precision), + * in an extra "helper" field with a name of fieldname+LOWER_PRECISION_FIELD_NAME_SUFFIX + * (i.e. fieldname "numeric" => lower precision's name "numeric#trie"). + * The full long is not prefixed at all and indexed and stored according to the given flags in the original field name. + * If the field should not be searchable, set index to false. It is then only stored (for convenience). + * Fields added to a document using this method can be queried by {@link TrieRangeQuery}. + */ + public void addDoubleTrieCodedDocumentField( + final Document ldoc, final String fieldname, final double val, + final boolean index, final Field.Store store + ) { + addConvertedTrieCodedDocumentField(ldoc, fieldname, doubleToTrieCoded(val), index, store); + } + + /** + * Stores a Date value in trie-form in document for indexing. + *

To store the different precisions of the long values (from one byte [only the most significant one] to the full eight bytes), + * each lower precision is prefixed by the length ({@link #TRIE_CODED_PADDING_START}+precision), + * in an extra "helper" field with a name of fieldname+LOWER_PRECISION_FIELD_NAME_SUFFIX + * (i.e. fieldname "numeric" => lower precision's name "numeric#trie"). + * The full long is not prefixed at all and indexed and stored according to the given flags in the original field name. + * If the field should not be searchable, set index to false. It is then only stored (for convenience). + * Fields added to a document using this method can be queried by {@link TrieRangeQuery}. + */ + public void addDateTrieCodedDocumentField( + final Document ldoc, final String fieldname, + final Date val, final boolean index, final Field.Store store + ) { + addConvertedTrieCodedDocumentField(ldoc, fieldname, dateToTrieCoded(val), index, store); + } + + /** + * Stores a long value in trie-form in document for indexing. + *

To store the different precisions of the long values (from one byte [only the most significant one] to the full eight bytes), + * each lower precision is prefixed by the length ({@link #TRIE_CODED_PADDING_START}+precision), + * in an extra "helper" field with a name of fieldname+LOWER_PRECISION_FIELD_NAME_SUFFIX + * (i.e. fieldname "numeric" => lower precision's name "numeric#trie"). + * The full long is not prefixed at all and indexed and stored according to the given flags in the original field name. + * If the field should not be searchable, set index to false. It is then only stored (for convenience). + * Fields added to a document using this method can be queried by {@link TrieRangeQuery}. + */ + public void addLongTrieCodedDocumentField( + final Document ldoc, final String fieldname, + final long val, final boolean index, final Field.Store store + ) { + addConvertedTrieCodedDocumentField(ldoc, fieldname, longToTrieCoded(val), index, store); + } + + private final long mask; + + public final int TRIE_BITS; + + /** Length of an encoded value */ + public final int TRIE_CODED_LENGTH; + + /** Character used as upper end (depends on trie bits) */ + public final char TRIE_CODED_SYMBOL_MAX; + + /** minimum encoded value of a numerical index entry: {@link Long#MIN_VALUE} */ + public final String TRIE_CODED_NUMERIC_MIN; + /** maximum encoded value of a numerical index entry: {@link Long#MAX_VALUE} */ + public final String TRIE_CODED_NUMERIC_MAX; + +} + Index: contrib/queries/src/test/org/apache/lucene/search/trie/TestTrieRangeQuery.java =================================================================== --- contrib/queries/src/test/org/apache/lucene/search/trie/TestTrieRangeQuery.java (revision 0) +++ contrib/queries/src/test/org/apache/lucene/search/trie/TestTrieRangeQuery.java (revision 0) @@ -0,0 +1,124 @@ +package org.apache.lucene.search.trie; + +/** + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. 
See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +import java.io.IOException; + +import junit.framework.TestCase; + +import org.apache.lucene.analysis.WhitespaceAnalyzer; +import org.apache.lucene.document.Document; +import org.apache.lucene.document.Field; +import org.apache.lucene.index.IndexWriter; +import org.apache.lucene.index.IndexWriter.MaxFieldLength; +import org.apache.lucene.store.RAMDirectory; +import org.apache.lucene.search.IndexSearcher; +import org.apache.lucene.search.ScoreDoc; +import org.apache.lucene.search.TopDocs; +import org.apache.lucene.search.Sort; +import org.apache.lucene.search.SortField; + +public class TestTrieRangeQuery extends TestCase +{ + private static final long distance=66666; + + private static RAMDirectory directory; + private static IndexSearcher searcher; + static { + try { + directory = new RAMDirectory(); + IndexWriter writer = new IndexWriter(directory, new WhitespaceAnalyzer(), + true, MaxFieldLength.UNLIMITED); + + // Add a series of 10000 docs with increasing long values + for (long l=0L; l