Index: src/java/org/apache/lucene/search/NumericRangeFilter.java =================================================================== --- src/java/org/apache/lucene/search/NumericRangeFilter.java (revision 809097) +++ src/java/org/apache/lucene/search/NumericRangeFilter.java (working copy) @@ -22,26 +22,30 @@ import org.apache.lucene.util.NumericUtils; // for javadocs /** - * Implementation of a {@link Filter} that implements trie-based range filtering - * for numeric values. For more information about the algorithm look into the docs of - * {@link NumericRangeQuery}. + * A {@link Filter} that only accepts numeric values within + * a specified range. To use this, you must first index the + * numeric values using {@link NumericField} (expert: {@link + * NumericTokenStream}). * - *

This filter depends on a specific structure of terms in the index that can only be created - * by indexing using {@link NumericField} (expert: {@link NumericTokenStream}). + *

You create a new NumericRangeFilter with the static + * factory methods, eg: * - *

Please note: This class has no constructor, you can create filters depending on the data type - * by using the static factories {@linkplain #newLongRange NumericRangeFilter.newLongRange()}, - * {@linkplain #newIntRange NumericRangeFilter.newIntRange()}, {@linkplain #newDoubleRange NumericRangeFilter.newDoubleRange()}, - * and {@linkplain #newFloatRange NumericRangeFilter.newFloatRange()}, e.g.: *

- * Filter f = NumericRangeFilter.newFloatRange(field, precisionStep,
+ * Filter f = NumericRangeFilter.newFloatRange("weight",
- *                                             new Float(0.3f), new Float(0.10f),
+ *                                             new Float(0.03f), new Float(0.10f),
  *                                             true, true);
  * 
* + * accepts all documents whose float valued "weight" field + * ranges from 0.03 to 0.10, inclusive. + * *

NOTE: This API is experimental and - * might change in incompatible ways in the next release. + * might change in incompatible ways in the next + * release. * + * See {@link NumericRangeQuery} for details on how Lucene + * indexes and searches numeric valued fields. + * * @since 2.9 **/ public final class NumericRangeFilter extends MultiTermQueryWrapperFilter { Index: src/java/org/apache/lucene/search/TermRangeFilter.java =================================================================== --- src/java/org/apache/lucene/search/TermRangeFilter.java (revision 809097) +++ src/java/org/apache/lucene/search/TermRangeFilter.java (working copy) @@ -20,12 +20,13 @@ import java.text.Collator; /** - * A Filter that restricts search results to a range of values in a given - * field. + * A Filter that restricts search results to a range of term + * values in a given field. * *

This filter matches the documents looking for terms that fall into the - * supplied range according to {@link String#compareTo(String)}. It is not intended - * for numerical ranges, use {@link NumericRangeFilter} instead. + * supplied range according to {@link + * String#compareTo(String)}, unless a Collator is provided. It is not intended + * for numerical ranges; use {@link NumericRangeFilter} instead. * *

If you construct a large number of range filters with different ranges but on the * same field, {@link FieldCacheRangeFilter} may have significantly better performance. Index: src/java/org/apache/lucene/search/NumericRangeQuery.java =================================================================== --- src/java/org/apache/lucene/search/NumericRangeQuery.java (revision 809097) +++ src/java/org/apache/lucene/search/NumericRangeQuery.java (working copy) @@ -29,35 +29,59 @@ import org.apache.lucene.index.Term; /** - * Implementation of a {@link Query} that implements trie-based range querying - * for numeric values. + *

A {@link Query} that matches numeric values within a + * specified range. To use this, you must first index the + * numeric values using {@link NumericField} (expert: {@link + * NumericTokenStream}). If your terms are instead textual, + * you should use {@link TermRangeQuery}. {@link + * NumericRangeFilter} is the filter equivalent of this + * query.

* - *

Usage

- *

Indexing

- * Before numeric values can be queried, they must be indexed in a special way. You can do this - * by adding numeric fields to the index by specifying a {@link NumericField} (expert: {@link NumericTokenStream}). - * An important setting is the precisionStep, which specifies, - * how many different precisions per numeric value are indexed to speed up range queries. - * Lower values create more terms but speed up search, higher values create less terms, but - * slow down search. Suitable values are between 1 and 8. A good starting point to test is 4, - * which is the default value for all Numeric* classes. For a discussion about ideal - * values, see below. Indexing code examples can be found in {@link NumericField}. + *

You create a new NumericRangeQuery with the static + * factory methods, eg: * - *

Searching

- *

This class has no constructor, you can create queries depending on the data type - * by using the static factories {@linkplain #newLongRange NumericRangeQuery.newLongRange()}, - * {@linkplain #newIntRange NumericRangeQuery.newIntRange()}, {@linkplain #newDoubleRange NumericRangeQuery.newDoubleRange()}, - * and {@linkplain #newFloatRange NumericRangeQuery.newFloatRange()}, e.g.: *

- * Query q = NumericRangeQuery.newFloatRange(field, precisionStep,
+ * Query q = NumericRangeQuery.newFloatRange("weight",
- *                                           new Float(0.3f), new Float(0.10f),
+ *                                           new Float(0.03f), new Float(0.10f),
  *                                           true, true);
  * 
- * The used precisionStep must be compatible - * to the one used during indexing (see below). The default is also 4. * - *

How it works

+ * matches all documents whose float valued "weight" field + * ranges from 0.03 to 0.10, inclusive. * + *

The performance of NumericRangeQuery is much better + * than the corresponding {@link TermRangeQuery} because the + * number of terms that must be searched is usually far + * fewer, thanks to trie indexing, described below.

+ * + *

You can optionally specify a precisionStep + * when creating this query. This is necessary if you've + * changed this configuration from its default (4) during + * indexing. Lower values consume more disk space but speed + * up searching. Suitable values are between 1 and + * 8. A good starting point to test is 4, + * which is the default value for all Numeric* + * classes. See below for + * details. + * + *

This query defaults to {@linkplain + * MultiTermQuery#CONSTANT_SCORE_AUTO_REWRITE_DEFAULT} for + * 32 bit (int/float) ranges with precisionStep ≤8 and 64 + * bit (long/double) ranges with precisionStep ≤6. + * Otherwise it uses {@linkplain + * MultiTermQuery#CONSTANT_SCORE_FILTER_REWRITE} as the + * number of terms is likely to be high. With precision + * steps of ≤4, this query can be run with one of the + * BooleanQuery rewrite methods without changing + * BooleanQuery's default max clause count. + * + *

NOTE: This API is experimental and + * might change in incompatible ways in the next release. + * + * + *

How it works

+ * *

See the publication about panFMP, * where this algorithm was described (referred to as TrieRangeQuery): * @@ -118,10 +142,6 @@ * Sorting is also possible with range query optimized fields using one of the above precisionSteps. * * - *

This dramatically improves the performance of Apache Lucene with range queries, which - * are no longer dependent on the index size and the number of distinct values because there is - * an upper limit unrelated to either of these properties.

- * *

Comparisions of the different types of RangeQueries on an index with about 500,000 docs showed * that {@link TermRangeQuery} in boolean rewrite mode (with raised {@link BooleanQuery} clause count) * took about 30-40 secs to complete, {@link TermRangeQuery} in constant score filter rewrite mode took 5 secs @@ -129,19 +149,6 @@ * precision step). This query type was developed for a geographic portal, where the performance for * e.g. bounding boxes or exact date/time stamps is important.

* - *

The query defaults to {@linkplain MultiTermQuery#CONSTANT_SCORE_AUTO_REWRITE_DEFAULT} - * for 32 bit (int/float) ranges with precisionStep ≤8 and - * 64 bit (long/double) ranges with precisionStep ≤6. - * Otherwise it uses {@linkplain - * MultiTermQuery#CONSTANT_SCORE_FILTER_REWRITE} as the - * number of terms is likely to be high. - * With precision steps of ≤4, this query can be run with - * one of the BooleanQuery rewrite methods without changing - * BooleanQuery's default max clause count. - * - *

NOTE: This API is experimental and - * might change in incompatible ways in the next release. - * * @since 2.9 **/ public final class NumericRangeQuery extends MultiTermQuery { Index: src/java/org/apache/lucene/search/TermRangeQuery.java =================================================================== --- src/java/org/apache/lucene/search/TermRangeQuery.java (revision 809097) +++ src/java/org/apache/lucene/search/TermRangeQuery.java (working copy) @@ -24,11 +24,12 @@ import org.apache.lucene.util.ToStringUtils; /** - * A Query that matches documents within an exclusive range of terms. + * A Query that matches documents within a range of terms. * *

This query matches the documents looking for terms that fall into the - * supplied range according to {@link String#compareTo(String)}. It is not intended - * for numerical ranges, use {@link NumericRangeQuery} instead. + * supplied range according to {@link + * String#compareTo(String)}, unless a Collator is provided. It is not intended + * for numerical ranges; use {@link NumericRangeQuery} instead. * *

This query uses the {@link * MultiTermQuery#CONSTANT_SCORE_AUTO_REWRITE_DEFAULT} Index: src/java/org/apache/lucene/search/IndexSearcher.java =================================================================== --- src/java/org/apache/lucene/search/IndexSearcher.java (revision 809097) +++ src/java/org/apache/lucene/search/IndexSearcher.java (working copy) @@ -31,11 +31,12 @@ /** Implements search over a single IndexReader. * - *

Applications usually need only call the inherited {@link #search(Query)} - * or {@link #search(Query,Filter)} methods. For performance reasons it is + *

Applications usually need only call the inherited + * {@link #search(Query,int)} + * or {@link #search(Query,Filter,int)} methods. For performance reasons it is * recommended to open only one IndexSearcher and use it for all of your searches. * - *

Note that you can only access Hits from an IndexSearcher as long as it is + *

Note that you can only access the deprecated {@link Hits} from an IndexSearcher as long as it is * not yet closed, otherwise an IOException will be thrown. * *

NOTE: {@link Index: src/java/org/apache/lucene/store/Directory.java =================================================================== --- src/java/org/apache/lucene/store/Directory.java (revision 809097) +++ src/java/org/apache/lucene/store/Directory.java (working copy) @@ -45,7 +45,9 @@ * this Directory instance). */ protected LockFactory lockFactory; - /** @deprecated For some Directory implementations ({@link + /** List the files in the directory. + * + * @deprecated For some Directory implementations ({@link * FSDirectory}, and its subclasses), this method * silently filters its results to include only index * files. Please use {@link #listAll} instead, which Index: src/java/org/apache/lucene/store/FileSwitchDirectory.java =================================================================== --- src/java/org/apache/lucene/store/FileSwitchDirectory.java (revision 809097) +++ src/java/org/apache/lucene/store/FileSwitchDirectory.java (working copy) @@ -23,11 +23,17 @@ import java.util.Set; /** - * Files with the specified extensions are placed in the + * Expert: A Directory instance that switches files between + * two other Directory instances. + + *

Files with the specified extensions are placed in the * primary directory; others are placed in the secondary * directory. The provided Set must not change once passed * to this class, and must allow multiple threads to call - * contains at once. + * contains at once.

+ * + *

NOTE: this API is new and experimental and is + * subject to suddenly change in the next release. */ public class FileSwitchDirectory extends Directory { @@ -43,11 +49,13 @@ this.doClose = doClose; this.lockFactory = primaryDir.getLockFactory(); } - + + /** Return the primary directory */ public Directory getPrimaryDir() { return primaryDir; } + /** Return the secondary directory */ public Directory getSecondaryDir() { return secondaryDir; } @@ -76,6 +84,7 @@ return listAll(); } + /** Utility method to return a file's extension. */ public static String getExtension(String name) { int i = name.lastIndexOf('.'); if (i == -1) { Index: src/java/org/apache/lucene/document/NumericField.java =================================================================== --- src/java/org/apache/lucene/document/NumericField.java (revision 809097) +++ src/java/org/apache/lucene/document/NumericField.java (working copy) @@ -28,59 +28,108 @@ import org.apache.lucene.search.FieldCache; // javadocs /** - * This class provides a {@link Field} for indexing numeric values - * that can be used by {@link NumericRangeQuery}/{@link NumericRangeFilter}. - * For more information, how to use this class and its configuration properties - * (precisionStep) - * read the docs of {@link NumericRangeQuery}. + *

This class provides a {@link Field} that enables indexing + * of numeric values for efficient range filtering and + * sorting. Here's an example usage, adding an int value: + *

+ *   document.add(new NumericField(name).setIntValue(value));
+ * 
* - *

A numeric value is indexed as multiple string encoded terms, each reduced - * by zeroing bits from the right. Each value is also prefixed (in the first char) by the - * shift value (number of bits removed) used during encoding. - * The number of bits removed from the right for each trie entry is called - * precisionStep in this API. + * For optimal performance, re-use the + * NumericField and {@link Document} instance for more than + * one document: * - *

The usage pattern is: *

- *  document.add(
- *   new NumericField(name, precisionStep, Field.Store.XXX, true).set???Value(value)
- *  );
- * 
- *

For optimal performance, re-use the NumericField and {@link Document} instance - * for more than one document: - *

  *  // init
- *  NumericField field = new NumericField(name, precisionStep, Field.Store.XXX, true);
+ *  NumericField field = new NumericField(name);
  *  Document document = new Document();
  *  document.add(field);
- *  // use this code to index many documents:
- *  field.set???Value(value1)
- *  writer.addDocument(document);
- *  field.set???Value(value2)
- *  writer.addDocument(document);
- *  ...
+ *
+ *  for(all documents) {
+ *    ...
+ *    field.setIntValue(value);
+ *    writer.addDocument(document);
+ *    ...
+ *  }
  * 
* - *

More advanced users can instead use {@link NumericTokenStream} directly, when - * indexing numbers. This class is a wrapper around this token stream type for easier, - * more intuitive usage. + *

The Java native types int, long, float and double are + * directly supported. However, any value that can be + * converted into these native types can also be indexed. + * For example, date/time values represented by a + * java.util.Date can be translated into a long value using + * the getTime method. Alternatively, you can also use + * {@link DateTools} to first quantize the date to a + * specified precision and then convert the resulting string + * into an int or a long.
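Reviewer's aside on the date/time paragraph above: a minimal, stdlib-only sketch of turning a java.util.Date into a long suitable for a numeric field. The class name DateAsLong and both helper methods are hypothetical and not part of this patch. The day-level quantization uses plain arithmetic as a stand-in for the DateTools route the javadoc mentions, and it assumes UTC day boundaries.

```java
import java.util.Date;

public class DateAsLong {
    // Assumption: "day" means a fixed 24-hour span counted from the epoch in UTC.
    static final long MILLIS_PER_DAY = 24L * 60 * 60 * 1000;

    /** Full-precision value: milliseconds since the epoch, as returned by getTime(). */
    public static long toMillis(Date d) {
        return d.getTime();
    }

    /** Quantized value: whole days since the epoch; floorDiv keeps pre-1970 dates ordered. */
    public static long toEpochDay(Date d) {
        return Math.floorDiv(d.getTime(), MILLIS_PER_DAY);
    }

    public static void main(String[] args) {
        Date now = new Date();
        System.out.println("millis=" + toMillis(now) + " epochDay=" + toEpochDay(now));
    }
}
```

Either long could then be passed to a setter such as setLongValue; quantizing first trades precision for fewer distinct values.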

* - *

Please note: This class is only used during indexing. You can also create - * numeric stored fields with it, but when retrieving the stored field value - * from a {@link Document} instance after search, you will get a conventional - * {@link Fieldable} instance where the numeric values are returned as {@link String}s - * (according to toString(value) of the used data type). + *

To perform range querying or filtering against a + * NumericField, use {@link NumericRangeQuery} or {@link + * NumericRangeFilter}. To sort according to a + * NumericField, use the normal numeric sort types, eg + * {@link SortField#INT} (note that {@link SortField#AUTO} + * will not work with these fields). NumericField values + * can also be loaded directly from {@link FieldCache}.

* - *

Values indexed by this field can be loaded into the {@link FieldCache} - * and can be sorted (use {@link SortField}{@code .TYPE} to specify the correct - * type; {@link SortField#AUTO} does not work with this type of field). - * Values solely used for sorting can be indexed using a precisionStep - * of {@link Integer#MAX_VALUE} (at least ≥64), because this step only produces - * one value token with highest precision. + *

By default, a NumericField's value is not stored but + * is indexed for range filtering and sorting. You can use + * the {@link #NumericField(String,Field.Store,boolean)} + * constructor if you need to change these defaults.

* - *

NOTE: This API is experimental and - * might change in incompatible ways in the next release. + *

You may add the same field name as a NumericField to + * the same document more than once. Range querying and + * filtering will be the logical OR of all values; however, + * sort behavior is not defined. If you need to sort, you + * should separately index a single-valued NumericField.

* + *

A NumericField will consume somewhat more disk space + * in the index than an ordinary single-valued field. + * However, for a typical index that includes substantial + * textual content per document, this increase will likely + * be in the noise.

+ * + *

Within Lucene, each numeric value is indexed as + * multiple encoded terms representing larger and larger + * pre-defined brackets called tries. The step + * size between each successive trie is called the + * precisionStep in this API. Smaller + * precisionStep values result in a larger number + * of tries, which consumes more disk space in the index but + * may result in faster range search performance. The + * default value, 4, was selected for a reasonable tradeoff + * of disk space consumption versus performance. You can + * use the expert constructor {@link + * #NumericField(String,int,Field.Store,boolean)} if you'd + * like to change the value. Note that you must also + * specify a congruent value when creating {@link + * NumericRangeQuery} or {@link NumericRangeFilter}. + * + *
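Reviewer's aside on the precisionStep paragraph above: a rough, stdlib-only sketch of how one 64-bit value fans out into several reduced-precision terms. The class TrieSketch and its methods are hypothetical. It only zeroes low-order bits at each shift; it is not Lucene's actual term encoding, which also prefixes each term with the shift and handles the sign bit.

```java
import java.util.ArrayList;
import java.util.List;

public class TrieSketch {

    /** Shift amounts for a 64-bit value: 0, step, 2*step, ... while below 64. */
    public static List<Integer> shifts(int precisionStep) {
        if (precisionStep < 1) {
            throw new IllegalArgumentException("precisionStep must be >= 1");
        }
        List<Integer> out = new ArrayList<>();
        // A long counter avoids overflow when the step is as large as Integer.MAX_VALUE.
        for (long shift = 0; shift < 64; shift += precisionStep) {
            out.add((int) shift);
        }
        return out;
    }

    /** Zeroes the lowest 'shift' bits of value; shift must be in [0, 63]. */
    public static long reduce(long value, int shift) {
        return (value >>> shift) << shift;
    }

    public static void main(String[] args) {
        long value = 0x1234L;
        // One line per term that a step of 16 would produce for this value.
        for (int shift : shifts(16)) {
            System.out.println("shift=" + shift + " bits=0x" + Long.toHexString(reduce(value, shift)));
        }
    }
}
```

Under these assumptions a step of 4 yields 16 terms per long and a step of Integer.MAX_VALUE yields a single full-precision term, which is why the latter suits fields used only for sorting.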

If you only need to sort by numeric value, and never + * run range querying/filtering, you can index using a + * precisionStep of {@link Integer#MAX_VALUE}. + * This will minimize disk space consumed.

+ * + *

More advanced users can instead use {@link + * NumericTokenStream} directly, when indexing numbers. This + * class is a wrapper around this token stream type for + * easier, more intuitive usage.

+ * + *

For more information on the internals of numeric trie + * indexing, including the precisionStep + * configuration, see {@link NumericRangeQuery}. + * + *

NOTE: This class is only used during + * indexing. When retrieving the stored field value from a + * {@link Document} instance after search, you will get a + * conventional {@link Fieldable} instance where the numeric + * values are returned as {@link String}s (according to + * toString(value) of the used data type). + * + *

NOTE: This API is + * experimental and might change in incompatible ways in the + * next release. + * * @since 2.9 */ public final class NumericField extends AbstractField {