Index: lucene/core/src/java/org/apache/lucene/search/doc-files/nrq-formula-1.png =================================================================== Cannot display: file marked as a binary type. svn:mime-type = image/png Index: lucene/core/src/java/org/apache/lucene/search/doc-files/nrq-formula-1.png =================================================================== --- lucene/core/src/java/org/apache/lucene/search/doc-files/nrq-formula-1.png (revision 0) +++ lucene/core/src/java/org/apache/lucene/search/doc-files/nrq-formula-1.png (working copy) Property changes on: lucene/core/src/java/org/apache/lucene/search/doc-files/nrq-formula-1.png ___________________________________________________________________ Added: svn:mime-type ## -0,0 +1 ## +image/png \ No newline at end of property Index: lucene/core/src/java/org/apache/lucene/search/doc-files/nrq-formula-2.png =================================================================== Cannot display: file marked as a binary type. svn:mime-type = image/png Index: lucene/core/src/java/org/apache/lucene/search/doc-files/nrq-formula-2.png =================================================================== --- lucene/core/src/java/org/apache/lucene/search/doc-files/nrq-formula-2.png (revision 0) +++ lucene/core/src/java/org/apache/lucene/search/doc-files/nrq-formula-2.png (working copy) Property changes on: lucene/core/src/java/org/apache/lucene/search/doc-files/nrq-formula-2.png ___________________________________________________________________ Added: svn:mime-type ## -0,0 +1 ## +image/png \ No newline at end of property Index: lucene/core/src/java/org/apache/lucene/search/NumericRangeQuery.java =================================================================== --- lucene/core/src/java/org/apache/lucene/search/NumericRangeQuery.java (revision 1418625) +++ lucene/core/src/java/org/apache/lucene/search/NumericRangeQuery.java (working copy) @@ -73,14 +73,9 @@ * details. * *

This query defaults to {@linkplain - * MultiTermQuery#CONSTANT_SCORE_AUTO_REWRITE_DEFAULT} for - * 32 bit (int/float) ranges with precisionStep ≤8 and 64 - * bit (long/double) ranges with precisionStep ≤6. - * Otherwise it uses {@linkplain - * MultiTermQuery#CONSTANT_SCORE_FILTER_REWRITE} as the - * number of terms is likely to be high. With precision - * steps of ≤4, this query can be run with one of the - * BooleanQuery rewrite methods without changing + * MultiTermQuery#CONSTANT_SCORE_AUTO_REWRITE_DEFAULT}. + * With precision steps of ≤4, this query can be run with + * one of the BooleanQuery rewrite methods without changing * BooleanQuery's default max clause count. * *

How it works

@@ -117,17 +112,29 @@ * *

Precision Step

*

You can choose any precisionStep when encoding values. - * Lower step values mean more precisions and so more terms in index (and index gets larger). - * On the other hand, the maximum number of terms to match reduces, which optimized query speed. - * The formula to calculate the maximum term count is: - *

- *  n = [ (bitsPerValue/precisionStep - 1) * (2^precisionStep - 1 ) * 2 ] + (2^precisionStep - 1 )
- * 
- *

(this formula is only correct, when bitsPerValue/precisionStep is an integer; - * in other cases, the value must be rounded up and the last summand must contain the modulo of the division as - * precision step). - * For longs stored using a precision step of 4, n = 15*15*2 + 15 = 465, and for a precision - * step of 2, n = 31*3*2 + 3 = 189. But the faster search speed is reduced by more seeking + * Lower step values mean more precision and so more terms in the index (and the index gets larger). The number + * of indexed terms per value (generated by {@link NumericTokenStream}) is: + *

+ *   indexedTermsPerValue = ceil(bitsPerValue / precisionStep) + *

+ * As the lower-precision terms are shared by many values, the additional terms only + * slightly grow the term dictionary (approx. 7% for precisionStep=4), but have a larger + * impact on the postings (the postings file will have more entries, as every document is linked to + * indexedTermsPerValue terms instead of one). The formula to estimate the growth + * of the term dictionary in comparison to one term per value is: + *

+ * + *   \mathrm{termDictOverhead} = \sum\limits_{i=0}^{\mathrm{indexedTermsPerValue}-1} \frac{1}{2^{\mathrm{precisionStep}\cdot i}} + *

+ *

On the other hand, if the precisionStep is smaller, the maximum number of terms to match reduces, + * which optimizes query speed. The formula to calculate the maximum number of terms that will be visited while + * executing the query is: + *

+ * + *   \mathrm{maxQueryTerms} = \left[ \left( \mathrm{indexedTermsPerValue} - 1 \right) \cdot \left(2^\mathrm{precisionStep} - 1 \right) \cdot 2 \right] + \left( 2^\mathrm{precisionStep} - 1 \right) + *

+ *

For longs stored using a precision step of 4, maxQueryTerms = 15*15*2 + 15 = 465, and for a precision + * step of 2, maxQueryTerms = 31*3*2 + 3 = 189. But the faster search speed is reduced by more seeking * in the term enum of the index. Because of this, the ideal precisionStep value can only * be found out by testing. Important: You can index with a lower precision step value and test search speed * using a multiple of the original step value.

@@ -143,7 +150,7 @@ * per value in the index and querying is as slow as a conventional {@link TermRangeQuery}. But it can be used * to produce fields, that are solely used for sorting (in this case simply use {@link Integer#MAX_VALUE} as * precisionStep). Using {@link IntField}, - * {@link LongField}, {@link FloatField} or {@link DoubleField} for sorting + * {@link LongField}, {@link FloatField} or {@link DoubleField} for sorting * is ideal, because building the field cache is much faster than with text-only numbers. * These fields have one term per value and therefore also work with term enumeration for building distinct lists * (e.g. facets / preselected values to search for).
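
Review note: the three formulas added to the Precision Step section can be checked with a small standalone program. The sketch below is not part of the patch and uses no Lucene classes; the class name `PrecisionStepEstimates` and its method names are hypothetical and only mirror the symbols in the new Javadoc (indexedTermsPerValue, termDictOverhead, maxQueryTerms). It assumes 1 ≤ precisionStep ≤ bitsPerValue, and it applies the maxQueryTerms formula as written in the patch, so results are exact only when bitsPerValue is a multiple of precisionStep.

```java
// Standalone check of the formulas in the patched Javadoc (hypothetical helper, not Lucene API).
final class PrecisionStepEstimates {

    // indexedTermsPerValue = ceil(bitsPerValue / precisionStep), using integer arithmetic.
    static int indexedTermsPerValue(int bitsPerValue, int precisionStep) {
        return (bitsPerValue + precisionStep - 1) / precisionStep;
    }

    // termDictOverhead = sum over i in [0, indexedTermsPerValue) of 1 / 2^(precisionStep * i).
    // A result of 1.0 means "no growth compared to one term per value".
    static double termDictOverhead(int bitsPerValue, int precisionStep) {
        int terms = indexedTermsPerValue(bitsPerValue, precisionStep);
        double sum = 0.0;
        for (int i = 0; i < terms; i++) {
            sum += Math.pow(2.0, -(double) precisionStep * i);
        }
        return sum;
    }

    // maxQueryTerms = [(indexedTermsPerValue - 1) * (2^precisionStep - 1) * 2] + (2^precisionStep - 1).
    // Assumption: precisionStep < 63 so that the shift below does not overflow.
    static long maxQueryTerms(int bitsPerValue, int precisionStep) {
        long perLevel = (1L << precisionStep) - 1;
        long terms = indexedTermsPerValue(bitsPerValue, precisionStep);
        return (terms - 1) * perLevel * 2 + perLevel;
    }

    public static void main(String[] args) {
        // The two long (64 bit) cases quoted in the Javadoc.
        System.out.println(maxQueryTerms(64, 4)); // prints 465
        System.out.println(maxQueryTerms(64, 2)); // prints 189
        // Term dictionary growth for precisionStep=4, quoted as approx. 7% in the Javadoc.
        System.out.printf("%.4f%n", termDictOverhead(64, 4));
    }
}
```

For longs this reproduces the values quoted in the patch: 16 indexed terms per value and maxQueryTerms = 465 for precisionStep=4, and 32 indexed terms per value and maxQueryTerms = 189 for precisionStep=2. The term dictionary overhead for precisionStep=4 comes out close to 16/15, which matches the "approx. 7%" figure.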