Right now the default precisionStep is 4, for both 8-byte (long/double) and 4-byte (int/float) numeric fields, but this is a sizable hit on indexing speed and disk usage, especially for tiny documents, because it creates many (8 or 16) terms for each value.
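That term blowup comes from the trie encoding: each value is indexed once per precision level, so at precisionStep=4 a long (64 bits) produces 16 terms and an int (32 bits) produces 8. A rough sketch of the idea (simplified: the real NumericUtils encoding emits sortable prefix-coded bytes and packs the shift into each term, this only illustrates the counts):

```java
import java.util.ArrayList;
import java.util.List;

public class PrefixTerms {
    /**
     * Simplified model of numeric trie shredding: one term per precision
     * level, each keeping only the bits above 'shift'. The real Lucene
     * encoding (NumericUtils) also records the shift inside the term and
     * uses sortable byte order; this sketch just illustrates the count.
     */
    static List<String> prefixTerms(long value, int bits, int precStep) {
        List<String> terms = new ArrayList<>();
        for (int shift = 0; shift < bits; shift += precStep) {
            // lower 'shift' bits are dropped at this level
            terms.add(shift + ":" + (value >>> shift));
        }
        return terms;
    }

    public static void main(String[] args) {
        System.out.println(prefixTerms(1234L, 64, 4).size());  // 16 terms per long
        System.out.println(prefixTerms(1234L, 32, 4).size());  // 8 terms per int
        System.out.println(prefixTerms(1234L, 64, 16).size()); // 4 terms per long
    }
}
```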
Since we originally set these defaults, a lot has changed: e.g., we now rewrite multi-term queries (MTQs) per segment, we have a faster terms dictionary (BlockTree), a faster postings format, etc.
Index size is important because it limits how much of the index can stay hot (fit in the OS's I/O cache). And more apps are using Lucene for tiny docs, where the per-field overhead is sizable.
I used the Geonames corpus to run a simple benchmark (all sources are committed to luceneutil). It has 8.6 M tiny docs, each with 23 fields, including these numeric fields:
- lat/lng (double)
- modified time, elevation, population (long)
- dem (int)
I tested precision steps 4, 8 and 16:
Index time is with 1 thread (so the resulting index structure is identical across runs).
Query time is the time to run 100 random ranges for that field, averaged over 20 iterations. TermCount is the total number of terms the MTQ rewrote to across all 100 queries / segments; as expected it grows as precStep grows, but search time is not that heavily impacted: the difference going from 4 to 8 is negligible, with some impact from 8 to 16.
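To see why TermCount grows with precStep, here is a hedged sketch of the range-splitting idea behind NumericRangeQuery: unaligned head and tail runs of a range are emitted at the current level, and the block-aligned middle recurses one level up. With a larger precStep there are fewer levels, but each peel can emit up to 2^precStep − 1 terms per side. This is a simplified non-negative-values model, not Lucene's actual NumericUtils.splitLongRange (which also handles the sign bit and emits real prefix-coded terms):

```java
public class RangeSplit {
    /**
     * Count the terms a trie-encoded range query would visit, in a
     * simplified model restricted to non-negative values.
     */
    static long countRangeTerms(long lo, long hi, int precStep) {
        long count = 0;
        while (lo <= hi) {
            long block = 1L << precStep;
            // Peel the unaligned head (or the whole range if it fits in one block).
            if ((lo & (block - 1)) != 0 || hi - lo + 1 < block) {
                long headEnd = Math.min(hi, lo | (block - 1));
                count += headEnd - lo + 1;
                lo = headEnd + 1;
                if (lo > hi) break;
            }
            // Peel the unaligned tail.
            if ((hi & (block - 1)) != block - 1) {
                long tailStart = hi & ~(block - 1);
                count += hi - tailStart + 1;
                hi = tailStart - 1;
                if (lo > hi) break;
            }
            // The rest is block-aligned: handle it one precision level up.
            lo >>= precStep;
            hi >>= precStep;
        }
        return count;
    }

    public static void main(String[] args) {
        // A block-aligned range collapses to a single higher-level term:
        System.out.println(countRangeTerms(0, 255, 4));      // 1
        // Unaligned edges cost extra terms at the lowest level:
        System.out.println(countRangeTerms(1, 16, 4));       // 16
        // Larger precStep: fewer levels, but much wider worst-case peels:
        System.out.println(countRangeTerms(1, 100_000, 16)); // 100000
    }
}
```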
Maybe we should increase the int/float default precision step to 8 and
long/double to 16? Or both to 16?