I have some first "benchmarks" on the three trie implementations. They were done on three indexes with the same content, encoded in 8 bit, 4 bit, and 2 bit variants, containing real-world data from 575,000 documents of the PANGAEA repository (see www.pangaea.de, as mentioned before). The same range queries were executed after some warmup, and time was measured until TopDocs returned the first 10 documents.
The indexes each contain 13 numeric, trie-encoded fields (doubles and Dates). Index size (including the "normal" fields) was:
- 8bit: 4.8 GiB
- 4bit: 5.1 GiB
- 2bit: 5.7 GiB
So the 13*8*575,000 additional trie terms (of longer size) took 300 MB going from 8 bit to 4 bit. Going from 4 bit to 2 bit, the additional 13*16*575,000 terms took a further 600 MB (on top of the 8-to-4-bit step; the growth is linear in the number of additional terms).
The performance impact was negligible (or rather, the standard deviation was bigger than the performance gain), so I still think 8 bit is a good idea, and its index size is the smallest.
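To make the term-count arithmetic explicit (a back-of-the-envelope sketch; the figures 13 fields, 575,000 documents, and the extra precision levels per step come from the measurements above):

```java
// Back-of-the-envelope check of the extra-term counts quoted above.
public class TrieTermCount {
    static long extraTerms(int fields, int extraPrecisionsPerValue, long docs) {
        return (long) fields * extraPrecisionsPerValue * docs;
    }

    public static void main(String[] args) {
        long docs = 575000L;
        // Halving the precision step from 8 bit to 4 bit doubles the number
        // of precision levels per 64-bit value, adding 8 extra levels:
        long step8to4 = extraTerms(13, 8, docs);   // 59,800,000 extra terms -> ~300 MB
        // Going from 4 bit to 2 bit adds 16 more levels on top of that:
        long step4to2 = extraTerms(13, 16, docs);  // 119,600,000 extra terms -> ~600 MB
        System.out.println(step8to4 + " " + step4to2);
    }
}
```

That works out to roughly 5 bytes of index growth per extra term in both steps, which is why the growth looks linear in the number of additional terms.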
My idea why this is so: using fewer bits increases the number of terms in the index (I do not know how much this decreases performance by itself) and needs more TermEnum seeks (for each trie precision, 2 seek operations are needed). These TermEnum seeks are faster for the 8 bit trie variant, because the 2-char prefix can be used extensively (first char = precision marker, second char = first byte of the numeric value). With 4 bit and 2 bit you get only 4 or 2 bits in addition to the precision marker, so seeks in the TermEnum are slower.
In comparison, a ConstantScoreRangeQuery on the full-precision trie-coded values took about 10-20 times longer. Here are the numbers:
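To illustrate the idea of precision-prefixed terms (a simplified sketch, not the actual trie encoding code: the marker characters and the hex representation of the value bytes are illustrative assumptions):

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of 8-bit trie term generation: each term carries a one-char
// precision marker followed by the value's remaining high bytes (here
// rendered as hex). A 64-bit value yields 8 terms, one per precision.
// The 2-char prefix (marker + first value byte) is what lets a TermEnum
// seek narrowly in the 8 bit variant.
public class TrieSketch {
    static List<String> trieTerms8bit(long value) {
        List<String> terms = new ArrayList<>();
        for (int droppedBytes = 0; droppedBytes < 8; droppedBytes++) {
            // Drop 8 more low-order bits per precision level:
            long prefix = value >>> (8 * droppedBytes);
            // Precision marker 'a'..'h', then the remaining bytes as hex:
            String hex = String.format("%0" + (2 * (8 - droppedBytes)) + "x", prefix);
            terms.add((char) ('a' + droppedBytes) + hex);
        }
        return terms;
    }
}
```

For example, trieTerms8bit(0x0102030405060708L) starts with "a0102030405060708" (full precision) and ends with "h01" (coarsest precision, only the top byte left).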
For PANGAEA, all queries are half open (we have the problem that the data sets themselves have bounding boxes - minlatitude, maxlatitude, minlongitude, maxlongitude - and the user searches for a bounding box; hits are data sets that intersect the search bounding box). So for a complete lat/lon constraint you have 4 half-open ranges (with the current trie implementation this is not a problem, because each of the two ANDed filters needs just half the number of terms of a full range; the only performance impact is ANDing the two OpenBitSets). So 4 such half-open ranges ANDed together take the following times on the given index after some warmup, including fetching the first 10 documents:
ConstantScoreRange: 1500-4000 ms
TrieRangeQuery 8bit: 40ms-100ms
TrieRangeQuery 4bit: 30ms-80ms
TrieRangeQuery 2bit: 20ms-80ms
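The cost of combining the half-open ranges is just intersecting the per-range bit sets. A minimal illustration, using java.util.BitSet as a stand-in for Lucene's OpenBitSet (the class and method names here are illustrative, not the contrib filter API):

```java
import java.util.BitSet;

// Each BitSet marks the documents matching one half-open range filter;
// the bounding-box result is their word-wise AND.
public class RangeIntersect {
    static BitSet intersect(BitSet... rangeMatches) {
        BitSet result = (BitSet) rangeMatches[0].clone();
        for (int i = 1; i < rangeMatches.length; i++) {
            result.and(rangeMatches[i]);   // cheap word-wise AND per filter
        }
        return result;
    }
}
```

With four constraints (minlatitude, maxlatitude, minlongitude, maxlongitude), intersect() over the four bit sets yields the documents whose bounding boxes intersect the search box; this AND is the only extra cost compared to a single range.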
The old RangeQuery was not tested, because it hits the MaxClauseCount constraint; if that is raised to Integer.MAX_VALUE, it took approx. 1 minute to finish. The numbers are for pure range queries; if you additionally add some search terms, time goes up a little bit. Only TrieRangeQuery was used (so rewriting to constant score); no docs were filtered in the classical way (term queries + filtering of results).
More benchmark results will follow when I finish the benchmarker, but these are real-world examples. The search engine machine was a Sun X4600 with 16 Opteron cores, 64 bit Java 1.5, Sun Java System Web Server, -Xmx8192M (our current configuration). On lower-cost machines like desktop PCs, ConstantScoreRangeQuery takes much more time, but TrieRangeQuery takes approx. the same time, as disk I/O seeks are more the limiting factor. Processor speed and number of processors are not the limiting factor (if only one query is run at a time).