I profiled memory usage with YourKit for your test (slightly modified to call IndexSummary#complete), and it shows the following when storing 10,000 keys in each:
IndexSummary: 21,597,040 bytes (~20MB)
FST: 3,576,248 bytes (~3.4MB)
That's pretty impressive. If we can deliver this, it will be a huge win.
(Note that on disk, IndexSummary writes only the key portion of each DecoratedKey, so the on-disk summary may be smaller than the FST.)
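Put per key, the two in-memory figures work out as follows. This is only arithmetic on the reported totals, assuming the memory is spread evenly over the 10,000 keys; the class name is hypothetical.

```java
// Hypothetical helper: divides the profiled totals by the key count.
class SummaryFootprint {
    // Sizes from the YourKit profile, in bytes, for 10,000 keys each.
    static final long INDEX_SUMMARY_BYTES = 21_597_040L;
    static final long FST_BYTES = 3_576_248L;
    static final int KEYS = 10_000;

    // Average bytes attributed to each stored key.
    static double bytesPerKey(long totalBytes, int keys) {
        return (double) totalBytes / keys;
    }

    // How many times larger `a` is than `b`.
    static double ratio(long a, long b) {
        return (double) a / b;
    }

    public static void main(String[] args) {
        System.out.printf("IndexSummary: %.1f bytes/key%n", bytesPerKey(INDEX_SUMMARY_BYTES, KEYS));
        System.out.printf("FST:          %.1f bytes/key%n", bytesPerKey(FST_BYTES, KEYS));
        System.out.printf("ratio:        %.2fx%n", ratio(INDEX_SUMMARY_BYTES, FST_BYTES));
    }
}
```

That comes to roughly 2,160 bytes per key for IndexSummary versus about 358 for the FST, so the FST is about 6x smaller in memory for this test.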
My remaining concerns are:
- The planned 1.2 release saves IndexSummary to disk (CASSANDRA-2392), so I think it is better to keep the current implementation and add an FST version of IndexSummary alongside it, so that we can read and write both formats.
- The DecoratedKeys stored inside the current IndexSummary are accessed from various places, and the FST version will lack that information, so you may need to find an alternative way to preserve the current functionality.
- If you want to use Lucene 4.0, we should release this feature after Lucene 4.0 itself is released.
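To illustrate the first point, here is a minimal sketch of a versioned summary file in which a leading format byte lets a single reader handle both the existing layout and a new compact one. Everything here is hypothetical: the class name, the format constants, and the byte layout are not Cassandra's actual serialization, and front coding (storing only the bytes that differ from the previous key) stands in for the FST, which would really be built with Lucene.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: not Cassandra's real IndexSummary serialization.
class VersionedSummaryFormat {
    static final byte LEGACY = 0;      // hypothetical marker: length-prefixed keys, as today
    static final byte FRONT_CODED = 1; // hypothetical marker: stand-in for an FST encoding

    static void write(DataOutputStream out, List<byte[]> keys, byte format) throws IOException {
        out.writeByte(format);
        out.writeInt(keys.size());
        byte[] prev = new byte[0];
        for (byte[] key : keys) {
            if (format == LEGACY) {
                out.writeInt(key.length);
                out.write(key);
            } else {
                // Store the shared-prefix length plus only the differing suffix.
                int shared = sharedPrefix(prev, key);
                out.writeInt(shared);
                out.writeInt(key.length - shared);
                out.write(key, shared, key.length - shared);
                prev = key;
            }
        }
        out.flush();
    }

    static List<byte[]> read(DataInputStream in) throws IOException {
        // The format byte decides how the rest of the file is decoded.
        byte format = in.readByte();
        if (format != LEGACY && format != FRONT_CODED) {
            throw new IOException("unknown summary format " + format);
        }
        int n = in.readInt();
        List<byte[]> keys = new ArrayList<>(n);
        byte[] prev = new byte[0];
        for (int i = 0; i < n; i++) {
            byte[] key;
            if (format == LEGACY) {
                key = new byte[in.readInt()];
                in.readFully(key);
            } else {
                int shared = in.readInt();
                int suffixLength = in.readInt();
                key = new byte[shared + suffixLength];
                System.arraycopy(prev, 0, key, 0, shared);
                in.readFully(key, shared, suffixLength);
                prev = key;
            }
            keys.add(key);
        }
        return keys;
    }

    static int sharedPrefix(byte[] a, byte[] b) {
        int limit = Math.min(a.length, b.length);
        int i = 0;
        while (i < limit && a[i] == b[i]) i++;
        return i;
    }

    // Serializes and deserializes in memory; IOExceptions are wrapped as unchecked.
    static List<byte[]> roundTrip(List<byte[]> keys, byte format) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            write(new DataOutputStream(bytes), keys, format);
            return read(new DataInputStream(new ByteArrayInputStream(bytes.toByteArray())));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        List<byte[]> keys = new ArrayList<>();
        for (String s : new String[] {"user:0001", "user:0002", "user:0100"}) {
            keys.add(s.getBytes(StandardCharsets.UTF_8));
        }
        for (byte format : new byte[] {LEGACY, FRONT_CODED}) {
            List<byte[]> back = roundTrip(keys, format);
            System.out.println(format + ": " + new String(back.get(2), StandardCharsets.UTF_8));
        }
    }
}
```

The point of the format byte is that summaries already written by 1.2 stay readable after the compact format is added, and an unrecognized marker fails loudly instead of being misparsed.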
Also, the last results are for 100,000 keys rather than 1 million.
IndexSummary holds only one key per index_interval keys (default 128), so I don't think you need to test with 1 million.
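A minimal sketch of that sampling arithmetic, assuming the summary keeps the first key and then every index_interval-th key after it. The class and method names are hypothetical, not Cassandra's.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper illustrating index_interval sampling.
class SummarySampling {
    // Row ordinals that would be sampled when every indexInterval-th key is kept,
    // starting with the first key (ordinal 0).
    static List<Long> sampledPositions(long rowCount, int indexInterval) {
        List<Long> positions = new ArrayList<>();
        for (long i = 0; i < rowCount; i += indexInterval) {
            positions.add(i);
        }
        return positions;
    }

    // Same count without materializing the list: ceil(rowCount / indexInterval).
    static long summaryEntries(long rowCount, int indexInterval) {
        return (rowCount + indexInterval - 1) / indexInterval;
    }

    // Inverse: rows an SSTable needs for its summary to hold `entries` keys.
    static long rowsForEntries(long entries, int indexInterval) {
        return entries * indexInterval;
    }

    public static void main(String[] args) {
        int interval = 128; // default index_interval
        System.out.println(summaryEntries(1_000_000, interval)); // 7813
        System.out.println(rowsForEntries(10_000, interval));    // 1280000
        System.out.println(rowsForEntries(100_000, interval));   // 12800000
    }
}
```

So an SSTable with 1 million rows yields only about 7,800 summary entries, and if the test keys stand in for summary entries, 10,000 and 100,000 keys already correspond to SSTables of roughly 1.28 million and 12.8 million rows.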