Details
Type: Improvement
Status: Closed
Priority: Major
Resolution: Fixed
Description
Sorted/SortedSet give you ordinal(s) per document, but they separately have a "term dictionary" of all the values.
You can do a few operations on these:
- ord -> term lookup (e.g. retrieving facet labels)
- term -> ord lookup (reverse lookup: e.g. fieldcacherangefilter)
- get a term enumerator (e.g. merging, ordinalmap construction)
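To make the three operations concrete, here is a minimal in-memory sketch in Java. It is not the diskdv code: the class name `TermDictSketch` and its fields are hypothetical, and it uses `String` ordering where the real term dictionary works on sorted byte sequences. The method names `lookupOrd`, `lookupTerm`, and `termsEnum` are chosen to mirror the operations listed above.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical illustration only: an in-memory term dictionary over Strings.
final class TermDictSketch {
    // Unique values in sorted order; the array index is the ordinal.
    private final String[] sortedTerms;

    TermDictSketch(List<String> values) {
        this.sortedTerms = values.stream().distinct().sorted().toArray(String[]::new);
    }

    // ord -> term lookup (e.g. retrieving a facet label).
    String lookupOrd(int ord) {
        return sortedTerms[ord];
    }

    // term -> ord lookup via binary search over the sorted terms.
    // Returns -(insertionPoint) - 1 when the term is absent, which is
    // what a range filter needs to find the nearest ordinal.
    int lookupTerm(String term) {
        return Arrays.binarySearch(sortedTerms, term);
    }

    // Term enumerator: walks all terms in ordinal order (e.g. for merging).
    Iterator<String> termsEnum() {
        return Arrays.asList(sortedTerms).iterator();
    }

    public static void main(String[] args) {
        TermDictSketch dict = new TermDictSketch(List.of("pear", "apple", "fig", "apple"));
        System.out.println(dict.lookupOrd(1));
        System.out.println(dict.lookupTerm("pear"));
        System.out.println(dict.lookupTerm("banana"));
        dict.termsEnum().forEachRemaining(System.out::println);
    }
}
```

All three operations depend only on the sorted list of unique values, not on which documents carry which ordinal, so the dictionary can be stored and encoded independently of the per-document ordinals.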
The current implementation for diskdv is the simplest thing that could possibly work: under the hood it just makes a binary DV for these (treating ordinals as document ids). When the terms are fixed length, you can address a term directly with multiplication. When they are variable length though, we have to store a packed ints structure in RAM.
This variable-length case is overkill and chews up a lot of RAM if you have many unique values. It also chews up a lot of disk, since all the values are just concatenated (no sharing).
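The difference between the two addressing schemes can be sketched roughly as follows. This is an illustration under assumptions, not the diskdv implementation: `TermAddressingSketch` and its methods are hypothetical names, and a plain `long[]` of start offsets stands in for the packed ints structure.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical illustration of addressing terms stored as one concatenated byte blob.
final class TermAddressingSketch {

    // Fixed-length terms: the start position is just ord * termLength,
    // so no extra addressing structure is needed.
    static byte[] fixedLookup(byte[] data, int termLength, int ord) {
        int start = ord * termLength;
        return Arrays.copyOfRange(data, start, start + termLength);
    }

    // Variable-length terms: an offsets table with one entry per term
    // (plus a final end marker) is required to find where each term starts.
    // Assumption: long[] stands in for a packed ints structure held in RAM.
    static byte[] variableLookup(byte[] data, long[] offsets, int ord) {
        return Arrays.copyOfRange(data, (int) offsets[ord], (int) offsets[ord + 1]);
    }

    // Concatenates all terms back to back, with nothing shared between them.
    static byte[] concat(String[] terms) {
        return String.join("", terms).getBytes(StandardCharsets.UTF_8);
    }

    // Builds the offsets table: offsets[i] is the start of term i in the blob.
    static long[] buildOffsets(String[] terms) {
        long[] offsets = new long[terms.length + 1];
        for (int i = 0; i < terms.length; i++) {
            offsets[i + 1] = offsets[i] + terms[i].getBytes(StandardCharsets.UTF_8).length;
        }
        return offsets;
    }

    public static void main(String[] args) {
        String[] terms = {"a", "bcd", "ef"};
        byte[] data = concat(terms);
        long[] offsets = buildOffsets(terms);
        System.out.println(new String(variableLookup(data, offsets, 1), StandardCharsets.UTF_8));
        System.out.println(offsets.length + " offsets kept in memory for " + terms.length + " terms");
    }
}
```

In this sketch the offsets table grows linearly with the number of unique values, which is the RAM cost described above, and the concatenated blob stores every term in full, which is the disk cost.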