Details
Type: Improvement
Status: Closed
Priority: Minor
Resolution: Fixed
Lucene Fields: New
Description
Robert suggested this idea: if there are only a few unique sets of values, we could build a lookup table of those sets and then map each doc to an ord in this table, just as we already do with table compression for numerics.
I think this is especially compelling given that SortedSet/SortedNumeric are the only two doc values types that use O(maxDoc) memory, because of the offsets map. When this new strategy is used, memory usage could be bounded by a constant.
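To make the idea concrete, here is a minimal sketch of the table approach. It is not the actual doc values format code: the class SetTableSketch, the maxTableSize parameter and the plain int[] mapping are all assumptions made for illustration. A real implementation would presumably pack the per-doc ords into a few bits each and keep them on disk, so that only the table itself needs to be held in memory.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of table compression for multi-valued doc values. */
final class SetTableSketch {
    /** Distinct value sets, indexed by ord. */
    final List<long[]> table = new ArrayList<>();
    /** One table ord per document (an assumption: real code would pack these). */
    final int[] docToOrd;

    private SetTableSketch(int maxDoc) {
        this.docToOrd = new int[maxDoc];
    }

    /**
     * Builds the lookup table, or returns null when there are more than
     * maxTableSize distinct sets, in which case the caller would fall back
     * to the existing offsets-based encoding.
     */
    static SetTableSketch build(long[][] valuesPerDoc, int maxTableSize) {
        Map<List<Long>, Integer> ordBySet = new HashMap<>();
        SetTableSketch result = new SetTableSketch(valuesPerDoc.length);
        for (int doc = 0; doc < valuesPerDoc.length; doc++) {
            // Sort a copy so that equal sets always produce the same key.
            long[] sorted = valuesPerDoc[doc].clone();
            Arrays.sort(sorted);
            List<Long> key = new ArrayList<>(sorted.length);
            for (long v : sorted) {
                key.add(v);
            }
            Integer ord = ordBySet.get(key);
            if (ord == null) {
                if (result.table.size() == maxTableSize) {
                    return null; // too many unique sets for a table
                }
                ord = result.table.size();
                ordBySet.put(key, ord);
                result.table.add(sorted);
            }
            result.docToOrd[doc] = ord;
        }
        return result;
    }

    /** Returns the sorted values of the given doc via its table ord. */
    long[] valuesFor(int doc) {
        return table.get(docToOrd[doc]);
    }
}
```

With the table capped at a fixed number of entries, reading a doc's values no longer goes through a per-doc offsets map, which is where the O(maxDoc) memory comes from today.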