Details
Type: Improvement
Status: Open
Priority: Major
Resolution: Unresolved
Description
Previously, after obtaining a list of primary keys (pkeys) from a secondary index, we performed point lookups against the primary index to fetch the records. When the number of disk components is large, this causes many unnecessary searches because of Bloom filter false positives. However, since the memory components of all indexes are always flushed together, we can narrow down the candidate components of the primary index based on the secondary index component in which the pkey was found.
To enable this optimization, we first need to assign a unique ID to every component (both disk and memory), and guarantee that all memory components of a dataset (partition) receive the same ID upon creation. These component IDs are propagated to the primary index during query processing to facilitate primary index lookups.
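The idea can be illustrated with a minimal sketch in Java. It is not the actual implementation: the class and method names (ComponentIdLookupSketch, ComponentId, PrimaryComponent, lookupAll, lookupNarrowed) are hypothetical. It also assumes that a component ID is an inclusive range, so that a merged disk component can cover the IDs of the flushed components it replaced; the description above does not specify how merges are handled.

```java
import java.util.List;
import java.util.Map;

final class ComponentIdLookupSketch {

    /** Hypothetical component ID: an inclusive range (assumption, to model merged components). */
    static final class ComponentId {
        final long minId;
        final long maxId;

        ComponentId(long minId, long maxId) {
            this.minId = minId;
            this.maxId = maxId;
        }

        /** True if the two ID ranges share at least one ID. */
        boolean overlaps(ComponentId other) {
            return minId <= other.maxId && other.minId <= maxId;
        }
    }

    /** Hypothetical primary-index component: its ID plus its records (pkey to record). */
    static final class PrimaryComponent {
        final ComponentId id;
        final Map<Integer, String> records;

        PrimaryComponent(ComponentId id, Map<Integer, String> records) {
            this.id = id;
            this.records = records;
        }
    }

    /** Number of components probed by the most recent lookup (for illustration only). */
    static int probes = 0;

    /** Baseline: probe every primary component, newest first, until the pkey is found. */
    static String lookupAll(List<PrimaryComponent> primary, int pkey) {
        probes = 0;
        for (PrimaryComponent c : primary) {
            probes++;
            String record = c.records.get(pkey);
            if (record != null) {
                return record;
            }
        }
        return null;
    }

    /** Narrowed: probe only components whose ID overlaps the secondary component's ID. */
    static String lookupNarrowed(List<PrimaryComponent> primary, int pkey, ComponentId secondaryId) {
        probes = 0;
        for (PrimaryComponent c : primary) {
            if (!c.id.overlaps(secondaryId)) {
                continue; // skipped without touching its Bloom filter or B-tree
            }
            probes++;
            String record = c.records.get(pkey);
            if (record != null) {
                return record;
            }
        }
        return null;
    }

    /** Sample primary index, newest first; the oldest component is a merge of flushes 0 and 1. */
    static List<PrimaryComponent> sampleComponents() {
        return List.of(
                new PrimaryComponent(new ComponentId(3, 3), Map.of(30, "c")),
                new PrimaryComponent(new ComponentId(2, 2), Map.of(20, "b")),
                new PrimaryComponent(new ComponentId(0, 1), Map.of(10, "a", 11, "a2")));
    }

    public static void main(String[] args) {
        List<PrimaryComponent> primary = sampleComponents();
        // Suppose the secondary index found pkey 10 in its component with ID [0, 0].
        ComponentId secondaryId = new ComponentId(0, 0);

        String viaAll = lookupAll(primary, 10);
        int allProbes = probes;
        String viaNarrowed = lookupNarrowed(primary, 10, secondaryId);
        int narrowedProbes = probes;

        System.out.println(viaAll + " found after " + allProbes + " probes (baseline)");
        System.out.println(viaNarrowed + " found after " + narrowedProbes + " probes (narrowed)");
    }
}
```

In this sample the baseline lookup probes all three primary components, while the narrowed lookup probes only the one whose ID range covers the secondary component's ID. Overlap is used instead of strict equality because, under the range assumption, the primary and secondary indexes may have merged their disk components differently.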