Details
- Type: Improvement
- Status: Open
- Priority: Major
- Resolution: Unresolved
- 1.2.0
- None
Description
The current CBO implementation requires column nDV (number of distinct values) statistics to produce good estimates of JOIN and filter selectivity.
The ORC bloom filters provide an opportunity to estimate the net population of a row-group, with the false-positive rate capped for each row-group.
This is not useful for filter or join conditions whose cardinality is a large fraction of the row count, but it can yield viable statistics for low-cardinality filter columns (de-normalization scenarios) or for low-cardinality JOIN dimension columns (demographics or store location).
The challenge in this feature is in distinguishing between these two scenarios, not in the derivation of the approximate nDV itself.
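As a rough illustration of both halves of the problem, the sketch below uses the standard fill-ratio estimate for a bloom filter, n ≈ -(m/k) · ln(1 - X/m), where m is the filter size in bits, k the number of hash functions, and X the number of set bits. It then applies a simple guard to reject filters that look like the high-cardinality case. This is not Hive or ORC code: the function names, the `max_fill` and `max_ndv_fraction` thresholds, and the idea of comparing against the row count are all assumptions made for illustration.

```python
import math
from typing import Optional


def estimate_ndv(set_bits: int, num_bits: int, num_hashes: int) -> float:
    """Estimate how many distinct values were inserted into a bloom filter.

    Uses the fill-ratio estimate n ~= -(m / k) * ln(1 - X / m).
    Returns math.inf for a fully saturated filter, where no estimate exists.
    """
    if num_bits <= 0 or num_hashes <= 0:
        raise ValueError("num_bits and num_hashes must be positive")
    if not 0 <= set_bits <= num_bits:
        raise ValueError("set_bits must be between 0 and num_bits")
    if set_bits == num_bits:
        return math.inf
    fill = set_bits / num_bits
    return -(num_bits / num_hashes) * math.log1p(-fill)


def usable_ndv(
    set_bits: int,
    num_bits: int,
    num_hashes: int,
    row_count: int,
    max_fill: float = 0.5,          # hypothetical threshold, not a Hive setting
    max_ndv_fraction: float = 0.1,  # hypothetical threshold, not a Hive setting
) -> Optional[float]:
    """Return an nDV estimate only when the filter looks low-cardinality.

    Returns None when the filter is too full for the estimate to be stable,
    or when the estimate is a large fraction of the row count.
    """
    if set_bits / num_bits > max_fill:
        return None
    ndv = estimate_ndv(set_bits, num_bits, num_hashes)
    if ndv > max_ndv_fraction * row_count:
        return None
    return ndv
```

For example, `usable_ndv(100, 10000, 4, row_count=1000)` yields an estimate of about 25 distinct values, while the same filter over only 100 rows is rejected because the estimate is too large a fraction of the row count. The thresholds would need tuning against real data; the point is only that the estimate itself is cheap, and the decision about when to trust it is the harder part.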
Issue Links
- is related to: HIVE-9188 BloomFilter support in ORC (Closed)
- relates to: HIVE-22993 Include Bloom Filter in Column Statistics to Better Estimate nDV (Open)