Details
- New Feature
- Status: Closed
- Major
- Resolution: Fixed
- 0.15.0
Support for Bloom Filters in ORC internal index.
Description
Bloom filters are a well-known probabilistic data structure for set membership checking. We can use Bloom filters in the ORC index for better row group pruning. Currently, the ORC row group index uses min/max statistics to eliminate row groups (and stripes as well) that do not satisfy the predicate condition specified in the query. In some cases, however, min/max-based elimination is ineffective, for example on unsorted columns with a wide range of values, where nearly every row group's min/max range covers the predicate value. Bloom filters can be an effective and efficient alternative for row group/split elimination for point queries or queries with an IN clause.
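To make the idea concrete, here is a minimal sketch of row group elimination that combines a min/max check with a Bloom filter check. It is an illustration only: `SimpleBloomFilter`, `RowGroupIndexEntry`, and all of their methods are hypothetical names, the hash mixing is an arbitrary choice, and none of this is ORC's actual index format or reader API.

```java
import java.util.BitSet;

/** Hypothetical, simplified Bloom filter over long values (not ORC's implementation). */
final class SimpleBloomFilter {
    private final BitSet bits;
    private final int numBits;
    private final int numHashes;

    SimpleBloomFilter(int numBits, int numHashes) {
        this.bits = new BitSet(numBits);
        this.numBits = numBits;
        this.numHashes = numHashes;
    }

    // 64-bit mixing function (SplitMix64 finalizer); an assumption for this sketch.
    private static long mix(long x) {
        x ^= x >>> 30;
        x *= 0xbf58476d1ce4e5b9L;
        x ^= x >>> 27;
        x *= 0x94d049bb133111ebL;
        return x ^ (x >>> 31);
    }

    // Derive the i-th bit position from two 32-bit halves of one 64-bit hash.
    private int index(long value, int i) {
        long h = mix(value);
        int h1 = (int) h;
        int h2 = (int) (h >>> 32);
        return Math.floorMod(h1 + i * h2, numBits);
    }

    void add(long value) {
        for (int i = 0; i < numHashes; i++) {
            bits.set(index(value, i));
        }
    }

    /** false means "definitely absent"; true means "possibly present". */
    boolean mightContain(long value) {
        for (int i = 0; i < numHashes; i++) {
            if (!bits.get(index(value, i))) {
                return false;
            }
        }
        return true;
    }
}

/** Hypothetical per-row-group index entry: min/max statistics plus a Bloom filter. */
final class RowGroupIndexEntry {
    final long min;
    final long max;
    final SimpleBloomFilter bloom = new SimpleBloomFilter(1024, 3);

    RowGroupIndexEntry(long[] values) {
        long lo = Long.MAX_VALUE;
        long hi = Long.MIN_VALUE;
        for (long v : values) {
            lo = Math.min(lo, v);
            hi = Math.max(hi, v);
            bloom.add(v);
        }
        this.min = lo;
        this.max = hi;
    }

    // Existing behaviour: keep the row group if the value lies within [min, max].
    boolean minMaxMayMatch(long v) {
        return v >= min && v <= max;
    }

    // Proposed behaviour for point queries: also consult the Bloom filter.
    boolean mayMatch(long v) {
        return minMaxMayMatch(v) && bloom.mightContain(v);
    }

    // IN clause: keep the row group if any listed value may be present.
    boolean mayMatchAny(long... candidates) {
        for (long v : candidates) {
            if (mayMatch(v)) {
                return true;
            }
        }
        return false;
    }
}
```

For a row group holding the unsorted values 5, 1000, 37 and 999999, `minMaxMayMatch(500)` returns true even though 500 is absent, so min/max alone cannot skip the group. `mayMatch(500)` will usually return false, subject to the filter's false-positive rate. A Bloom filter never produces false negatives, so a row group that does contain the value is never skipped.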
Attachments
Issue Links
- is blocked by
  - HIVE-4639 Add has null flag to ORC internal index (Resolved)
- relates to
  - HIVE-11033 BloomFilter index is not honored by ORC reader (Resolved)
  - HIVE-9931 Approximate nDV statistics from ORC bloom filter population (Open)
- links to