Description
PARQUET-41 has been closed recently. This means Parquet-MR is capable of writing and reading bloom filters.
Currently, bloom filters are stored per column chunk, so they can be used to filter out entire row groups.
We already filter row groups in HdfsParquetScanner::NextRowGroup() based on column chunk statistics and dictionaries. Skipping row groups based on bloom filters could also be added to this function.
Impala could also write bloom filters.
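To illustrate the idea, here is a minimal, self-contained sketch of bloom-filter-based row group skipping. It is not Impala code and it does not implement the split-block bloom filter layout or the hash function defined by the Parquet format. ToyBloomFilter, RowGroup and RowGroupsToScan are hypothetical names introduced only for this example, and the hash mixer is an arbitrary choice.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Toy bloom filter over 64-bit values. Assumption: this is a plain bit-array
// filter for illustration, NOT the split-block format used by Parquet.
class ToyBloomFilter {
 public:
  explicit ToyBloomFilter(std::size_t num_bits) : bits_(num_bits, false) {
    assert(num_bits > 0);  // BitIndex() takes a modulo by the bit count.
  }

  // Write side: record a value that occurs in the column chunk.
  void Insert(std::uint64_t value) {
    for (std::uint64_t seed = 0; seed < kNumHashes; ++seed) {
      bits_[BitIndex(value, seed)] = true;
    }
  }

  // Read side: false means the value is definitely absent,
  // true means it may be present (false positives are possible).
  bool MightContain(std::uint64_t value) const {
    for (std::uint64_t seed = 0; seed < kNumHashes; ++seed) {
      if (!bits_[BitIndex(value, seed)]) return false;
    }
    return true;
  }

 private:
  static constexpr std::uint64_t kNumHashes = 3;

  // 64-bit mixer (splitmix64 finalizer), chosen only for this sketch.
  static std::uint64_t Mix(std::uint64_t x) {
    x += 0x9e3779b97f4a7c15ULL;
    x = (x ^ (x >> 30)) * 0xbf58476d1ce4e5b9ULL;
    x = (x ^ (x >> 27)) * 0x94d049bb133111ebULL;
    return x ^ (x >> 31);
  }

  std::size_t BitIndex(std::uint64_t value, std::uint64_t seed) const {
    return static_cast<std::size_t>(
        Mix(value + seed * 0x9e3779b97f4a7c15ULL) % bits_.size());
  }

  std::vector<bool> bits_;
};

// Hypothetical stand-in for a row group with one filtered column chunk.
struct RowGroup {
  int id;
  ToyBloomFilter filter;
};

// Returns the ids of row groups that cannot be ruled out for `wanted`.
// Row groups whose filter reports "definitely absent" are skipped.
std::vector<int> RowGroupsToScan(const std::vector<RowGroup>& groups,
                                 std::uint64_t wanted) {
  std::vector<int> to_scan;
  for (const RowGroup& group : groups) {
    if (group.filter.MightContain(wanted)) to_scan.push_back(group.id);
  }
  return to_scan;
}
```

Because a bloom filter never produces false negatives, skipping a row group on a negative probe is always safe. A positive probe only means the row group still has to be scanned, so this check would complement the existing statistics and dictionary filtering rather than replace it.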
Issue Links
- is related to
  - HIVE-24831 Support writing bloom filters in Parquet (Open)
  - SPARK-34562 Leverage parquet bloom filters (Resolved)
- relates to
  - IMPALA-10898 Runtime IN-list filters for ORC tables (Resolved)