PARQUET-41 was closed recently, which means Parquet-MR can now write and read bloom filters.
Bloom filters are currently stored per column chunk, so they can be used to filter out entire row groups.
We already filter row groups in HdfsParquetScanner::NextRowGroup() based on column chunk statistics and dictionaries. Skipping row groups based on bloom filters could also be added to this function.
Impala could also write bloom filters.
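To illustrate the row group skipping idea above, here is a minimal, self-contained sketch. It is not Impala code and it does not implement the split-block bloom filter layout defined by the Parquet format. ToyBloomFilter and CanSkipRowGroup are hypothetical names, and the hashing scheme (std::hash with double hashing) is an assumption chosen only to keep the example short. The point is the decision rule: a row group can be skipped for an equality predicate only when the filter reports the literal as definitely absent.

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Toy bloom filter for illustration only. This is NOT the split-block
// layout specified by the Parquet format, and it is not Impala code.
class ToyBloomFilter {
 public:
  ToyBloomFilter(std::size_t num_bits, int num_hashes)
      : bits_(num_bits, false), num_hashes_(num_hashes) {}

  // Writer side: record that the column chunk contains this value.
  void Insert(const std::string& value) {
    for (int i = 0; i < num_hashes_; ++i) bits_[BitIndex(value, i)] = true;
  }

  // Reader side: false means "definitely absent", true means "possibly present".
  bool MightContain(const std::string& value) const {
    for (int i = 0; i < num_hashes_; ++i) {
      if (!bits_[BitIndex(value, i)]) return false;
    }
    return true;
  }

 private:
  // Double hashing: derive the i-th bit position from two base hashes.
  std::size_t BitIndex(const std::string& value, int i) const {
    const std::size_t h1 = std::hash<std::string>{}(value);
    const std::size_t h2 = std::hash<std::string>{}(value + "#salt") | 1;
    return (h1 + static_cast<std::size_t>(i) * h2) % bits_.size();
  }

  std::vector<bool> bits_;
  int num_hashes_;
};

// Hypothetical helper: for a predicate of the form `col = literal`, the row
// group can be skipped only if the column chunk's filter rules the literal out.
bool CanSkipRowGroup(const ToyBloomFilter& column_filter,
                     const std::string& literal) {
  return !column_filter.MightContain(literal);
}
```

Because a bloom filter can return false positives but never false negatives, such a check can only avoid reading row groups and can never drop matching rows.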
- is related to
HIVE-24831 Support writing bloom filters in Parquet
SPARK-34562 Leverage parquet bloom filters
- relates to
IMPALA-10898 Runtime IN-list filters for ORC tables