We can improve the performance of some joins by pre-filtering one side of the join with a Bloom filter and an IN predicate generated from the values on the other side of the join.
For example, compare tpcds/q16.sql before and after this optimization.
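The pre-filtering idea can be sketched outside of any query engine. The following is a minimal, illustrative Python sketch, not the implementation in this change: the `BloomFilter` class, the `bloom_filter_join` helper, and the filter sizes are all hypothetical names and values chosen for the example. It builds a Bloom filter from the join keys of the smaller side, uses it to discard rows of the larger side that cannot match, and then runs an exact join on the surviving rows to remove any false positives.

```python
import hashlib


class BloomFilter:
    """Tiny illustrative Bloom filter (hypothetical, not an engine's implementation)."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for readability

    def _positions(self, value):
        # Derive num_hashes bit positions by salting a SHA-256 digest.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{value}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, value):
        for pos in self._positions(value):
            self.bits[pos] = 1

    def might_contain(self, value):
        # False means "definitely absent"; True means "possibly present".
        return all(self.bits[pos] for pos in self._positions(value))


def bloom_filter_join(small_side, large_side, key):
    """Pre-filter large_side with a Bloom filter built from small_side, then join."""
    bloom = BloomFilter()
    for row in small_side:
        bloom.add(row[key])

    # Pre-filter: rows whose key is definitely not on the small side are dropped
    # before the join. Some non-matching rows may survive (false positives).
    candidates = [row for row in large_side if bloom.might_contain(row[key])]

    # Exact hash join on the surviving rows removes any false positives.
    index = {}
    for row in small_side:
        index.setdefault(row[key], []).append(row)
    joined = [{**big, **small} for big in candidates for small in index.get(big[key], [])]
    return candidates, joined


if __name__ == "__main__":
    small = [{"id": 1, "name": "a"}, {"id": 3, "name": "c"}]
    large = [{"id": i, "v": i * 10} for i in range(100)]
    candidates, joined = bloom_filter_join(small, large, "id")
    print(f"{len(large)} rows reduced to {len(candidates)} candidates before the join")
    print(joined)
```

Because a Bloom filter never yields false negatives, the pre-filter cannot drop a matching row; it only reduces how many rows reach the join. When the number of distinct values is small, an exact IN list can play the same role without false positives.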
Query Performance Benchmarks: TPC-DS Performance Evaluation
Our setup for running the TPC-DS benchmark was as follows: TPC-DS 5T with partitioned Parquet tables.
|Query||Default (Seconds)||Bloom Filter Join Enabled (Seconds)|