Details
Type: Improvement
Status: Resolved
Priority: Minor
Resolution: Invalid
1.2.0
None
None
Description
The following changes could make predicate pushdown more effective under conditions that are not uncommon.
1. Parquet predicate evaluation does not use dictionary compression information; furthermore, it circumvents dictionary decoding optimisations (https://issues.apache.org/jira/browse/PARQUET-36). As a result, predicates are re-evaluated repeatedly for the same strings, and the same Binary->String conversions are repeated. This change would be purely on the Parquet side.
2. Support IN clauses in predicate pushdown. This requires changes first to Parquet and subsequently to Spark SQL.
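For item 1, here is a minimal sketch of what dictionary-aware predicate evaluation could look like. It is not Parquet's reader code: it assumes a column chunk modeled as a list of distinct dictionary entries (raw bytes) plus one dictionary index per row, and the function names are hypothetical. `bytes.decode` stands in for the Binary->String conversion. The point is that the conversion and the predicate run once per distinct entry instead of once per row.

```python
from typing import Callable, List, Sequence

# Hypothetical model of a dictionary-encoded column chunk: `dictionary` holds each
# distinct value once as raw bytes, and `indices` holds one dictionary index per row.


def naive_filter(dictionary: Sequence[bytes], indices: Sequence[int],
                 predicate: Callable[[str], bool]) -> List[bool]:
    """Decode and evaluate per row: repeated values are converted and tested again."""
    return [predicate(dictionary[i].decode("utf-8")) for i in indices]


def dictionary_aware_filter(dictionary: Sequence[bytes], indices: Sequence[int],
                            predicate: Callable[[str], bool]) -> List[bool]:
    """Decode and evaluate once per distinct dictionary entry, then look up per row."""
    entry_matches = [predicate(value.decode("utf-8")) for value in dictionary]
    return [entry_matches[i] for i in indices]


def can_skip_chunk(dictionary: Sequence[bytes],
                   predicate: Callable[[str], bool]) -> bool:
    """True if no dictionary entry matches, so no row in the chunk can match.

    Assumes every page of the chunk is dictionary-encoded (no fallback to plain
    encoding) and that null handling is done elsewhere.
    """
    return not any(predicate(value.decode("utf-8")) for value in dictionary)
```

With a three-entry dictionary and seven rows, `naive_filter` calls the predicate seven times while `dictionary_aware_filter` calls it three times and returns the same result; the saving grows with the ratio of rows to distinct values.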
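For item 2, a sketch of how an IN clause might be pushed down. The filter classes `Eq`, `Or` and `In` below are hypothetical stand-ins, not the Parquet or Spark SQL filter APIs. Without native support, `col IN (v1, ..., vn)` can be rewritten as equality filters chained with OR; a dedicated `In` filter expresses the same condition as one set lookup and, assuming min/max statistics are available per row group, lets a row group be skipped when none of the listed values falls inside its range.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable


@dataclass(frozen=True)
class Eq:  # hypothetical equality filter: column == value
    column: str
    value: str


@dataclass(frozen=True)
class Or:  # hypothetical disjunction of two filters
    left: object
    right: object


@dataclass(frozen=True)
class In:  # hypothetical set-membership filter: column IN values
    column: str
    values: FrozenSet[str]


def in_as_disjunction(column: str, values: Iterable[str]) -> object:
    """Rewrite `column IN (v1, ..., vn)` as Eq(v1) OR ... OR Eq(vn)."""
    ordered = sorted(values)
    if not ordered:
        # An empty IN list matches nothing; it should be handled before pushdown.
        raise ValueError("empty IN list")
    expr: object = Eq(column, ordered[0])
    for value in ordered[1:]:
        expr = Or(expr, Eq(column, value))
    return expr


def evaluate(expr: object, row: dict) -> bool:
    """Evaluate a filter against a single row given as {column: value}."""
    if isinstance(expr, Eq):
        return row[expr.column] == expr.value
    if isinstance(expr, Or):
        return evaluate(expr.left, row) or evaluate(expr.right, row)
    if isinstance(expr, In):
        return row[expr.column] in expr.values
    raise TypeError(f"unsupported filter: {expr!r}")


def row_group_may_match(pred: In, min_value: str, max_value: str) -> bool:
    """Statistics pruning: keep the row group only if some IN value is in [min, max]."""
    return any(min_value <= value <= max_value for value in pred.values)
```

Both forms select the same rows; the set-based `In` avoids building an OR chain whose depth grows with the length of the IN list.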