SPARK-5511

[SQL] Possible optimisations for predicate pushdowns from Spark SQL to Parquet


Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Minor
    • Resolution: Invalid
    • Affects Version/s: 1.2.0
    • Fix Version/s: None
    • Component/s: SQL
    • Labels: None

    Description

      The following changes could make predicate pushdown more effective under conditions that are not uncommon.

      1. Parquet predicate evaluation does not use dictionary compression information; furthermore, it circumvents dictionary decoding optimisations (https://issues.apache.org/jira/browse/PARQUET-36). This means predicates are re-evaluated repeatedly for the same Strings, and Binary->String conversions are repeated as well (see the dictionary sketch after this list). This is a change purely on the Parquet side.

      2. Support IN clauses in predicate pushdown. This requires changes to Parquet first and subsequently to Spark SQL (see the IN-clause sketch after this list).
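
      A minimal sketch of the idea in item 1, assuming hypothetical names (evaluateWithDictionary and its inputs) rather than the real Parquet reader internals: when a column chunk is dictionary encoded, the predicate (and the Binary->String conversion it needs) can be evaluated once per distinct dictionary entry, leaving only an array lookup per row.

      // Hypothetical sketch, not the Parquet API: evaluate a string predicate
      // once per dictionary entry instead of once per row.
      object DictionaryPredicateSketch {
        def evaluateWithDictionary(
            dictionary: Array[String],     // decoded dictionary entries for the column chunk
            dictionaryIds: Iterator[Int],  // per-row indices into the dictionary
            predicate: String => Boolean   // e.g. _ == "Smith"
        ): Iterator[Boolean] = {
          // Pay the predicate (and any decoding cost) once per distinct value.
          val matches: Array[Boolean] = dictionary.map(predicate)
          // Per-row work is then a single array lookup.
          dictionaryIds.map(id => matches(id))
        }
      }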
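
      A hedged sketch of item 2 from the pushdown side: once Parquet can evaluate it, an IN clause over a string column could be rewritten into Parquet's existing filter2 predicates as a chain of ORed equality tests. The column name "name", the candidate values, and the helper inToParquetPredicate are illustrative assumptions; the package names follow current Apache Parquet releases (org.apache.parquet.filter2), which may differ from the Parquet version bundled with Spark 1.2.0.

      import org.apache.parquet.filter2.predicate.{FilterApi, FilterPredicate}
      import org.apache.parquet.io.api.Binary

      object InPushdownSketch {
        // Rewrite `column IN (v1, v2, ...)` as eq(column, v1) OR eq(column, v2) OR ...
        // Assumes `values` is non-empty.
        def inToParquetPredicate(column: String, values: Seq[String]): FilterPredicate = {
          val col = FilterApi.binaryColumn(column)
          values
            .map(v => FilterApi.eq(col, Binary.fromString(v)): FilterPredicate)
            .reduce((l, r) => FilterApi.or(l, r))
        }

        // e.g. the predicate for: SELECT ... WHERE name IN ('alice', 'bob', 'carol')
        val example: FilterPredicate =
          inToParquetPredicate("name", Seq("alice", "bob", "carol"))
      }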

    Attachments

    Activity

    People

        Assignee: Unassigned
        Reporter: Michael Davies
        Votes: 0
        Watchers: 2

    Dates

        Created:
        Updated:
        Resolved: