Spark > SPARK-25419 Parquet predicate pushdown improvement > SPARK-25207

Case-insensitive field resolution for filter pushdown when reading Parquet


Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.4.0
    • Fix Version/s: 2.4.0
    • Component/s: SQL

    Description

      Currently, filter pushdown does not work when the Parquet schema and the Hive metastore schema differ in letter case, even though spark.sql.caseSensitive is false.

      Consider the following case:

      spark.range(10).write.mode("overwrite").parquet("/tmp/data")
      sql("DROP TABLE IF EXISTS t")
      sql("CREATE TABLE t (ID LONG) USING parquet LOCATION '/tmp/data'")
      sql("select * from t where id > 0").show

      The filters are pushed down with the table's uppercase name `ID`, but the files store the column as lowercase `id`, so the reader returns NULL for `ID` and the query silently returns a wrong (empty) result:

      scala> sql("select * from t where id > 0").explain   // Filters are pushed with `ID`
      == Physical Plan ==
      *(1) Project [ID#90L]
      +- *(1) Filter (isnotnull(id#90L) && (id#90L > 0))
         +- *(1) FileScan parquet default.t[ID#90L] Batched: true, Format: Parquet, Location: InMemoryFileIndex[file:/tmp/data], PartitionFilters: [], PushedFilters: [IsNotNull(ID), GreaterThan(ID,0)], ReadSchema: struct<ID:bigint>
      
      scala> sql("select * from t").show    // Parquet returns NULL for `ID` because it has `id`.
      +----+
      |  ID|
      +----+
      |null|
      |null|
      |null|
      |null|
      |null|
      |null|
      |null|
      |null|
      |null|
      |null|
      +----+
      
      scala> sql("select * from t where id > 0").show   // `NULL > 0` is `false`.
      +---+
      | ID|
      +---+
      +---+
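
      One way to fix this is to resolve the filter's field name against the Parquet file schema case-insensitively when spark.sql.caseSensitive is false. Below is a minimal Scala sketch of that resolution step (a hypothetical helper for illustration, not the actual Spark patch):

      // Hypothetical helper: map a requested field name to the matching
      // Parquet field name. In case-insensitive mode, push the filter only
      // when exactly one Parquet field matches; fields that differ only by
      // case are ambiguous, so pushdown is skipped for them.
      def resolveParquetField(
          requested: String,
          parquetFieldNames: Seq[String],
          caseSensitive: Boolean): Option[String] = {
        if (caseSensitive) {
          parquetFieldNames.find(_ == requested)
        } else {
          parquetFieldNames.filter(_.equalsIgnoreCase(requested)) match {
            case Seq(unique) => Some(unique)
            case _           => None  // no match, or names differing only by case
          }
        }
      }

      resolveParquetField("ID", Seq("id"), caseSensitive = false)        // Some("id")
      resolveParquetField("ID", Seq("id", "iD"), caseSensitive = false)  // None (ambiguous)

      With such a mapping in place, the example above would push GreaterThan(id,0) instead of GreaterThan(ID,0), so Parquet can evaluate the filter against the column actually stored in the files.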
      

      Attachments

        image.png (41 kB, attached by Dongjoon Hyun)


            People

              Assignee: yucai
              Reporter: yucai
