SPARK-17636: Parquet predicate pushdown for nested fields
(Project: Spark; parent issue: SPARK-25556 Predicate Pushdown for Nested fields)

    Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 1.6.2, 1.6.3, 2.0.2
    • Fix Version/s: 3.0.0
    • Component/s: Spark Core, SQL
    • Labels: None

    Description

      A filter on a top-level numeric field shows up in the scan's PushedFilters, but a filter on a numeric field nested inside a struct does not. I'm not sure whether this limitation comes from Parquet or exists only in Spark.

      scala> hc.read.parquet("s3a://some/parquet/file").select("day_timestamp", "sale_id")
      
      res5: org.apache.spark.sql.DataFrame = [day_timestamp: struct<timestamp:bigint,timezone:string>, sale_id: bigint]
      
      scala> res5.filter("sale_id > 4").queryExecution.executedPlan
      
      res9: org.apache.spark.sql.execution.SparkPlan =
      Filter[23814] [args=(sale_id#86324L > 4)][outPart=UnknownPartitioning(0)][outOrder=List()]
      +- Scan ParquetRelation[day_timestamp#86302,sale_id#86324L] InputPaths: s3a://some/parquet/file, PushedFilters: [GreaterThan(sale_id,4)]
      
      scala> res5.filter("day_timestamp.timestamp > 4").queryExecution.executedPlan
      
      res10: org.apache.spark.sql.execution.SparkPlan =
      Filter[23815] [args=(day_timestamp#86302.timestamp > 4)][outPart=UnknownPartitioning(0)][outOrder=List()]
      +- Scan ParquetRelation[day_timestamp#86302,sale_id#86324L] InputPaths: s3a://some/parquet/file
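
      The two plans suggest that only predicates on top-level attributes were translated into pushed filters. Below is a minimal, self-contained Scala sketch of the check a source would need in order to push a nested predicate: resolve the column path through struct fields only, then confirm the leaf is a type the reader can compare. It is a hypothetical model, not Spark's implementation. Every type and name in it (`DataType`, `StructType`, `GreaterThan`, `NestedPushdownSketch`, `toPushedFilter`) is invented for illustration. It renders results in the same `GreaterThan(column,value)` style as the PushedFilters line above.

```scala
// Hypothetical, simplified model; none of these types are Spark's real classes.
sealed trait DataType
case object LongType extends DataType
case object StringType extends DataType
final case class StructType(fields: Seq[(String, DataType)]) extends DataType
final case class ArrayType(element: DataType) extends DataType

// A comparison against a column addressed by its path from the root,
// e.g. Seq("day_timestamp", "timestamp") for day_timestamp.timestamp.
final case class GreaterThan(path: Seq[String], value: Long)

object NestedPushdownSketch {
  // Follow `path` through the schema, descending only into struct fields.
  // Returns the leaf type, or None if a segment is missing or a
  // non-struct type (array, primitive) sits in the middle of the path.
  def resolve(schema: StructType, path: List[String]): Option[DataType] =
    path match {
      case Nil => None
      case head :: tail =>
        schema.fields.find(_._1 == head).map(_._2).flatMap { fieldType =>
          (tail, fieldType) match {
            case (Nil, leaf)        => Some(leaf)
            case (_, s: StructType) => resolve(s, tail)
            case _                  => None
          }
        }
    }

  // Render a pushable filter in the PushedFilters style, or return None so
  // the predicate is left for evaluation after the scan.
  // Only long leaves are treated as comparable, to keep the model small.
  def toPushedFilter(schema: StructType, f: GreaterThan): Option[String] =
    resolve(schema, f.path.toList).collect { case LongType =>
      s"GreaterThan(${f.path.mkString(".")},${f.value})"
    }
}
```

      Restricting the walk to struct fields mirrors how Parquet stores nested data. Each leaf of a struct is written as its own column chunk with its own statistics, so day_timestamp.timestamp can in principle be filtered like a top-level column. A predicate on a field inside an array has different semantics (it must match any element), so this sketch leaves it unpushed.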
      

    People

    • Assignee: DB Tsai (dbtsai)
    • Reporter: Mitesh (MasterDDT)
    • Votes: 30
    • Watchers: 59
