- Type: Sub-task
- Status: Resolved
- Priority: Critical
- Resolution: Fixed
- Affects Version/s: 1.6.2, 1.6.3, 2.0.2
- Fix Version/s: 3.0.0
- Component/s: Spark Core, SQL
- Labels: None
Spark emits a PushedFilters entry for a predicate on a simple numeric column, but not for the same predicate on a numeric field inside a struct. It is unclear whether this is a limitation Spark inherits from Parquet, or a limitation in Spark alone.
scala> hc.read.parquet("s3a://some/parquet/file").select("day_timestamp", "sale_id")
res5: org.apache.spark.sql.DataFrame = [day_timestamp: struct<timestamp:bigint,timezone:string>, sale_id: bigint]

scala> res5.filter("sale_id > 4").queryExecution.executedPlan
res9: org.apache.spark.sql.execution.SparkPlan =
Filter[23814] [args=(sale_id#86324L > 4)][outPart=UnknownPartitioning(0)][outOrder=List()]
+- Scan ParquetRelation[day_timestamp#86302,sale_id#86324L] InputPaths: s3a://some/parquet/file, PushedFilters: [GreaterThan(sale_id,4)]

scala> res5.filter("day_timestamp.timestamp > 4").queryExecution.executedPlan
res10: org.apache.spark.sql.execution.SparkPlan =
Filter[23815] [args=(day_timestamp#86302.timestamp > 4)][outPart=UnknownPartitioning(0)][outOrder=List()]
+- Scan ParquetRelation[day_timestamp#86302,sale_id#86324L] InputPaths: s3a://some/parquet/file
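The difference between the two plans above can be illustrated with a minimal sketch. This is not Spark's actual code; it is a hypothetical simplification of a source-filter translator that pattern-matches only on bare column references, so an expression wrapping a struct-field access falls through untranslated and never reaches PushedFilters:

```scala
// Minimal sketch (hypothetical types, not Spark internals) of a filter
// translator that only handles top-level attribute references.
sealed trait Expr
case class Attr(name: String) extends Expr
case class GetStructField(child: Expr, field: String) extends Expr
case class GreaterThan(left: Expr, right: Long) extends Expr

def translateFilter(e: Expr): Option[String] = e match {
  // A bare column reference on the left-hand side translates to a source filter.
  case GreaterThan(Attr(name), v) => Some(s"GreaterThan($name,$v)")
  // A struct-field access (e.g. day_timestamp.timestamp) does not match,
  // so no filter is pushed and the predicate runs after the scan.
  case _ => None
}

// Mirrors the two executed plans above:
val topLevel = translateFilter(GreaterThan(Attr("sale_id"), 4))
val nested   = translateFilter(
  GreaterThan(GetStructField(Attr("day_timestamp"), "timestamp"), 4))
```

Under this sketch, `topLevel` yields a pushed filter while `nested` yields none, matching the presence and absence of PushedFilters in the plans.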
fixes:
- SPARK-25558 Pushdown predicates for nested fields in DataSource Strategy (Resolved)

is duplicated by:
- SPARK-5151 Parquet Predicate Pushdown Does Not Work with Nested Structures (Resolved)
- SPARK-19638 Filter pushdown not working for struct fields (Closed)