Details
-
Improvement
-
Status: Resolved
-
Major
-
Resolution: Duplicate
-
3.0.0
-
None
-
None
Description
Currently we don't support pushing down cast to data source (see link for a discussion). For instance, in the following code snippet:
scala> case class Person(name: String, age: Short) scala> Seq(Person("John", 32), Person("David", 25), Person("Mike", 18)).toDS().write.parquet("/tmp/person.parquet") scala> val personDS = spark.read.parquet("/tmp/person.parquet") scala> personDS.createOrReplaceTempView("person") scala> spark.sql("SELECT * FROM person where age < 30")
The predicate won't be pushed down to Parquet data source because in DataSourceStrategy, PushableColumnBase only handles a few limited cases such as Attribute and GetStructField. Potentially we can handle Cast here as well.
Attachments
Issue Links
- is duplicated by
-
SPARK-24994 Add UnwrapCastInBinaryComparison optimizer to simplify literal types
- Resolved