Uploaded image for project: 'Spark'
  1. Spark
  2. SPARK-32694

Pushdown cast to data sources

    XMLWordPrintableJSON

Details

    • Improvement
    • Status: Resolved
    • Major
    • Resolution: Duplicate
    • 3.0.0
    • None
    • SQL
    • None

    Description

      Currently we don't support pushing down cast to data source (see link for a discussion). For instance, in the following code snippet:

      scala> case class Person(name: String, age: Short)
      scala> Seq(Person("John", 32), Person("David", 25), Person("Mike", 18)).toDS().write.parquet("/tmp/person.parquet")
      scala> val personDS = spark.read.parquet("/tmp/person.parquet")
      scala> personDS.createOrReplaceTempView("person")
      scala> spark.sql("SELECT * FROM person where age < 30")
      

      The predicate won't be pushed down to Parquet data source because in DataSourceStrategy, PushableColumnBase only handles a few limited cases such as Attribute and GetStructField. Potentially we can handle Cast here as well.

      Attachments

        Issue Links

          Activity

            People

              Unassigned Unassigned
              csun Chao Sun
              Votes:
              1 Vote for this issue
              Watchers:
              6 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: