  1. Spark
  2. SPARK-26865

DataSourceV2Strategy should push normalized filters


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 3.0.0
    • Fix Version/s: 3.0.0
    • Component/s: SQL
    • Labels: None

    Description

      Although `SupportsPushDownFilters` was designed the same way as DSv1, using `Filter`, DSv1 and DSv2 pass different filters.

        /**
         * Pushes down filters, and returns filters that need to be evaluated after scanning.
         */
        Filter[] pushFilters(Filter[] filters);
      

      Specifically, DSv2 doesn't guarantee that filter expressions match the underlying schema in terms of case-sensitivity.

      buildReaderWithPartitionValues(..., filters: Seq[Filter], ...)
      - IsNotNull(ID)
      
      DataSourceV2Strategy.pushFilters
      - IsNotNull(id)
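
      The mismatch above can be illustrated with a minimal sketch of case-insensitive normalization. This is not Spark's actual implementation: the `Filter` classes below are simplified, hypothetical stand-ins for the ones in `org.apache.spark.sql.sources`, and `NormalizeFiltersSketch`, `normalizeName`, and `normalize` are names invented for illustration. It assumes case-insensitive resolution and that at most one schema field matches a given name.

```scala
object NormalizeFiltersSketch {
  // Hypothetical, simplified stand-ins for Spark's data source filters.
  sealed trait Filter
  final case class IsNotNull(attribute: String) extends Filter
  final case class GreaterThan(attribute: String, value: Any) extends Filter

  // Rewrite a user-supplied column name to the spelling used in the schema,
  // ignoring case. Names with no match are returned unchanged.
  def normalizeName(schemaFields: Seq[String], name: String): String =
    schemaFields.find(_.equalsIgnoreCase(name)).getOrElse(name)

  // Apply the name rewrite to each supported filter shape.
  def normalize(schemaFields: Seq[String], filter: Filter): Filter = filter match {
    case IsNotNull(a)      => IsNotNull(normalizeName(schemaFields, a))
    case GreaterThan(a, v) => GreaterThan(normalizeName(schemaFields, a), v)
  }
}
```

      With a schema containing only `ID`, `NormalizeFiltersSketch.normalize(Seq("ID"), NormalizeFiltersSketch.IsNotNull("id"))` yields `IsNotNull("ID")`, which is the form DSv1 passes to the reader.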
      

      Steps to reproduce:

      spark.range(10).write.orc("/tmp/o1")
      spark.read.schema("ID long").orc("/tmp/o1").filter("id > 5").show
      
      java.util.NoSuchElementException: key not found: id
        at scala.collection.immutable.Map$Map1.apply(Map.scala:114)
        at org.apache.spark.sql.execution.datasources.orc.OrcFilters$.createBuilder(OrcFilters.scala:263)
        at org.apache.spark.sql.execution.datasources.orc.OrcFilters$.buildSearchArgument(OrcFilters.scala:153)
        at org.apache.spark.sql.execution.datasources.orc.OrcFilters$.$anonfun$convertibleFilters$1(OrcFilters.scala:99)
        at scala.collection.TraversableLike.$anonfun$flatMap$1(TraversableLike.scala:244)
        at scala.collection.IndexedSeqOptimized.foreach(IndexedSeqOptimized.scala:36)
        at scala.collection.IndexedSeqOptimized.foreach$(IndexedSeqOptimized.scala:33)
        at scala.collection.mutable.WrappedArray.foreach(WrappedArray.scala:39)
        at scala.collection.TraversableLike.flatMap(TraversableLike.scala:244)
        at scala.collection.TraversableLike.flatMap$(TraversableLike.scala:241)
        at scala.collection.AbstractTraversable.flatMap(Traversable.scala:108)
        at org.apache.spark.sql.execution.datasources.orc.OrcFilters$.convertibleFilters(OrcFilters.scala:98)
        at org.apache.spark.sql.execution.datasources.orc.OrcFilters$.createFilter(OrcFilters.scala:87)
        at org.apache.spark.sql.execution.datasources.v2.orc.OrcScanBuilder.pushFilters(OrcScanBuilder.scala:50)
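
      The `Map$Map1.apply` frame at the top of the trace suggests a strict lookup in a one-entry map keyed by the schema's field name. A minimal sketch of that failure mode, using a plain Scala `Map` rather than Spark's code (`KeyLookupSketch`, `dataTypeMap`, `strictLookup`, and `safeLookup` are hypothetical names, and the map contents are an assumption based on the `ID long` schema in the reproduction):

```scala
object KeyLookupSketch {
  // Hypothetical one-entry map from schema field name to type name,
  // keyed by the schema's spelling "ID".
  val dataTypeMap: Map[String, String] = Map("ID" -> "long")

  // Map.apply throws NoSuchElementException when the key is absent,
  // and key comparison is case-sensitive.
  def strictLookup(name: String): String = dataTypeMap(name)

  // Map.get returns an Option instead of throwing.
  def safeLookup(name: String): Option[String] = dataTypeMap.get(name)
}
```

      Here `KeyLookupSketch.strictLookup("id")` throws `java.util.NoSuchElementException: key not found: id`, while `KeyLookupSketch.strictLookup("ID")` succeeds. This is why a filter that reaches the ORC reader with the un-normalized name `id` fails.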
      

            People

              dongjoon Dongjoon Hyun
              cloud_fan Wenchen Fan
              Votes: 0
              Watchers: 3
