Description
Although `SupportsPushDownFilters` was designed the same way as the DSv1 API, with both taking `Filter` instances, DSv1 and DSv2 pass different filters.
```java
/**
 * Pushes down filters, and returns filters that need to be evaluated after scanning.
 */
Filter[] pushFilters(Filter[] filters);
```
Specifically, DSv2 doesn't guarantee that filter expressions match the underlying schema in terms of case-sensitivity.
- DSv1 `buildReaderWithPartitionValues(..., filters: Seq[Filter], ...)` receives `IsNotNull(ID)`
- DSv2 `DataSourceV2Strategy.pushFilters` receives `IsNotNull(id)`
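The failure mode can be modeled outside Spark. The sketch below is plain Scala, not Spark's actual `OrcFilters` code; the `dataTypeMap` name and the lookup helpers are illustrative. It shows how a case-sensitive map keyed by the user-specified schema column (`ID`) fails for a pushed-down filter attribute that differs only in case (`id`), and how a case-insensitive resolution would tolerate the mismatch.

```scala
import scala.util.Try

// Minimal model of the lookup in the failing path: a map from column
// name to type, built from the user-specified schema ("ID"), probed
// with the attribute name carried by the pushed-down filter ("id").
object CaseSensitivityDemo {
  val dataTypeMap: Map[String, String] = Map("ID" -> "long")

  // Case-sensitive lookup, as in the failing path: throws
  // NoSuchElementException("key not found: id") for "id".
  def strictLookup(attribute: String): String = dataTypeMap(attribute)

  // Case-insensitive resolution (illustrative only, not Spark's fix):
  // match column names ignoring case and return the type if found.
  def resolve(attribute: String): Option[String] =
    dataTypeMap.collectFirst {
      case (name, tpe) if name.equalsIgnoreCase(attribute) => tpe
    }

  def main(args: Array[String]): Unit = {
    assert(Try(strictLookup("id")).isFailure)  // key not found: id
    assert(resolve("id").contains("long"))     // resolved ignoring case
    println("ok")
  }
}
```

Note that a real fix also has to respect `spark.sql.caseSensitive`; the sketch only demonstrates why the strict lookup throws.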
Steps to reproduce:

```scala
spark.range(10).write.orc("/tmp/o1")
spark.read.schema("ID long").orc("/tmp/o1").filter("id > 5").show
```

```
java.util.NoSuchElementException: key not found: id
	at scala.collection.immutable.Map$Map1.apply(Map.scala:114)
	at org.apache.spark.sql.execution.datasources.orc.OrcFilters$.createBuilder(OrcFilters.scala:263)
	at org.apache.spark.sql.execution.datasources.orc.OrcFilters$.buildSearchArgument(OrcFilters.scala:153)
	at org.apache.spark.sql.execution.datasources.orc.OrcFilters$.$anonfun$convertibleFilters$1(OrcFilters.scala:99)
	at scala.collection.TraversableLike.$anonfun$flatMap$1(TraversableLike.scala:244)
	at scala.collection.IndexedSeqOptimized.foreach(IndexedSeqOptimized.scala:36)
	at scala.collection.IndexedSeqOptimized.foreach$(IndexedSeqOptimized.scala:33)
	at scala.collection.mutable.WrappedArray.foreach(WrappedArray.scala:39)
	at scala.collection.TraversableLike.flatMap(TraversableLike.scala:244)
	at scala.collection.TraversableLike.flatMap$(TraversableLike.scala:241)
	at scala.collection.AbstractTraversable.flatMap(Traversable.scala:108)
	at org.apache.spark.sql.execution.datasources.orc.OrcFilters$.convertibleFilters(OrcFilters.scala:98)
	at org.apache.spark.sql.execution.datasources.orc.OrcFilters$.createFilter(OrcFilters.scala:87)
	at org.apache.spark.sql.execution.datasources.v2.orc.OrcScanBuilder.pushFilters(OrcScanBuilder.scala:50)
```
Issue Links
- is caused by SPARK-23817 "Create file source V2 framework and migrate ORC read path" (Resolved)