  1. Spark
  2. SPARK-41636

DataSourceStrategy#selectFilters returns predicates in non-deterministic order

Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 3.1.0, 3.4.0, 3.4.1
    • Fix Version/s: 3.5.0, 4.0.0
    • Component/s: SQL
    • Labels: None

    Description

      The method org.apache.spark.sql.execution.datasources.DataSourceStrategy#selectFilters, which is used to determine "pushdown-able" filters, does not preserve the order of the input Seq[Expression], nor does it return the same order across identical plans (modulo ExprId differences). This causes CodeGenerator cache misses even when the exact same LogicalPlan is executed.

      The method makes no attempt to maintain the order of the input predicates, though it happens to do so when the input contains fewer than 5 pushdown-able Expressions (due to the "small maps" logic in scala.collection.TraversableOnce#toMap).
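
The "small maps" effect can be reproduced without Spark. The sketch below is illustrative only: the object name, the helper keysAfterToMap and the column names are hypothetical. It relies on an implementation detail of the Scala 2.12/2.13 standard library (immutable maps of up to four entries use specialized classes that iterate in insertion order), which the Map contract does not guarantee.

```scala
object SmallMapOrder {
  // Builds n (name -> index) pairs in a fixed order, converts them with toMap,
  // and returns the keys in the map's iteration order.
  def keysAfterToMap(n: Int): Seq[String] = {
    val pairs = (1 to n).map(i => s"col$i" -> i)
    pairs.toMap.keys.toSeq
  }

  def main(args: Array[String]): Unit = {
    // Up to 4 entries: the specialized small-map classes keep insertion order.
    println(keysAfterToMap(4))
    // 5 or more entries: a hash-based map is used, so the iteration order
    // depends on the keys' hash codes and need not match the input order.
    println(keysAfterToMap(5))
  }
}
```

With Spark expressions as keys, the hash codes include the ExprId, which would explain why the order can differ between otherwise identical plans.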

      Returning the filters in the same order as the input will reduce churn on the CodeGenerator cache under prolonged workloads that execute very similar queries.
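
One way to get input-ordered output is to derive the result from the input sequence and use a map only for lookups. This is a minimal sketch of that idea, not necessarily how the fix was implemented in Spark: predicates and filters are modeled as plain strings, and translate is a hypothetical stand-in for the real translation step.

```scala
object OrderedSelect {
  // Hypothetical stand-in for filter translation: returns Some(filter) when
  // the predicate can be pushed down, None otherwise.
  def translate(pred: String): Option[String] =
    if (pred.startsWith("udf")) None else Some(s"Filter($pred)")

  // Returns the translated filters in the same order as the input predicates.
  def selectInOrder(preds: Seq[String]): Seq[String] = {
    // Keep the (predicate, filter) pairs as a Seq so the input order survives.
    val translated: Seq[(String, String)] =
      preds.flatMap(p => translate(p).map(f => p -> f))
    // A map can still be built for lookups by predicate, but the output
    // order is taken from the Seq, never from the map's iteration order.
    val lookup: Map[String, String] = translated.toMap
    translated.map { case (p, _) => lookup(p) }
  }
}
```

For example, OrderedSelect.selectInOrder(Seq("a > 1", "udf(b)", "c = 2", "d < 3", "e = 4", "f = 5")) skips the untranslatable predicate and returns the remaining five filters in input order, regardless of how many there are.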

          People

            Assignee: Jia Fan (fanjia)
            Reporter: Jonny Serencsa (jwserencsa)
            Votes: 0
            Watchers: 5
