Kudu / KUDU-1659

Spark does not remove pushed predicates from Spark-side query plan


Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Invalid
    • Affects Version/s: 1.0.0
    • Fix Version/s: n/a
    • Component/s: perf, spark
    • Labels: None

    Description

      I ran the following Spark SQL query:

      select count(*) from metrics where host = "foo.com"
      

      I verified that the predicate was pushed down to Kudu. Once Kudu has applied the predicate, Spark does not need to evaluate it again, and because the query only counts rows, Spark does not need to read the host column at all. However, the physical plan below shows Spark still selecting the column (Scan ... [host#0]) and re-evaluating the same filter (Filter (host#0 = foo.com)):

      == Physical Plan ==
      TungstenAggregate(key=[], functions=[(count(1),mode=Final,isDistinct=false)], output=[_c0#71L])
       TungstenExchange SinglePartition
        TungstenAggregate(key=[], functions=[(count(1),mode=Partial,isDistinct=false)], output=[currentCount#74L])
         Project
          Filter (host#0 = foo.com)
           Scan org.apache.kudu.spark.kudu.KuduRelation@1d18e5ad[host#0]
      
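      A minimal sketch of the behavior requested here, written as a pure-Python model rather than Spark code. Every name in it (EqualTo, ScanPlan, plan_scan, source_handles) is hypothetical. The assumption is that the data source can report which pushed filters it evaluates exactly; the planner then keeps only the remaining filters in the Spark-side Filter node and stops requesting columns referenced only by the handled filters. Spark's data source API has a comparable hook, BaseRelation.unhandledFilters (added in Spark 1.6), but this model does not reproduce Spark's actual planner.

```python
from dataclasses import dataclass

# Hypothetical, simplified stand-ins for planner objects; these are NOT Spark classes.
@dataclass(frozen=True)
class EqualTo:
    column: str
    value: object

@dataclass
class ScanPlan:
    scan_columns: list      # columns the source must return to the engine
    pushed_filters: list    # filters handed to the source (e.g. Kudu)
    residual_filters: list  # filters the engine still evaluates after the scan

def plan_scan(projected, filters, source_handles):
    """Model of filter pushdown with residual-filter elimination.

    projected: columns the query needs after filtering (empty for count(*)).
    filters: predicates from the WHERE clause.
    source_handles: hypothetical callback, True if the source evaluates a filter exactly.
    """
    # Every filter is pushed; only those the source cannot guarantee stay residual.
    residual = [f for f in filters if not source_handles(f)]
    # Request projected columns plus columns referenced by residual filters only;
    # a column used solely by fully handled filters is never scanned.
    needed = list(projected)
    for f in residual:
        if f.column not in needed:
            needed.append(f.column)
    return ScanPlan(needed, list(filters), residual)

query_filters = [EqualTo("host", "foo.com")]
# Behavior in the report: the filter is pushed but still re-evaluated, so host is read.
baseline = plan_scan([], query_filters, lambda f: False)
# Requested behavior: the source reports that it evaluates EqualTo exactly.
improved = plan_scan([], query_filters, lambda f: isinstance(f, EqualTo))
print(baseline.scan_columns, baseline.residual_filters)  # ['host'] [EqualTo(column='host', value='foo.com')]
print(improved.scan_columns, improved.residual_filters)  # [] []
```

      With count(*) nothing is projected, so once the source handles the host equality the model scans no columns and keeps no residual filter, instead of the Filter (host#0 = foo.com) over Scan ...[host#0] seen in the plan above.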

    People

      Assignee: Unassigned
      Reporter: Todd Lipcon (tlipcon)