[KUDU-1659] Spark does not remove pushed predicates from Spark-side query plan - ASF JIRA

Details

Type: Improvement
Status: Resolved
Priority: Major
Resolution: Invalid
Affects Version/s: 1.0.0
Fix Version/s: n/a
Component/s: perf, spark
Labels: None

Description

I ran the following Spark SQL query:

select count(*) from metrics where host = "foo.com"

I verified that the predicate was pushed down to Kudu. Once a predicate is pushed, there is no need to evaluate it again on the Spark side, and in fact Spark doesn't need to select the column at all. However, Spark still appears to select the column and re-evaluate the same filter:

== Physical Plan ==
TungstenAggregate(key=[], functions=[(count(1),mode=Final,isDistinct=false)], output=[_c0#71L])
 TungstenExchange SinglePartition
  TungstenAggregate(key=[], functions=[(count(1),mode=Partial,isDistinct=false)], output=[currentCount#74L])
   Project
    Filter (host#0 = foo.com)
     Scan org.apache.kudu.spark.kudu.KuduRelation@1d18e5ad[host#0]
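
To make the expected behavior concrete, here is a minimal sketch of the bookkeeping I would expect a planner to do. It is a toy model in plain Python, not Spark or Kudu code: `EqualTo`, `split_filters`, and `required_columns` are hypothetical names introduced only for illustration. Filters the source fully handles are dropped from the Spark-side plan, and a column is scanned only if the projection or a remaining filter still needs it.

```python
from dataclasses import dataclass
from typing import Iterable, List, Set, Tuple


@dataclass(frozen=True)
class EqualTo:
    """Toy stand-in for an equality predicate such as host = 'foo.com'."""
    column: str
    value: str


def split_filters(
    filters: Iterable[EqualTo], pushable_columns: Set[str]
) -> Tuple[List[EqualTo], List[EqualTo]]:
    """Partition filters into (pushed, residual).

    Assumption: a filter counts as fully handled by the source whenever its
    column is in pushable_columns. A real connector would decide this per
    predicate type.
    """
    pushed: List[EqualTo] = []
    residual: List[EqualTo] = []
    for f in filters:
        (pushed if f.column in pushable_columns else residual).append(f)
    return pushed, residual


def required_columns(projected: Iterable[str], residual: Iterable[EqualTo]) -> List[str]:
    """Columns the scan must return: projected ones plus those residual filters read."""
    cols = list(projected)
    for f in residual:
        if f.column not in cols:
            cols.append(f.column)
    return cols


# The query from this issue: count(*) projects no columns and has one filter.
filters = [EqualTo("host", "foo.com")]
pushed, residual = split_filters(filters, pushable_columns={"host"})
columns = required_columns(projected=[], residual=residual)
```

Under these assumptions the `host` filter ends up in `pushed`, `residual` is empty, and `columns` is empty, so the plan would need neither the `Filter` node nor the `host#0` column in the scan.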

People

Assignee: Unassigned
Reporter: Todd Lipcon
Votes: 0
Watchers: 1

Dates

Created: 28/Sep/16 20:55
Updated: 29/Sep/16 05:05
Resolved: 29/Sep/16 05:05