Description
I ran the following Spark SQL query:
select count(*) from metrics where host = "foo.com"
I verified that this resulted in the predicate being pushed to Kudu. Once the predicate is pushed, it's not necessary to evaluate it again on the Spark side, and in fact Spark doesn't need to select the column at all. However, Spark still selects the column and re-evaluates the same filter:
== Physical Plan ==
TungstenAggregate(key=[], functions=[(count(1),mode=Final,isDistinct=false)], output=[_c0#71L])
TungstenExchange SinglePartition
TungstenAggregate(key=[], functions=[(count(1),mode=Partial,isDistinct=false)], output=[currentCount#74L])
Project
Filter (host#0 = foo.com)
Scan org.apache.kudu.spark.kudu.KuduRelation@1d18e5ad[host#0]
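For context, Spark's BaseRelation exposes an unhandledFilters hook (added around Spark 1.6): a data source returns the subset of pushed filters it does not guarantee to apply, and Spark only builds a residual Filter node for those; the default implementation reports every filter as unhandled, which would produce exactly the redundant Filter seen in the plan above. The following is a toy Python model of that contract, not Spark or Kudu code; ToyRelation, EqFilter, and plan_scan are hypothetical names for illustration only.

```python
# Toy model of the pushdown contract: the engine asks the relation which
# pushed filters it cannot fully handle, and re-evaluates only those.

class EqFilter:
    """An equality predicate like host = "foo.com"."""
    def __init__(self, column, value):
        self.column = column
        self.value = value

    def matches(self, row):
        return row.get(self.column) == self.value


class ToyRelation:
    """Stand-in for a data source such as KuduRelation."""
    def __init__(self, rows, handles_pushdown):
        self.rows = rows
        self.handles_pushdown = handles_pushdown

    def unhandled_filters(self, filters):
        # A source that fully evaluates pushed predicates returns [],
        # telling the engine no residual Filter node is needed.
        return [] if self.handles_pushdown else filters

    def scan(self, filters):
        # The source applies the predicates that were pushed to it.
        return [r for r in self.rows if all(f.matches(r) for f in filters)]


def plan_scan(relation, filters):
    """Mimic the planner: add a residual Filter only for unhandled predicates."""
    residual = relation.unhandled_filters(filters)
    plan = ["Scan ToyRelation"]
    if residual:
        plan.insert(0, "Filter " + ", ".join(f.column for f in residual))
    rows = relation.scan(filters)
    rows = [r for r in rows if all(f.matches(r) for f in residual)]
    return plan, rows


rows = [{"host": "foo.com"}, {"host": "bar.com"}]
f = [EqFilter("host", "foo.com")]
# Default behavior: redundant Filter on top of the scan.
print(plan_scan(ToyRelation(rows, handles_pushdown=False), f)[0])
# With the hook answered correctly: the Filter node disappears.
print(plan_scan(ToyRelation(rows, handles_pushdown=True), f)[0])
```

In both cases the result rows are identical; only the redundant Filter (and the column selection it forces) differs, which is the waste this report is about.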