Description
sqlContext.sql(
  """
    |SELECT sum(value)
    |FROM agg1
    |GROUP BY key
  """.stripMargin).explain()

== Physical Plan ==
Aggregate2Sort Some(List(key#510)), [key#510], [(sum(CAST(value#511, LongType))2,mode=Final,isDistinct=false)], [sum(CAST(value#511, LongType))#1435L], [sum(CAST(value#511, LongType))#1435L AS _c0#1426L]
 ExternalSort [key#510 ASC], false
  Exchange hashpartitioning(key#510)
   Aggregate2Sort None, [key#510], [(sum(CAST(value#511, LongType))2,mode=Partial,isDistinct=false)], [currentSum#1433L], [key#510,currentSum#1433L]
    ExternalSort [key#510 ASC], false
     PhysicalRDD [key#510,value#511], MapPartitionsRDD[97] at apply at Transformer.scala:22

sqlContext.sql(
  """
    |SELECT sum(distinct value)
    |FROM agg1
    |GROUP BY key
  """.stripMargin).explain()

== Physical Plan ==
!FinalAndCompleteAggregate2Sort [key#510,CAST(value#511, LongType)#1446L], [key#510], [(sum(CAST(value#511, LongType)#1446L)2,mode=Complete,isDistinct=false)], [sum(CAST(value#511, LongType))#1445L], [sum(CAST(value#511, LongType))#1445L AS _c0#1438L]
 Aggregate2Sort Some(List(key#510)), [key#510,CAST(value#511, LongType)#1446L], [key#510,CAST(value#511, LongType)#1446L]
  ExternalSort [key#510 ASC,CAST(value#511, LongType)#1446L ASC], false
   Exchange hashpartitioning(key#510)
    !Aggregate2Sort None, [key#510,CAST(value#511, LongType) AS CAST(value#511, LongType)#1446L], [key#510,CAST(value#511, LongType)#1446L]
     ExternalSort [key#510 ASC,CAST(value#511, LongType) AS CAST(value#511, LongType)#1446L ASC], false
      PhysicalRDD [key#510,value#511], MapPartitionsRDD[102] at apply at Transformer.scala:22
In the second example above, there is a ! at the beginning of the simpleString of two operators (FinalAndCompleteAggregate2Sort and the partial Aggregate2Sort), which indicates that the operator's missingInput is not empty. In fact, no input is missing, so this is a false alarm and we need to fix it.
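To make the symptom concrete, here is a minimal, self-contained sketch of how a "missing input" check can flag an operator spuriously. It is a simplified model, not Spark's actual classes: `Attr`, `Op`, `missingInputNaive`, and `missingInputAdjusted` are hypothetical names, and the assumption that the flag comes from an attribute the operator defines itself (the `CAST(...) AS ...#1446L` alias) is one plausible reading of the plan above, not a confirmed diagnosis.

```scala
// Simplified model of a plan operator; NOT Spark's actual classes.
final case class Attr(name: String, id: Int)

final case class Op(
    name: String,
    references: Set[Attr],    // attributes mentioned by the operator's expressions
    produced: Set[Attr],      // attributes the operator defines itself (e.g. via an alias)
    childOutput: Set[Attr]) { // attributes supplied by the child operator

  // Naive check: every referenced attribute must come from the child.
  def missingInputNaive: Set[Attr] = references -- childOutput

  // Adjusted check (assumption): attributes the operator produces are not "missing".
  def missingInputAdjusted: Set[Attr] = references -- childOutput -- produced

  // Prefix the operator name with "!" when the given missing set is non-empty.
  def simpleString(missing: Set[Attr]): String =
    (if (missing.nonEmpty) "!" else "") + name
}

object MissingInputDemo {
  val key: Attr = Attr("key", 510)
  val value: Attr = Attr("value", 511)
  // Alias defined by the partial aggregate itself in the second plan.
  val castAlias: Attr = Attr("CAST(value#511, LongType)", 1446)

  val partialAgg: Op = Op(
    name = "Aggregate2Sort",
    references = Set(key, value, castAlias),
    produced = Set(castAlias),
    childOutput = Set(key, value))

  def main(args: Array[String]): Unit = {
    println(partialAgg.simpleString(partialAgg.missingInputNaive))    // flagged
    println(partialAgg.simpleString(partialAgg.missingInputAdjusted)) // not flagged
  }
}
```

Under this model the naive check flags the operator only because it references its own alias; excluding self-produced attributes is one possible way to remove the false alarm.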
It would also be good to make the simpleString of these two operators more reader-friendly, so that people can tell which are the grouping expressions, which are the aggregate functions, and what the mode of each aggregate function is.
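As a rough illustration of what a labeled simpleString could look like, the sketch below renders the grouping expressions, aggregate functions (with mode), and output under explicit labels. The `AggFunc` case class, the `render` helper, and the label names (`groupBy`, `functions`, `output`) are all hypothetical; this is only one possible layout, not a proposed final format.

```scala
// Hypothetical description of one aggregate function; names are illustrative only.
final case class AggFunc(expr: String, mode: String, isDistinct: Boolean)

object FriendlySimpleString {
  // Render an operator with each group of expressions explicitly labeled.
  def render(
      op: String,
      grouping: Seq[String],
      functions: Seq[AggFunc],
      output: Seq[String]): String = {
    val funcs = functions.map { f =>
      val distinct = if (f.isDistinct) "distinct " else ""
      s"$distinct${f.expr} (mode=${f.mode})"
    }
    s"$op(groupBy=[${grouping.mkString(", ")}], " +
      s"functions=[${funcs.mkString(", ")}], " +
      s"output=[${output.mkString(", ")}])"
  }

  def main(args: Array[String]): Unit = {
    // Values taken from the final aggregate of the first plan in this description.
    println(render(
      op = "Aggregate2Sort",
      grouping = Seq("key#510"),
      functions = Seq(AggFunc("sum(CAST(value#511, LongType))", "Final", isDistinct = false)),
      output = Seq("_c0#1426L")))
  }
}
```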