Uploaded image for project: 'Spark'
  1. Spark
  2. SPARK-4366 Aggregation Improvement
  3. SPARK-9257

Fix the false negative of Aggregate2Sort and FinalAndCompleteAggregate2Sort's missingInput

    XMLWordPrintableJSON

Details

    • Sub-task
    • Status: Resolved
    • Minor
    • Resolution: Fixed
    • None
    • 1.5.0
    • SQL
    • None

    Description

      sqlContext.sql(
         """
        |SELECT sum(value)
        |FROM agg1
        |GROUP BY key
        """.stripMargin).explain()
      
      == Physical Plan ==
      Aggregate2Sort Some(List(key#510)), [key#510], [(sum(CAST(value#511, LongType))2,mode=Final,isDistinct=false)], [sum(CAST(value#511, LongType))#1435L], [sum(CAST(value#511, LongType))#1435L AS _c0#1426L]
       ExternalSort [key#510 ASC], false
        Exchange hashpartitioning(key#510)
         Aggregate2Sort None, [key#510], [(sum(CAST(value#511, LongType))2,mode=Partial,isDistinct=false)], [currentSum#1433L], [key#510,currentSum#1433L]
          ExternalSort [key#510 ASC], false
           PhysicalRDD [key#510,value#511], MapPartitionsRDD[97] at apply at Transformer.scala:22
      
      sqlContext.sql(
        """
        |SELECT sum(distinct value)
        |FROM agg1
        |GROUP BY key
        """.stripMargin).explain()
      
      == Physical Plan ==
      !FinalAndCompleteAggregate2Sort [key#510,CAST(value#511, LongType)#1446L], [key#510], [(sum(CAST(value#511, LongType)#1446L)2,mode=Complete,isDistinct=false)], [sum(CAST(value#511, LongType))#1445L], [sum(CAST(value#511, LongType))#1445L AS _c0#1438L]
       Aggregate2Sort Some(List(key#510)), [key#510,CAST(value#511, LongType)#1446L], [key#510,CAST(value#511, LongType)#1446L]
        ExternalSort [key#510 ASC,CAST(value#511, LongType)#1446L ASC], false
         Exchange hashpartitioning(key#510)
          !Aggregate2Sort None, [key#510,CAST(value#511, LongType) AS CAST(value#511, LongType)#1446L], [key#510,CAST(value#511, LongType)#1446L]
           ExternalSort [key#510 ASC,CAST(value#511, LongType) AS CAST(value#511, LongType)#1446L ASC], false
            PhysicalRDD [key#510,value#511], MapPartitionsRDD[102] at apply at Transformer.scala:22
      

      For examples shown above, you can see there is a ! at the bingeing of the operator's simpleString), which indicates that its missingInput is not empty. Actually, it is a false negative and we need to fix it.

      Also, it will be good to make these two operators' simpleString more reader friendly (people can tell what are grouping expressions, what are aggregate functions, and what is the mode of an aggregate function).

      Attachments

        Activity

          People

            yhuai Yin Huai
            yhuai Yin Huai
            Votes:
            0 Vote for this issue
            Watchers:
            1 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved: