SPARK-12491: UDAF result differs in SQL if alias is used

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Duplicate
    • Affects Version/s: 1.5.2
    • Fix Version/s: 1.5.3, 1.6.0
    • Component/s: SQL
    • Labels: None

    Description

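      Using the GeometricMean UDAF example (https://databricks.com/blog/2015/09/16/spark-1-5-dataframe-api-highlights-datetimestring-handling-time-intervals-and-udafs.html), I found that the result of the same aggregate query depends on whether the UDAF column is aliased. Without an alias, gm(id) returns 0.0 for every group; with an alias, it returns non-zero values: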

      scala> sqlContext.sql("select group_id, gm(id) from simple group by group_id").show()
      +--------+---+
      |group_id|_c1|
      +--------+---+
      |       0|0.0|
      |       1|0.0|
      |       2|0.0|
      +--------+---+
      
      
      scala> sqlContext.sql("select group_id, gm(id) as GeometricMean from simple group by group_id").show()
      +--------+-----------------+
      |group_id|    GeometricMean|
      +--------+-----------------+
      |       0|8.981385496571725|
      |       1|7.301716979342118|
      |       2|7.706253151292568|
      +--------+-----------------+
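
To judge which of the two outputs above is the right one, the geometric means can be recomputed without Spark. The following is a minimal sketch in plain Scala. It assumes that the `simple` table holds the ids 1 to 19 with `group_id = id % 3`; the report does not show the table contents, so that data set is an assumption. The object name `GeometricMeanCheck` and its members are hypothetical and are not part of the attached UDAF.

```scala
// Plain Scala, no Spark required.
// ASSUMPTION: `simple` contains ids 1..19 with group_id = id % 3.
// The report itself does not show the table contents.
object GeometricMeanCheck {
  // Geometric mean computed as (x1 * x2 * ... * xn)^(1/n).
  // Only meaningful for a non-empty sequence of positive values.
  def geometricMean(values: Seq[Long]): Double =
    math.pow(values.map(_.toDouble).product, 1.0 / values.size)

  // Hypothetical stand-in for the `id` column of the `simple` table.
  val ids: Seq[Long] = (1 to 19).map(_.toLong)

  // Group by id % 3 and aggregate each group, mirroring
  // "select group_id, gm(id) from simple group by group_id".
  val byGroup: Map[Long, Double] =
    ids.groupBy(_ % 3).map { case (group, xs) => group -> geometricMean(xs) }

  def main(args: Array[String]): Unit =
    byGroup.toSeq.sortBy(_._1).foreach { case (group, gm) =>
      println(s"$group -> $gm")
    }
}
```

Under that assumption, the recomputed values agree with the aliased query to within floating-point rounding, which suggests that the 0.0 results of the unaliased query are the incorrect ones.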
      

      Attachments

        1. UDAF_GM.zip (2 kB, Herman van Hövell)

            People

              Assignee: Unassigned
              Reporter: Tristan (twietsma)
              Votes: 0
              Watchers: 5