Spark / SPARK-12491

UDAF result differs in SQL if alias is used

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Duplicate
    • Affects Version/s: 1.5.2
    • Fix Version/s: 1.5.3, 1.6.0
    • Component/s: SQL
    • Labels:
      None

      Description

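      Using the GeometricMean UDAF example from the Databricks blog (https://databricks.com/blog/2015/09/16/spark-1-5-dataframe-api-highlights-datetimestring-handling-time-intervals-and-udafs.html), called as gm in the queries below, the same query returns different results depending on whether the aggregate column is given an alias: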

      scala> sqlContext.sql("select group_id, gm(id) from simple group by group_id").show()
      +--------+---+
      |group_id|_c1|
      +--------+---+
      |       0|0.0|
      |       1|0.0|
      |       2|0.0|
      +--------+---+
      
      
      scala> sqlContext.sql("select group_id, gm(id) as GeometricMean from simple group by group_id").show()
      +--------+-----------------+
      |group_id|    GeometricMean|
      +--------+-----------------+
      |       0|8.981385496571725|
      |       1|7.301716979342118|
      |       2|7.706253151292568|
      +--------+-----------------+
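
      To check by hand which of the two outputs is plausible for a given group, the aggregation can be reproduced outside Spark. The following is a minimal plain-Scala sketch, assuming the UDAF keeps a (count, product) buffer and evaluates it as product^(1/count), as the blog example appears to do. The names GeometricMeanSketch, Buffer, update, merge and evaluate are hypothetical and only mirror the shape of a UDAF; this is not the code from the attachment.

```scala
// Plain-Scala sketch of a geometric-mean aggregation; no Spark dependency.
// Assumption: the UDAF under test uses a (count, product) buffer like this one.
object GeometricMeanSketch {
  // Aggregation buffer: number of values seen and their running product.
  final case class Buffer(count: Long, product: Double)

  // Starting state: nothing seen yet, product is the multiplicative identity.
  val empty: Buffer = Buffer(0L, 1.0)

  // Fold one input value into the buffer.
  def update(b: Buffer, x: Double): Buffer =
    Buffer(b.count + 1, b.product * x)

  // Combine two partial buffers (e.g. from different partitions).
  def merge(a: Buffer, b: Buffer): Buffer =
    Buffer(a.count + b.count, a.product * b.product)

  // Final result: the count-th root of the product.
  // Not meaningful for an empty buffer (count == 0).
  def evaluate(b: Buffer): Double =
    math.pow(b.product, 1.0 / b.count)

  // Convenience wrapper: aggregate a whole sequence in one pass.
  def geometricMean(xs: Seq[Double]): Double =
    evaluate(xs.foldLeft(empty)(update))
}
```

      Calling GeometricMeanSketch.geometricMean on the id values of one group_id gives a reference value to compare against both query outputs above.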
      

        Attachments

        1. UDAF_GM.zip
          2 kB
          Herman van Hövell

              People

              • Assignee: Unassigned
              • Reporter: twietsma Tristan