Spark / SPARK-35955

Fix decimal overflow issues for Average


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 3.0.0
    • Fix Version/s: 3.2.0
    • Component/s: SQL
    • Labels: None

    Description

      Fix decimal overflow issues for decimal average in ANSI mode. This is linked to SPARK-32018 and SPARK-28067, which addressed the same problem for decimal sum.

      Repro:


      import org.apache.spark.sql.functions._
      spark.conf.set("spark.sql.ansi.enabled", true)
      
      val df = Seq(
        (BigDecimal("10000000000000000000"), 1),
        (BigDecimal("10000000000000000000"), 1),
        (BigDecimal("10000000000000000000"), 2),
        (BigDecimal("10000000000000000000"), 2),
        (BigDecimal("10000000000000000000"), 2),
        (BigDecimal("10000000000000000000"), 2),
        (BigDecimal("10000000000000000000"), 2),
        (BigDecimal("10000000000000000000"), 2),
        (BigDecimal("10000000000000000000"), 2),
        (BigDecimal("10000000000000000000"), 2),
        (BigDecimal("10000000000000000000"), 2),
        (BigDecimal("10000000000000000000"), 2)).toDF("decNum", "intNum")
      
      // The self-join on intNum yields 2*2 + 10*10 = 104 rows, so the sum of
      // decNum (1.04e21 at scale 18) exceeds the Decimal(38, 18) precision.
      val df2 = df.withColumnRenamed("decNum", "decNum2").join(df, "intNum").agg(mean("decNum"))
      df2.show(40, false)
      


      This should throw an exception, as the intermediate sum overflows, but in ANSI mode it instead returns:


      +-----------+
      |avg(decNum)|
      +-----------+
      |null       |
      +-----------+
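
      A back-of-the-envelope check of the overflow (plain Scala, no Spark required; the 104-row join count and the Scala-BigDecimal-to-DecimalType(38, 18) mapping are inferred from the repro above):

      ```scala
      import java.math.BigDecimal

      // Spark stores Scala BigDecimal columns as DecimalType(38, 18):
      // at most 38 significant digits, 18 of them after the decimal point.
      val maxPrecision = 38
      val scale = 18

      val value = new BigDecimal("10000000000000000000") // 1e19: 20 integer digits
      val sum = value.multiply(new BigDecimal(104))      // 1.04e21: 22 integer digits

      // Storing the sum at scale 18 needs 22 + 18 = 40 digits, which exceeds
      // the 38-digit maximum, so the aggregation buffer overflows. Pre-fix,
      // Spark then yields null instead of raising the ANSI-mode exception.
      val digitsNeeded = sum.precision + scale
      println(digitsNeeded > maxPrecision) // true
      ```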


      People

        Assignee: karenfeng Karen Feng
        Reporter: karenfeng Karen Feng
