Spark / SPARK-22271

Describe results in "null" for the value of "mean" of a numeric variable

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 2.1.0
    • Fix Version/s: 2.2.1, 2.3.0
    • Component/s: SQL
    • Labels: None

    Description

      Please excuse me if this issue was addressed already - I was unable to find it.

      Calling .describe().show() on my dataframe results in a value of null for the row "mean":

      val foo = spark.read.parquet("decimalNumbers.parquet")        
      foo.select(col("numericvariable")).describe().show()
      
      foo: org.apache.spark.sql.DataFrame = [numericvariable: decimal(38,32)]
      +-------+--------------------+
      |summary|     numericvariable|
      +-------+--------------------+
      |  count|                 299|
      |   mean|                null|
      | stddev|  0.2376438793946738|
      |    min|0.037815489727642...|
      |    max|2.138189366554511...|
      +-------+--------------------+

      But all of the rows in this column seem fine (I can attach a Parquet file). When I round the column, however, the mean is computed correctly:

      foo.select(bround(col("numericvariable"), 31)).describe().show()
      
      
      +-------+---------------------------+
      |summary|bround(numericvariable, 31)|
      +-------+---------------------------+
      |  count|                        299|
      |   mean|       0.139522503183236...|
      | stddev|         0.2376438793946738|
      |    min|       0.037815489727642...|
      |    max|       2.138189366554511...|
      +-------+---------------------------+
      
      

      Rounding to 32 decimal places, however, also gives null.
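
      For anyone without the attachment, here is a minimal sketch that exercises the same calls. The object name, the column name "raw", and the generated values are assumptions: the data is a synthetic stand-in for decimalNumbers.parquet (299 values cast to decimal(38,32)), not the real file. The final cast to double is also an assumption, not something tried in this report. It avoids decimal arithmetic altogether, at the cost of precision beyond roughly 15 to 17 significant digits. Whether the first call actually shows a null mean may depend on the Spark version, since the issue is marked fixed in 2.2.1 and 2.3.0.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{bround, col}

// Hypothetical object name, for illustration only.
object DescribeDecimalSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("describe-decimal-sketch")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Assumption: synthetic data standing in for decimalNumbers.parquet.
    // 299 rows, cast to the same type as in the report: decimal(38,32).
    val foo = (1 to 299)
      .map(i => BigDecimal(i) / 1000)
      .toDF("raw")
      .select(col("raw").cast("decimal(38,32)").as("numericvariable"))

    // The call from the report that produced a null mean.
    foo.select(col("numericvariable")).describe().show()

    // The rounding from the report that produced a non-null mean.
    foo.select(bround(col("numericvariable"), 31)).describe().show()

    // Assumed alternative: compute the summary on doubles instead of decimals.
    foo.select(col("numericvariable").cast("double")).describe().show()

    spark.stop()
  }
}
```

      If the first call reproduces the null while the other two do not, that would be consistent with the behaviour described above.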

      Attachments

        1. decimalNumbers.zip
          185 kB
          Shafique Jamal

          People

            Assignee: Huaxin Gao (huaxing)
            Reporter: Shafique Jamal (shafiquejamal)