Uploaded image for project: 'Hive'
  1. Hive
  2. HIVE-28302

Let SUM UDF return NULL when all rows have non-numeric texts

    XMLWordPrintableJSON

Details

    • Improvement
    • Status: Patch Available
    • Major
    • Resolution: Unresolved
    • 4.0.0
    • None
    • UDF

    Description

      The ANSI standard says UDAF should return NULL when all inputs are NULL.

      Hive is very generous and accepts non-numeric texts as input for SUM. To give some consistency to the generous specification, we believe `SUM(string_col)` should behave in the same way as `SUM(CAST(string_col AS DOUBLE))`.

      However, Hive's SUM returns 0.0 in that case.

      > SELECT SUM(CAST(null AS STRING)), SUM('invalid num'), SUM(CAST('invalid num' AS DOUBLE));
      +-------+------+-------+
      |  _c0  | _c1  |  _c2  |
      +-------+------+-------+
      | NULL  | 0.0  | NULL  |
      +-------+------+-------+ 

      We see some more discussions in https://github.com/apache/hive/pull/5091.

      Attachments

        Issue Links

          Activity

            People

              okumin Shohei Okumiya
              okumin Shohei Okumiya
              Votes:
              0 Vote for this issue
              Watchers:
              1 Start watching this issue

              Dates

                Created:
                Updated: