Uploaded image for project: 'Apache Drill'
  1. Apache Drill
  2. DRILL-6120

NaN/Infinity: some functions don't work as expected

    XMLWordPrintableJSON

    Details

    • Type: Bug
    • Status: Open
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: 1.13.0
    • Fix Version/s: None
    • Component/s: Storage - JSON
    • Labels:
      None

      Description

      AFFECTED_VERSION: drill-1.13.0-SNAPSHOT

      AFFECTED_FUNCTIONS:

      • sqrt
      • trunc
      • min/max

      ISSUE_DESCRIPTION: According to DRILL-5919, new json number literals were added: NaN, Infinity, -Infinity. The new data types must be processed properly by existing functions. There are a few issues:
      1. SQRT function. Run the following test query: {code}select  sqrt(Nan) NaN, sqrt(Positive_Infinity) POS_INF, sqrt(Negative_Infinity) NEG_INF  from dfs.tmp.`PN_Inf_NaN.json`{code}

      • EXPECTED_RESULT: it was expected to get the following result: NaN, Infinity, NaN (expected result is based on java Math.sqrt() method)
      • ACTUAL_RESULT: the test query returned: NaN, Infinity, Infinity

      2. TRUNC function. According to DRILL docs (https://drill.apache.org/docs/math-and-trig/): TRUNC(x, y) : Truncates x to y decimal places. Specifying y is optional. Default is 1. So, the function must work properly without specifying y
      However an error message appears. Run test_query: {code}select  trunc(Nan) NaN, trunc(Positive_Infinity) POS_INF, trunc(Negative_Infinity) NEG_INF  from dfs.tmp.`PN_Inf_NaN.json`{code}

      • EXPECTED_RESULT: it was expected to get the following result NaN, NaN, NaN
      • ACTUAL_RESULT: it appears the following error message: Query Failed: An Error Occurred org.apache.drill.common.exceptions.UserRemoteException: SYSTEM ERROR: NumberFormatException Fragment 0:0 [Error Id: 95e01fee-7433-4b0b-b913-32358b4a8f55 on node1:31010]

      3. MIN / MAX functions
      ISSUE_DESCRIPTION: min/max aggregation functions return the same result if the selected column contains NaN value.

      SELECT name, max(attr4), min(attr4) from dfs.tmp.`minMax.json` group by name

      Result

      name	Min	Max
      obj1	NaN	NaN
      obj2	NaN	NaN
      obj3	NaN	NaN
      obj4	NaN	NaN
      

      As for me, this logic should be revised, current behavior is a bit confusing:

      • Postgres considers NaN to be the largest value, so MAX(col_withNaN) will return NaN, MIN(col_withNaN) - will return other value, so DRILL's min/max logic can be adjusted to the Postgres' one, see postgres.jpg
      • Or NAN can behave like NULL - DRILL's MIN/MAX functions ignore NULLs

        Attachments

        1. postgres.jpg
          3 kB
          Alexander Malashevsky
        2. PN_Inf_NaN.json
          0.5 kB
          Alexander Malashevsky
        3. minMax.json
          1 kB
          Alexander Malashevsky

          Issue Links

            Activity

              People

              • Assignee:
                volodymyr.tkach Volodymyr Tkach
                Reporter:
                malexx8 Alexander Malashevsky
              • Votes:
                0 Vote for this issue
                Watchers:
                1 Start watching this issue

                Dates

                • Created:
                  Updated: