IMPALA-6295

Inconsistent handling of 'nan' and 'inf' with min/max analytic fns

      Description

      Incorrect results are returned in some cases where 'nan'/'inf' are the only values in the group and codegen is enabled:

      > set DISABLE_CODEGEN_ROWS_THRESHOLD=0
      
      > select * from test1 order by col1
      +------+-----------+
      | col0 | col1      |
      +------+-----------+
      | 0    | NaN       |
      | 2    | -Infinity |
      | 3    | 0         |
      | 1    | Infinity  |
      +------+-----------+
      
      > set DISABLE_CODEGEN=true
      > select col0, min(col1) from test1 group by col0 order by col0
      +------+-----------+
      | col0 | min(col1) |
      +------+-----------+
      | 0    | NaN       |
      | 1    | Infinity  |
      | 2    | -Infinity |
      | 3    | 0         |
      +------+-----------+
      
      > set DISABLE_CODEGEN=false
      > select col0, min(col1) from test1 group by col0 order by col0
      +------+------------------------+
      | col0 | min(col1)              |
      +------+------------------------+
      | 0    | 1.797693134862316e+308 |
      | 1    | 1.797693134862316e+308 |
      | 2    | -Infinity              |
      | 3    | 0                      |
      +------+------------------------+
      
      > set DISABLE_CODEGEN=true
      > select col0, max(col1) from test1 group by col0 order by col0
      +------+-----------+
      | col0 | max(col1) |
      +------+-----------+
      | 0    | NaN       |
      | 1    | Infinity  |
      | 2    | -Infinity |
      | 3    | 0         |
      +------+-----------+
      
      > set DISABLE_CODEGEN=false
      > select col0, max(col1) from test1 group by col0 order by col0
      +------+-------------------------+
      | col0 | max(col1)               |
      +------+-------------------------+
      | 0    | -1.797693134862316e+308 |
      | 1    | Infinity                |
      | 2    | -1.797693134862316e+308 |
      | 3    | 0                       |
      +------+-------------------------+
      

      We also appear never to return 'nan' as a min or max value, despite sorting it as the lowest value when ordering a table (perhaps this is the intended behavior?):

      > set DISABLE_CODEGEN_ROWS_THRESHOLD=0
      > select * from test2 order by col1
      +------+-----------+
      | col0 | col1      |
      +------+-----------+
      | 0    | NaN       |
      | 2    | -Infinity |
      | 0    | 0         |
      | 3    | 0         |
      | 1    | 1         |
      | 2    | 2         |
      | 3    | 3         |
      | 1    | Infinity  |
      +------+-----------+
      
      > set DISABLE_CODEGEN=true
      > select col0, min(col1) from test2 group by col0 order by col0
      +------+-----------+
      | col0 | min(col1) |
      +------+-----------+
      | 0    | 0         |
      | 1    | 1         |
      | 2    | -Infinity |
      | 3    | 0         |
      +------+-----------+
      
      > set DISABLE_CODEGEN=false
      > select col0, min(col1) from test2 group by col0 order by col0
      +------+-----------+
      | col0 | min(col1) |
      +------+-----------+
      | 0    | 0         |
      | 1    | 1         |
      | 2    | -Infinity |
      | 3    | 0         |
      +------+-----------+
      
      > set DISABLE_CODEGEN=true
      > select col0, max(col1) from test2 group by col0 order by col0
      +------+-----------+
      | col0 | max(col1) |
      +------+-----------+
      | 0    | 0         |
      | 1    | Infinity  |
      | 2    | 2         |
      | 3    | 3         |
      +------+-----------+
      
      > set DISABLE_CODEGEN=false
      > select col0, max(col1) from test2 group by col0 order by col0
      +------+-----------+
      | col0 | max(col1) |
      +------+-----------+
      | 0    | 0         |
      | 1    | Infinity  |
      | 2    | 2         |
      | 3    | 3         |
      +------+-----------+
      

      Changing LlvmCodeGen::CodegenMinMax to use the ordered OLT/OGT float comparisons appears to fix the first case (at least for 'nan'), but it causes 'nan' to be returned as a max value in the second case.

            People

            • Assignee: Thomas Tauber-Marshall (twmarshall)
            • Reporter: Thomas Tauber-Marshall (twmarshall)