Details
-
Bug
-
Status: Resolved
-
Critical
-
Resolution: Fixed
-
Impala 2.11.0
Description
Incorrect results are returned in some cases where 'nan'/'inf' are the only values in the group and codegen is enabled:
> set DISABLE_CODEGEN_ROWS_THRESHOLD set to 0
> select * from test1 order by col1
+------+-----------+
| col0 | col1      |
+------+-----------+
| 0    | NaN       |
| 2    | -Infinity |
| 3    | 0         |
| 1    | Infinity  |
+------+-----------+
> set DISABLE_CODEGEN set to true
> select col0, min(col1) from test1 group by col0 order by col0
+------+-----------+
| col0 | min(col1) |
+------+-----------+
| 0    | NaN       |
| 1    | Infinity  |
| 2    | -Infinity |
| 3    | 0         |
+------+-----------+
> set DISABLE_CODEGEN set to false
> select col0, min(col1) from test1 group by col0 order by col0
+------+------------------------+
| col0 | min(col1)              |
+------+------------------------+
| 0    | 1.797693134862316e+308 |
| 1    | 1.797693134862316e+308 |
| 2    | -Infinity              |
| 3    | 0                      |
+------+------------------------+
> set DISABLE_CODEGEN set to true
> select col0, max(col1) from test1 group by col0 order by col0
+------+-----------+
| col0 | max(col1) |
+------+-----------+
| 0    | NaN       |
| 1    | Infinity  |
| 2    | -Infinity |
| 3    | 0         |
+------+-----------+
> set DISABLE_CODEGEN set to false
> select col0, max(col1) from test1 group by col0 order by col0
+------+-------------------------+
| col0 | max(col1)               |
+------+-------------------------+
| 0    | -1.797693134862316e+308 |
| 1    | Infinity                |
| 2    | -1.797693134862316e+308 |
| 3    | 0                       |
+------+-------------------------+
We also appear to never return 'nan' as a min or max value, despite sorting it as the lowest value when ordering a table (perhaps this is the intended behavior?):
> set DISABLE_CODEGEN_ROWS_THRESHOLD set to 0
> select * from test2 order by col1
+------+-----------+
| col0 | col1      |
+------+-----------+
| 0    | NaN       |
| 2    | -Infinity |
| 0    | 0         |
| 3    | 0         |
| 1    | 1         |
| 2    | 2         |
| 3    | 3         |
| 1    | Infinity  |
+------+-----------+
> set DISABLE_CODEGEN set to true
> select col0, min(col1) from test2 group by col0 order by col0
+------+-----------+
| col0 | min(col1) |
+------+-----------+
| 0    | 0         |
| 1    | 1         |
| 2    | -Infinity |
| 3    | 0         |
+------+-----------+
> set DISABLE_CODEGEN set to false
> select col0, min(col1) from test2 group by col0 order by col0
+------+-----------+
| col0 | min(col1) |
+------+-----------+
| 0    | 0         |
| 1    | 1         |
| 2    | -Infinity |
| 3    | 0         |
+------+-----------+
> set DISABLE_CODEGEN set to true
> select col0, max(col1) from test2 group by col0 order by col0
+------+-----------+
| col0 | max(col1) |
+------+-----------+
| 0    | 0         |
| 1    | Infinity  |
| 2    | 2         |
| 3    | 3         |
+------+-----------+
> set DISABLE_CODEGEN set to false
> select col0, max(col1) from test2 group by col0 order by col0
+------+-----------+
| col0 | max(col1) |
+------+-----------+
| 0    | 0         |
| 1    | Infinity  |
| 2    | 2         |
| 3    | 3         |
+------+-----------+
Changing LlvmCodeGen::CodegenMinMax to use OLT/OGT float comparison functions appears to solve the first case (at least for 'nan'), but leads to us returning 'nan' as a max value in the second case.
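To make the ordered vs. unordered distinction concrete, here is a small standalone C++ sketch. It is not Impala's code: the function names (OrderedLess, UnorderedLess, FoldMin, FoldMax), the seeding with the largest/lowest finite double, and the "keep the accumulator if the comparison is true" update shape are all assumptions, chosen only because this model happens to reproduce the codegen outputs shown above. An ordered comparison (LLVM's OLT/OGT) is false whenever either operand is NaN; an unordered one (ULT/UGT) is true in that case.

```cpp
#include <cassert>
#include <cmath>
#include <limits>
#include <vector>

// Ordered less-than (like LLVM FCMP_OLT): false if either operand is NaN.
bool OrderedLess(double a, double b) { return a < b; }

// Unordered less-than (like LLVM FCMP_ULT): true if either operand is NaN.
bool UnorderedLess(double a, double b) {
  return std::isnan(a) || std::isnan(b) || a < b;
}

using Less = bool (*)(double, double);

// Hypothetical min aggregate: seeded with the largest finite double and
// keeping the accumulator whenever less(acc, v) holds.
double FoldMin(const std::vector<double>& vals, Less less) {
  double acc = std::numeric_limits<double>::max();
  for (double v : vals) acc = less(acc, v) ? acc : v;
  return acc;
}

// Hypothetical max aggregate: seeded with the lowest finite double and
// keeping the accumulator whenever acc > v, expressed as less(v, acc).
double FoldMax(const std::vector<double>& vals, Less less) {
  double acc = std::numeric_limits<double>::lowest();
  for (double v : vals) acc = less(v, acc) ? acc : v;
  return acc;
}

// Checks the model against the behaviour described in this report.
void SelfCheck() {
  const double nan = std::numeric_limits<double>::quiet_NaN();
  const double inf = std::numeric_limits<double>::infinity();
  const double dmax = std::numeric_limits<double>::max();
  // Unordered: NaN-only and Infinity-only groups never replace the seed.
  assert(FoldMin({nan}, UnorderedLess) == dmax);
  assert(FoldMin({inf}, UnorderedLess) == dmax);
  // Ordered: the NaN-only group now yields NaN, Infinity-only still does not.
  assert(std::isnan(FoldMin({nan}, OrderedLess)));
  assert(FoldMin({inf}, OrderedLess) == dmax);
  // Ordered max over a mixed group can end on NaN, depending on row order.
  assert(std::isnan(FoldMax({0.0, nan}, OrderedLess)));
}
```

Under this model the Infinity-only group is wrong with either predicate, because Infinity can never compare below a finite seed; that would be in line with the observation that the OLT/OGT change helps "at least for 'nan'". Note also that the ordered max result depends on the order in which rows arrive: FoldMax({nan, 0.0}, OrderedLess) ends on 0 rather than NaN.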