Details
-
Bug
-
Status: Closed
-
Critical
-
Resolution: Fixed
-
1.0.0
-
None
-
21cc578b6b8c8f3ca1ebffd3dbb92e35d68bc726
Description
Union All returns incorrect results when the same aggregate + group by query is used on both sides of Union All. Test was run from sqlline on 4 node cluster on CentOS.
Total number of records in the parquet file are,
0: jdbc:drill:schema=dfs.tmp> select count(*) from `twokeyParq_02/0_0_0.parquet`; +------------+ | EXPR$0 | +------------+ | 26212355 | +------------+ 1 row selected (0.17 seconds)
Results returned by aggregate + group by query.
0: jdbc:drill:schema=dfs.tmp> select max(cast(key1 as double)) max_key1, key2 from `twokeyParq_02/0_0_0.parquet` group by key2; +------------+------------+ | max_key1 | key2 | +------------+------------+ | 1.42931543226E9 | m | | 1.42931281008E9 | e | | 1.42931334853E9 | k | | 1.42931595791E9 | n | | 1.42435032101E9 | 1 | | 1.42931539809E9 | l | | 1.42931336919E9 | c | | 1.40095133379E9 | 0 | | 1.42931553374E9 | h | | 1.42931539751E9 | j | | 1.42931480081E9 | f | | 1.42931347924E9 | b | | 1.42931509068E9 | g | | 1.42931380603E9 | d | | 1.42931626355E9 | a | | 1.42931450347E9 | i | +------------+------------+ 16 rows selected (19.262 seconds)
There Union All results are incorrect.
0: jdbc:drill:schema=dfs.tmp> select * from (select max(cast(key1 as double)) max_key1, key2 from `twokeyParq_02/0_0_0.parquet` group by key2 union all select count(cast(key1 as double)) max_key1, key2 from `twokeyParq_02/0_0_0.parquet` group by key2); +------------+------------+ | max_key1 | key2 | +------------+------------+ | 1.42931380603E9 | d | | 1.42931336919E9 | c | | 1.42931347924E9 | b | | 1.42435032101E9 | 1 | | 1.42931626355E9 | a | | 1.40095133379E9 | 0 | | 1.42931334853E9 | k | | 1.42931543226E9 | m | | 1.42931539751E9 | j | | 1.42931553374E9 | h | | 1.42931281008E9 | e | | 1.42931595791E9 | n | | 1.42931509068E9 | g | | 1.42931480081E9 | f | | 1.42931539809E9 | l | | 1.42931450347E9 | i | | 1870700.0 | d | | 1871853.0 | c | | 1870682.0 | b | | 180.0 | 1 | | 1872286.0 | a | | 156.0 | 0 | | 1873931.0 | k | | 1874177.0 | m | | 1872252.0 | j | | 1872734.0 | h | | 1871510.0 | e | | 1874465.0 | n | | 1873607.0 | g | | 1870453.0 | f | | 1870933.0 | l | | 1872436.0 | i | +------------+------------+ 32 rows selected (39.183 seconds)
Physical query plan
0: jdbc:drill:schema=dfs.tmp> explain plan for select * from (select max(cast(key1 as double)) max_key1, key2 from `twokeyParq_02/0_0_0.parquet` group by key2 union all select count(cast(key1 as double)) max_key1, key2 from `twokeyParq_02/0_0_0.parquet` group by key2); +------------+------------+ | text | json | +------------+------------+ | 00-00 Screen 00-01 UnionAll(all=[true]) 00-03 Project(max_key1=[$1], key2=[$0]) 00-05 HashAgg(group=[{0}], max_key1=[MAX($1)]) 00-07 HashAgg(group=[{0}], max_key1=[MAX($1)]) 00-09 Project(key2=[$0], $f1=[CAST($1):DOUBLE]) 00-11 Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=maprfs:/tmp/twokeyParq_02/0_0_0.parquet]], selectionRoot=/tmp/twokeyParq_02/0_0_0.parquet, numFiles=1, columns=[`key2`, `key1`]]]) 00-02 Project(max_key1=[$1], key2=[$0]) 00-04 HashAgg(group=[{0}], max_key1=[$SUM0($1)]) 00-06 HashAgg(group=[{0}], max_key1=[COUNT($1)]) 00-08 Project(key2=[$0], $f1=[CAST($1):DOUBLE]) 00-10 Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=maprfs:/tmp/twokeyParq_02/0_0_0.parquet]], selectionRoot=/tmp/twokeyParq_02/0_0_0.parquet, numFiles=1, columns=[`key2`, `key1`]]])