[DRILL-3087] Union All returns incorrect results. - ASF JIRA

XML

Word

Printable

JSON

Details

Type: Bug
Status: Closed
Priority: Critical
Resolution: Fixed
Affects Version/s: 1.0.0
Fix Version/s: 1.0.0
Component/s: Query Planning & Optimization
Labels:
None
Environment:

21cc578b6b8c8f3ca1ebffd3dbb92e35d68bc726

Description

Union All returns incorrect results when the same aggregate + group by query is used on both sides of Union All. Test was run from sqlline on 4 node cluster on CentOS.

Total number of records in the parquet file are,

0: jdbc:drill:schema=dfs.tmp> select count(*) from `twokeyParq_02/0_0_0.parquet`;
+------------+
|   EXPR$0   |
+------------+
| 26212355   |
+------------+
1 row selected (0.17 seconds)

Results returned by aggregate + group by query.

0: jdbc:drill:schema=dfs.tmp> select max(cast(key1 as double)) max_key1, key2 from `twokeyParq_02/0_0_0.parquet` group by key2;
+------------+------------+
|  max_key1  |    key2    |
+------------+------------+
| 1.42931543226E9 | m          |
| 1.42931281008E9 | e          |
| 1.42931334853E9 | k          |
| 1.42931595791E9 | n          |
| 1.42435032101E9 | 1          |
| 1.42931539809E9 | l          |
| 1.42931336919E9 | c          |
| 1.40095133379E9 | 0          |
| 1.42931553374E9 | h          |
| 1.42931539751E9 | j          |
| 1.42931480081E9 | f          |
| 1.42931347924E9 | b          |
| 1.42931509068E9 | g          |
| 1.42931380603E9 | d          |
| 1.42931626355E9 | a          |
| 1.42931450347E9 | i          |
+------------+------------+
16 rows selected (19.262 seconds)

There Union All results are incorrect.

0: jdbc:drill:schema=dfs.tmp> select * from (select max(cast(key1 as double)) max_key1, key2 from `twokeyParq_02/0_0_0.parquet` group by key2 union all select count(cast(key1 as double)) max_key1, key2 from `twokeyParq_02/0_0_0.parquet` group by key2);
+------------+------------+
|  max_key1  |    key2    |
+------------+------------+
| 1.42931380603E9 | d          |
| 1.42931336919E9 | c          |
| 1.42931347924E9 | b          |
| 1.42435032101E9 | 1          |
| 1.42931626355E9 | a          |
| 1.40095133379E9 | 0          |
| 1.42931334853E9 | k          |
| 1.42931543226E9 | m          |
| 1.42931539751E9 | j          |
| 1.42931553374E9 | h          |
| 1.42931281008E9 | e          |
| 1.42931595791E9 | n          |
| 1.42931509068E9 | g          |
| 1.42931480081E9 | f          |
| 1.42931539809E9 | l          |
| 1.42931450347E9 | i          |
| 1870700.0  | d          |
| 1871853.0  | c          |
| 1870682.0  | b          |
| 180.0      | 1          |
| 1872286.0  | a          |
| 156.0      | 0          |
| 1873931.0  | k          |
| 1874177.0  | m          |
| 1872252.0  | j          |
| 1872734.0  | h          |
| 1871510.0  | e          |
| 1874465.0  | n          |
| 1873607.0  | g          |
| 1870453.0  | f          |
| 1870933.0  | l          |
| 1872436.0  | i          |
+------------+------------+
32 rows selected (39.183 seconds)

Physical query plan

0: jdbc:drill:schema=dfs.tmp> explain plan for select * from (select max(cast(key1 as double)) max_key1, key2 from `twokeyParq_02/0_0_0.parquet` group by key2 union all select count(cast(key1 as double)) max_key1, key2 from `twokeyParq_02/0_0_0.parquet` group by key2);
+------------+------------+
|    text    |    json    |
+------------+------------+
| 00-00    Screen
00-01      UnionAll(all=[true])
00-03        Project(max_key1=[$1], key2=[$0])
00-05          HashAgg(group=[{0}], max_key1=[MAX($1)])
00-07            HashAgg(group=[{0}], max_key1=[MAX($1)])
00-09              Project(key2=[$0], $f1=[CAST($1):DOUBLE])
00-11                Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=maprfs:/tmp/twokeyParq_02/0_0_0.parquet]], selectionRoot=/tmp/twokeyParq_02/0_0_0.parquet, numFiles=1, columns=[`key2`, `key1`]]])
00-02        Project(max_key1=[$1], key2=[$0])
00-04          HashAgg(group=[{0}], max_key1=[$SUM0($1)])
00-06            HashAgg(group=[{0}], max_key1=[COUNT($1)])
00-08              Project(key2=[$0], $f1=[CAST($1):DOUBLE])
00-10                Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=maprfs:/tmp/twokeyParq_02/0_0_0.parquet]], selectionRoot=/tmp/twokeyParq_02/0_0_0.parquet, numFiles=1, columns=[`key2`, `key1`]]])

Attachments

Activity

People

Assignee:: Jinfeng Ni

Reporter:: Khurram Faraaz

Votes:: 0 Vote for this issue

Watchers:: 1 Start watching this issue

Dates

Created:: 14/May/15 18:36

Updated:: 16/May/15 03:00

Resolved:: 16/May/15 03:00