Details
-
Bug
-
Status: Closed
-
Critical
-
Resolution: Fixed
-
0.8.0
-
None
Description
Jan 10 01:17:13 root@~ ] # hadoop fs -cat /test/t.json { "a1": 0, "b1": 0, "c1": "true", "d1" : "2015-01-02"} { "a1": 0, "b1": 0, "c1": "false" , "d1" : "2015-01-03"} { "a1": 0, "b1": 0, "c1": "false" , "d1" : "2015-01-04"} { "a1": 1, "b1": 1, "c1": "true" , "d1" : "2015-01-05"} { "a1": 1, "b1": 1, "c1": "true" , "d1" : "2015-01-06"}
Query below returns wrong result, we should have 4 groups, instead we get only two groups from right side of the union.
Notice lack of project step above union all (unclear if it has anything to to with the bug)
0: jdbc:drill:schema=dfs> select calc1, max(a1), min(b1), count(c1) from (select a1+10 as calc1, a1, b1, c1 from `t.json` union all select a1+100 as calc1, a1, b1, c1 from `t.json`) group by calc1;
+------------+------------+------------+------------+
| calc1 | EXPR$1 | EXPR$2 | EXPR$3 |
+------------+------------+------------+------------+
| 100 | 0 | 0 | 3 |
| 101 | 1 | 1 | 2 |
+------------+------------+------------+------------+
2 rows selected (0.181 seconds)
00-01 Project(calc1=[$0], EXPR$1=[$1], EXPR$2=[$2], EXPR$3=[$3])
00-02 HashAgg(group=[{0}], EXPR$1=[MAX($1)], EXPR$2=[MIN($2)], EXPR$3=[COUNT($3)])
00-03 UnionAll(all=[true])
00-05 Project(calc1=[+($2, 10)], a1=[$2], b1=[$1], c1=[$0])
00-07 Scan(groupscan=[EasyGroupScan [selectionRoot=/test/t.json, numFiles=1, columns=[`a1`, `b1`, `c1`], files=[maprfs:/test/t.json]]])
00-04 Project(calc1=[+($2, 100)], a1=[$2], b1=[$1], c1=[$0])
00-06 Scan(groupscan=[EasyGroupScan [selectionRoot=/test/t.json, numFiles=1, columns=[`a1`, `b1`, `c1`], files=[maprfs:/test/t.json]]])
Correct result, when order of the columns is different in aggregate functions: "min(b1), max(a1)" instead of "max(a1), min(b1)"
0: jdbc:drill:schema=dfs> select calc1, min(b1), max(a1), count(c1) from (select a1+10 as calc1, a1, b1, c1 from `t.json` union all select a1+100 as calc1, a1, b1, c1 from `t.json`) group by calc1;
+------------+------------+------------+------------+
| calc1 | EXPR$1 | EXPR$2 | EXPR$3 |
+------------+------------+------------+------------+
| 10 | 0 | 0 | 3 |
| 11 | 1 | 1 | 2 |
| 100 | 0 | 0 | 3 |
| 101 | 1 | 1 | 2 |
+------------+------------+------------+------------+
4 rows selected (0.256 seconds)
00-01 Project(calc1=[$0], EXPR$1=[$1], EXPR$2=[$2], EXPR$3=[$3])
00-02 HashAgg(group=[{0}], EXPR$1=[MIN($1)], EXPR$2=[MAX($2)], EXPR$3=[COUNT($3)])
00-03 Project(calc1=[$0], b1=[$2], a1=[$1], c1=[$3])
00-04 UnionAll(all=[true])
00-06 Project(calc1=[+($2, 10)], a1=[$2], b1=[$1], c1=[$0])
00-08 Scan(groupscan=[EasyGroupScan [selectionRoot=/test/t.json, numFiles=1, columns=[`a1`, `b1`, `c1`], files=[maprfs:/test/t.json]]])
00-05 Project(calc1=[+($2, 100)], a1=[$2], b1=[$1], c1=[$0])
00-07 Scan(groupscan=[EasyGroupScan [selectionRoot=/test/t.json, numFiles=1, columns=[`a1`, `b1`, `c1`], files=[maprfs:/test/t.json]]])
Non-union query works correctly:
0: jdbc:drill:schema=dfs> select a1+10 as calc1, min(a1), max(b1), count(c1) from `t.json` group by a1+10; +------------+------------+------------+------------+ | calc1 | EXPR$1 | EXPR$2 | EXPR$3 | +------------+------------+------------+------------+ | 10 | 0 | 0 | 3 | | 11 | 1 | 1 | 2 | +------------+------------+------------+------------+ 2 rows selected (0.142 seconds) 0: jdbc:drill:schema=dfs> select a1+10 as calc1, min(b1), max(a1), count(c1) from `t.json` group by a1+10; +------------+------------+------------+------------+ | calc1 | EXPR$1 | EXPR$2 | EXPR$3 | +------------+------------+------------+------------+ | 10 | 0 | 0 | 3 | | 11 | 1 | 1 | 2 | +------------+------------+------------+------------+ 2 rows selected (0.153 seconds)
If I switch legs of a UNION, we still get wrong result, but from the right side:
0: jdbc:drill:schema=dfs> select calc1, max(a1), min(b1), count(c1) from (select a1+100 as calc1, a1, b1, c1 from `t.json` union all select a1+10 as calc1, a1, b1, c1 from `t.json`) group by calc1; +------------+------------+------------+------------+ | calc1 | EXPR$1 | EXPR$2 | EXPR$3 | +------------+------------+------------+------------+ | 10 | 0 | 0 | 3 | | 11 | 1 | 1 | 2 | +------------+------------+------------+------------+ 2 rows selected (0.19 seconds)
Attachments
Issue Links
- depends upon
-
DRILL-2207 Fix Union All Operator
- Resolved
- is related to
-
DRILL-2376 UNION ALL on Aggregates with GROUP BY returns incomplete results
- Closed