Details
Type: Bug
Status: Resolved
Priority: Critical
Resolution: Fixed
Description
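The following query with a right outer join produces correct results the first time it is executed in a session, but wrong results on the second and every subsequent execution: r_regionkey comes back as null for all 25 rows. The two Explain plans point to a likely cause. In the good run the scan of the nation table projects columns=[SchemaPath [`n_regionkey`]], whereas in the bad run it shows columns=null. The HashJoin condition also changes from =($3, $1) in the good run to =($2, $1) in the bad run.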
0: jdbc:drill:zk=local> select n.n_regionkey, r.r_regionkey from cp.`tpch/region.parquet` r right join cp.`tpch/nation.parquet` n on n.n_regionkey = r.r_regionkey;
+--------------+--------------+
| n_regionkey  | r_regionkey  |
+--------------+--------------+
| 0            | 0            |
| 0            | 0            |
| 0            | 0            |
| 0            | 0            |
| 0            | 0            |
| 1            | 1            |
| 1            | 1            |
| 1            | 1            |
| 1            | 1            |
| 1            | 1            |
| 2            | 2            |
| 2            | 2            |
| 2            | 2            |
| 2            | 2            |
| 2            | 2            |
| 3            | 3            |
| 3            | 3            |
| 3            | 3            |
| 3            | 3            |
| 3            | 3            |
| 4            | 4            |
| 4            | 4            |
| 4            | 4            |
| 4            | 4            |
| 4            | 4            |
+--------------+--------------+
25 rows selected (2.207 seconds)
0: jdbc:drill:zk=local> select n.n_regionkey, r.r_regionkey from cp.`tpch/region.parquet` r right join cp.`tpch/nation.parquet` n on n.n_regionkey = r.r_regionkey;
+--------------+--------------+
| n_regionkey  | r_regionkey  |
+--------------+--------------+
| 0            | null         |
| 1            | null         |
| 1            | null         |
| 1            | null         |
| 4            | null         |
| 0            | null         |
| 3            | null         |
| 3            | null         |
| 2            | null         |
| 2            | null         |
| 4            | null         |
| 4            | null         |
| 2            | null         |
| 4            | null         |
| 0            | null         |
| 0            | null         |
| 0            | null         |
| 1            | null         |
| 2            | null         |
| 3            | null         |
| 4            | null         |
| 2            | null         |
| 3            | null         |
| 3            | null         |
| 1            | null         |
+--------------+--------------+
25 rows selected (0.514 seconds)
Explain plan for the good run:
00-00    Screen
00-01      Project(n_regionkey=[$0], r_regionkey=[$1])
00-02        Project(n_regionkey=[$3], r_regionkey=[$1])
00-03          HashJoin(condition=[=($3, $1)], joinType=[right])
00-05            Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=/tpch/region.parquet]], selectionRoot=/tpch/region.parquet, columns=[SchemaPath [`r_regionkey`]]]])
00-04            Project(*0=[$0], n_regionkey=[$1])
00-06              BroadcastExchange
01-01                Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=/tpch/nation.parquet]], selectionRoot=/tpch/nation.parquet, columns=[SchemaPath [`n_regionkey`]]]])
Explain plan for the bad run:
00-00    Screen
00-01      Project(n_regionkey=[$0], r_regionkey=[$1])
00-02        Project(n_regionkey=[$3], r_regionkey=[$1])
00-03          HashJoin(condition=[=($2, $1)], joinType=[right])
00-05            Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=/tpch/region.parquet]], selectionRoot=/tpch/region.parquet, columns=[SchemaPath [`r_regionkey`]]]])
00-04            Project(*0=[$0], n_regionkey=[$1])
00-06              BroadcastExchange
01-01                Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=/tpch/nation.parquet]], selectionRoot=/tpch/nation.parquet, columns=null]])
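To make the difference between the two plans easier to spot, the plans can be compared operator by operator. The following is a minimal Python sketch, not part of Drill: the helper names split_operators and differing_operators are hypothetical, and the plan strings are copied from the two runs above with the trailing " |" dropped. It assumes operator ids always have the form NN-NN followed by whitespace.

```python
import re

# Plan text copied from the good and bad runs in this report.
GOOD_PLAN = (
    "00-00 Screen "
    "00-01 Project(n_regionkey=[$0], r_regionkey=[$1]) "
    "00-02 Project(n_regionkey=[$3], r_regionkey=[$1]) "
    "00-03 HashJoin(condition=[=($3, $1)], joinType=[right]) "
    "00-05 Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath "
    "[path=/tpch/region.parquet]], selectionRoot=/tpch/region.parquet, "
    "columns=[SchemaPath [`r_regionkey`]]]]) "
    "00-04 Project(*0=[$0], n_regionkey=[$1]) "
    "00-06 BroadcastExchange "
    "01-01 Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath "
    "[path=/tpch/nation.parquet]], selectionRoot=/tpch/nation.parquet, "
    "columns=[SchemaPath [`n_regionkey`]]]])"
)
BAD_PLAN = (
    "00-00 Screen "
    "00-01 Project(n_regionkey=[$0], r_regionkey=[$1]) "
    "00-02 Project(n_regionkey=[$3], r_regionkey=[$1]) "
    "00-03 HashJoin(condition=[=($2, $1)], joinType=[right]) "
    "00-05 Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath "
    "[path=/tpch/region.parquet]], selectionRoot=/tpch/region.parquet, "
    "columns=[SchemaPath [`r_regionkey`]]]]) "
    "00-04 Project(*0=[$0], n_regionkey=[$1]) "
    "00-06 BroadcastExchange "
    "01-01 Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath "
    "[path=/tpch/nation.parquet]], selectionRoot=/tpch/nation.parquet, "
    "columns=null]])"
)

# Assumption: an operator id looks like "00-03" and is followed by whitespace.
OPERATOR_ID = re.compile(r"(\d{2}-\d{2})\s+")


def split_operators(plan):
    """Map each operator id (e.g. '00-03') to the operator text that follows it."""
    # re.split with a capturing group yields ['', id1, text1, id2, text2, ...].
    parts = OPERATOR_ID.split(plan.strip())
    return {parts[i]: parts[i + 1].strip() for i in range(1, len(parts) - 1, 2)}


def differing_operators(good, bad):
    """Return {operator id: (good text, bad text)} for operators that differ."""
    g, b = split_operators(good), split_operators(bad)
    return {
        op: (g.get(op), b.get(op))
        for op in sorted(g.keys() | b.keys())
        if g.get(op) != b.get(op)
    }


for op, (good_text, bad_text) in differing_operators(GOOD_PLAN, BAD_PLAN).items():
    print(op)
    print("  good:", good_text)
    print("  bad: ", bad_text)
```

For the two plans in this report, the only operators reported as different are 00-03 (the HashJoin condition) and 01-01 (the columns projected by the nation scan).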