Details
-
Bug
-
Status: Closed
-
Major
-
Resolution: Fixed
-
None
-
None
Description
create table web_sales (ws_order_number int, ws_warehouse_sk int) stored as orc; insert into web_sales values (1, 1), (1, 2), (2, 1), (2, 2); select * from web_sales ws1 where exists (select 1 from web_sales ws2 where ws1.ws_order_number = ws2.ws_order_number limit 1); 1 1 1 2
CBO PLAN: HiveSemiJoin(condition=[=($0, $2)], joinType=[semi]) HiveProject(ws_order_number=[$0], ws_warehouse_sk=[$1]) HiveFilter(condition=[IS NOT NULL($0)]) HiveTableScan(table=[[default, web_sales]], table:alias=[ws1]) HiveProject(ws_order_number=[$0]) HiveSortLimit(fetch=[1]) <-- This shouldn't be added HiveProject(ws_order_number=[$0]) HiveFilter(condition=[IS NOT NULL($0)]) HiveTableScan(table=[[default, web_sales]], table:alias=[ws2])
Limit n on the right side of the join reduces the result set coming from the right to only n record hence not all the ws_order_number values are included which leads to correctness issue.
Attachments
Issue Links
- links to