Details
Type: Bug
Status: Resolved
Priority: Major
Resolution: Fixed
Description
explain
select count(*)
from LU_CUSTOMER pa11
  join ORDER_FACT a15
    on (pa11.CUSTOMER_ID = a15.CUSTOMER_ID)
  join LU_CUSTOMER a16
    on (a15.CUSTOMER_ID = a16.CUSTOMER_ID
        and pa11.CUSTOMER_ID = a16.CUSTOMER_ID);
a16.CUSTOMER_ID is referenced more than once in the join condition.
Hive generates ReduceSink (RS) operators for the join's children, and the row schema of one of these RS operators contains only one instance of the join key (customer_id).
RS[13]
result = {HashMap@16092} size = 2
  "KEY.reducesinkkey0" -> {ExprNodeColumnDesc@16083} "Column[_col0]"
  "KEY.reducesinkkey1" -> {ExprNodeColumnDesc@16102} "Column[_col0]"
result = {RowSchema@16104} "(KEY.reducesinkkey0: int|{$hdt$_2}customer_id)"
  signature = {ArrayList@16110} size = 1
    0 = {ColumnInfo@16087} "KEY.reducesinkkey0: int"
KEY.reducesinkkey1 is missing from the schema.
When the join is converted to a map join, the converter algorithm fails while looking up both join key column instances.
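The mismatch can be illustrated outside of Hive with a small sketch. It only models the two structures from the RS[13] dump as plain Python values; the helper name `missing_from_schema` is hypothetical and is not part of Hive, and the real converter code may perform the lookup differently.

```python
# Values copied from the RS[13] debugger dump above.
# Both key expressions map to the same underlying column (_col0).
column_expr_map = {
    "KEY.reducesinkkey0": "Column[_col0]",
    "KEY.reducesinkkey1": "Column[_col0]",
}

# The row schema signature lists only one of the two keys.
row_schema_signature = ["KEY.reducesinkkey0"]


def missing_from_schema(expr_map, signature):
    """Return expression-map keys that have no column in the row schema.

    Hypothetical helper for illustration only; it is not a Hive API.
    """
    present = set(signature)
    return [name for name in expr_map if name not in present]


missing = missing_from_schema(column_expr_map, row_schema_signature)
print(missing)  # ['KEY.reducesinkkey1']
```

Any lookup that expects every key in the expression map to also appear in the row schema would fail on `KEY.reducesinkkey1`, which matches the behaviour described above.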