Details
-
Bug
-
Status: Closed
-
Major
-
Resolution: Fixed
-
1.22.0
Description
The problem can be reproduced with the following test in EnumerablesJoinTest.java:
@Test public void testMergeJoinWithCompositeKeyAndNull() { tester(false, new JdbcTest.HrSchema()) .query("?") .withHook(Hook.PLANNER, (Consumer<RelOptPlanner>) planner -> { planner.addRule(EnumerableRules.ENUMERABLE_MERGE_JOIN_RULE); planner.removeRule(EnumerableRules.ENUMERABLE_JOIN_RULE); }) .withRel(builder -> builder .scan("s", "emps") .sort(builder.field("deptno"), builder.field("commission")) .scan("s", "emps") .sort(builder.field("deptno"), builder.field("commission")) .join(JoinRelType.INNER, builder.and( builder.equals( builder.field(2, 0, "deptno"), builder.field(2, 1, "deptno")), builder.equals( builder.field(2, 0, "commission"), builder.field(2, 1, "commission")))) .project( builder.field("empid")) .build()) .explainHookMatches("" // It is important that we have MergeJoin in the plan + "EnumerableCalc(expr#0..4=[{inputs}], empid=[$t0])\n" + " EnumerableMergeJoin(condition=[AND(=($1, $3), =($2, $4))], joinType=[inner])\n" + " EnumerableSort(sort0=[$1], sort1=[$2], dir0=[ASC], dir1=[ASC])\n" + " EnumerableCalc(expr#0..4=[{inputs}], proj#0..1=[{exprs}], commission=[$t4])\n" + " EnumerableTableScan(table=[[s, emps]])\n" + " EnumerableSort(sort0=[$0], sort1=[$1], dir0=[ASC], dir1=[ASC])\n" + " EnumerableCalc(expr#0..4=[{inputs}], deptno=[$t1], commission=[$t4])\n" + " EnumerableTableScan(table=[[s, emps]])\n") .returnsUnordered("empid=100\nempid=110\nempid=150\nempid=200"); }
The test fails with the following exception:
java.lang.IllegalStateException: mergeJoin assumes inputs sorted in ascending order, however [10, 1000] is greater than [10, null]
The problem is that EnumerableMergeJoin implementation (i.e. EnumerableDefaults#mergeJoin) expects its inputs to be sorted in ascending order, nulls last [1] (see EnumerableMergeJoinRule [2]). In case of a composite key, EnumerableMergeJoin will represent keys as JavaRowFormat.LIST [3], which is a comparable list, whose comparison is implemented via FlatLists.ComparableListImpl#compare. This method will compare both lists, item by item, but it will consider that a null item is less-than a non-null item [4]. This is a de-facto nulls-first collation, which contradicts the pre-requisite of the mergeJoin algorithm.
[1] https://github.com/apache/calcite/blob/200a136dd3b2ca04a1d1283eb5a03a04388b1947/linq4j/src/main/java/org/apache/calcite/linq4j/EnumerableDefaults.java#L1966
[2] https://github.com/apache/calcite/blob/200a136dd3b2ca04a1d1283eb5a03a04388b1947/core/src/main/java/org/apache/calcite/adapter/enumerable/EnumerableMergeJoinRule.java#L72
[3] https://github.com/apache/calcite/blob/200a136dd3b2ca04a1d1283eb5a03a04388b1947/core/src/main/java/org/apache/calcite/adapter/enumerable/EnumerableMergeJoin.java#L154
[4] https://github.com/apache/calcite/blob/200a136dd3b2ca04a1d1283eb5a03a04388b1947/core/src/main/java/org/apache/calcite/runtime/FlatLists.java#L1350
Attachments
Issue Links
- relates to
-
CALCITE-5732 EnumerableHashJoin and EnumerableMergeJoin on composite key return rows matching condition 'null = null'
- Closed
- links to