Uploaded image for project: 'Calcite'
  1. Calcite
  2. CALCITE-3846

EnumerableMergeJoin: wrong comparison of composite key with null values

VotersWatch issueWatchersLinkCloneUpdate Comment AuthorReplace String in CommentUpdate Comment VisibilityDelete Comments
    XMLWordPrintableJSON

    Details

      Description

      The problem can be reproduced with the following test in EnumerablesJoinTest.java:

        @Test public void testMergeJoinWithCompositeKeyAndNull() {
          tester(false, new JdbcTest.HrSchema())
              .query("?")
              .withHook(Hook.PLANNER, (Consumer<RelOptPlanner>) planner -> {
                planner.addRule(EnumerableRules.ENUMERABLE_MERGE_JOIN_RULE);
                planner.removeRule(EnumerableRules.ENUMERABLE_JOIN_RULE);
              })
              .withRel(builder -> builder
                  .scan("s", "emps")
                  .sort(builder.field("deptno"), builder.field("commission"))
                  .scan("s", "emps")
                  .sort(builder.field("deptno"), builder.field("commission"))
                  .join(JoinRelType.INNER,
                      builder.and(
                          builder.equals(
                              builder.field(2, 0, "deptno"),
                              builder.field(2, 1, "deptno")),
                          builder.equals(
                              builder.field(2, 0, "commission"),
                              builder.field(2, 1, "commission"))))
                  .project(
                      builder.field("empid"))
                  .build())
              .explainHookMatches("" // It is important that we have MergeJoin in the plan
                  + "EnumerableCalc(expr#0..4=[{inputs}], empid=[$t0])\n"
                  + "  EnumerableMergeJoin(condition=[AND(=($1, $3), =($2, $4))], joinType=[inner])\n"
                  + "    EnumerableSort(sort0=[$1], sort1=[$2], dir0=[ASC], dir1=[ASC])\n"
                  + "      EnumerableCalc(expr#0..4=[{inputs}], proj#0..1=[{exprs}], commission=[$t4])\n"
                  + "        EnumerableTableScan(table=[[s, emps]])\n"
                  + "    EnumerableSort(sort0=[$0], sort1=[$1], dir0=[ASC], dir1=[ASC])\n"
                  + "      EnumerableCalc(expr#0..4=[{inputs}], deptno=[$t1], commission=[$t4])\n"
                  + "        EnumerableTableScan(table=[[s, emps]])\n")
              .returnsUnordered("empid=100\nempid=110\nempid=150\nempid=200");
        }
      

      The test fails with the following exception:

      java.lang.IllegalStateException: mergeJoin assumes inputs sorted in ascending order, however [10, 1000] is greater than [10, null]
      

      The problem is that EnumerableMergeJoin implementation (i.e. EnumerableDefaults#mergeJoin) expects its inputs to be sorted in ascending order, nulls last [1] (see EnumerableMergeJoinRule [2]). In case of a composite key, EnumerableMergeJoin will represent keys as JavaRowFormat.LIST [3], which is a comparable list, whose comparison is implemented via FlatLists.ComparableListImpl#compare. This method will compare both lists, item by item, but it will consider that a null item is less-than a non-null item [4]. This is a de-facto nulls-first collation, which contradicts the pre-requisite of the mergeJoin algorithm.

      [1] https://github.com/apache/calcite/blob/200a136dd3b2ca04a1d1283eb5a03a04388b1947/linq4j/src/main/java/org/apache/calcite/linq4j/EnumerableDefaults.java#L1966
      [2] https://github.com/apache/calcite/blob/200a136dd3b2ca04a1d1283eb5a03a04388b1947/core/src/main/java/org/apache/calcite/adapter/enumerable/EnumerableMergeJoinRule.java#L72
      [3] https://github.com/apache/calcite/blob/200a136dd3b2ca04a1d1283eb5a03a04388b1947/core/src/main/java/org/apache/calcite/adapter/enumerable/EnumerableMergeJoin.java#L154
      [4] https://github.com/apache/calcite/blob/200a136dd3b2ca04a1d1283eb5a03a04388b1947/core/src/main/java/org/apache/calcite/runtime/FlatLists.java#L1350

        Attachments

          Activity

            People

            • Assignee:
              rubenql Ruben Q L
              Reporter:
              rubenql Ruben Q L

              Dates

              • Created:
                Updated:
                Resolved:

                Time Tracking

                Estimated:
                Original Estimate - Not Specified
                Not Specified
                Remaining:
                Remaining Estimate - 0h
                0h
                Logged:
                Time Spent - 2.5h
                2.5h

                  Issue deployment