Uploaded image for project: 'Calcite'
  1. Calcite
  2. CALCITE-3846

EnumerableMergeJoin: wrong comparison of composite key with null values

    XMLWordPrintableJSON

Details

    Description

      The problem can be reproduced with the following test in EnumerablesJoinTest.java:

        @Test public void testMergeJoinWithCompositeKeyAndNull() {
          tester(false, new JdbcTest.HrSchema())
              .query("?")
              .withHook(Hook.PLANNER, (Consumer<RelOptPlanner>) planner -> {
                planner.addRule(EnumerableRules.ENUMERABLE_MERGE_JOIN_RULE);
                planner.removeRule(EnumerableRules.ENUMERABLE_JOIN_RULE);
              })
              .withRel(builder -> builder
                  .scan("s", "emps")
                  .sort(builder.field("deptno"), builder.field("commission"))
                  .scan("s", "emps")
                  .sort(builder.field("deptno"), builder.field("commission"))
                  .join(JoinRelType.INNER,
                      builder.and(
                          builder.equals(
                              builder.field(2, 0, "deptno"),
                              builder.field(2, 1, "deptno")),
                          builder.equals(
                              builder.field(2, 0, "commission"),
                              builder.field(2, 1, "commission"))))
                  .project(
                      builder.field("empid"))
                  .build())
              .explainHookMatches("" // It is important that we have MergeJoin in the plan
                  + "EnumerableCalc(expr#0..4=[{inputs}], empid=[$t0])\n"
                  + "  EnumerableMergeJoin(condition=[AND(=($1, $3), =($2, $4))], joinType=[inner])\n"
                  + "    EnumerableSort(sort0=[$1], sort1=[$2], dir0=[ASC], dir1=[ASC])\n"
                  + "      EnumerableCalc(expr#0..4=[{inputs}], proj#0..1=[{exprs}], commission=[$t4])\n"
                  + "        EnumerableTableScan(table=[[s, emps]])\n"
                  + "    EnumerableSort(sort0=[$0], sort1=[$1], dir0=[ASC], dir1=[ASC])\n"
                  + "      EnumerableCalc(expr#0..4=[{inputs}], deptno=[$t1], commission=[$t4])\n"
                  + "        EnumerableTableScan(table=[[s, emps]])\n")
              .returnsUnordered("empid=100\nempid=110\nempid=150\nempid=200");
        }
      

      The test fails with the following exception:

      java.lang.IllegalStateException: mergeJoin assumes inputs sorted in ascending order, however [10, 1000] is greater than [10, null]
      

      The problem is that EnumerableMergeJoin implementation (i.e. EnumerableDefaults#mergeJoin) expects its inputs to be sorted in ascending order, nulls last [1] (see EnumerableMergeJoinRule [2]). In case of a composite key, EnumerableMergeJoin will represent keys as JavaRowFormat.LIST [3], which is a comparable list, whose comparison is implemented via FlatLists.ComparableListImpl#compare. This method will compare both lists, item by item, but it will consider that a null item is less-than a non-null item [4]. This is a de-facto nulls-first collation, which contradicts the pre-requisite of the mergeJoin algorithm.

      [1] https://github.com/apache/calcite/blob/200a136dd3b2ca04a1d1283eb5a03a04388b1947/linq4j/src/main/java/org/apache/calcite/linq4j/EnumerableDefaults.java#L1966
      [2] https://github.com/apache/calcite/blob/200a136dd3b2ca04a1d1283eb5a03a04388b1947/core/src/main/java/org/apache/calcite/adapter/enumerable/EnumerableMergeJoinRule.java#L72
      [3] https://github.com/apache/calcite/blob/200a136dd3b2ca04a1d1283eb5a03a04388b1947/core/src/main/java/org/apache/calcite/adapter/enumerable/EnumerableMergeJoin.java#L154
      [4] https://github.com/apache/calcite/blob/200a136dd3b2ca04a1d1283eb5a03a04388b1947/core/src/main/java/org/apache/calcite/runtime/FlatLists.java#L1350

      Attachments

        Issue Links

          Activity

            People

              rubenql Ruben Q L
              rubenql Ruben Q L
              Votes:
              0 Vote for this issue
              Watchers:
              2 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Time Tracking

                  Estimated:
                  Original Estimate - Not Specified
                  Not Specified
                  Remaining:
                  Remaining Estimate - 0h
                  0h
                  Logged:
                  Time Spent - 2.5h
                  2.5h