  1. Calcite
  2. CALCITE-1887

Detect transitive join conditions via expressions

Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 1.13.0
    • Fix Version/s: None
    • Component/s: core
    • Labels: None

    Description

      Given table aliases ta and tb, column names ca and cb, and an arbitrary deterministic expression expr, Calcite should be able to infer join conditions by transitivity:

      ta.ca = expr AND tb.cb = expr -> ta.ca = tb.cb
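
The rule above can be illustrated with a minimal, Calcite-independent sketch. It assumes each conjunct has already been reduced to the form "column = expr" and that two expressions are equal when their text is identical; both the class name and the string-based representation are hypothetical and are not how Calcite models expressions. The inference is only sound when expr is deterministic.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class TransitiveEqualitySketch {
    /** One conjunct of the form "column = expr" (hypothetical helper type). */
    static final class ColumnEqualsExpr {
        final String column;
        final String expr;

        ColumnEqualsExpr(String column, String expr) {
            this.column = column;
            this.expr = expr;
        }
    }

    /**
     * Groups columns by the expression they are equated with and emits
     * "column = column" equalities within each group.
     */
    static List<String> inferColumnEqualities(List<ColumnEqualsExpr> conjuncts) {
        Map<String, List<String>> columnsByExpr = new LinkedHashMap<>();
        for (ColumnEqualsExpr c : conjuncts) {
            columnsByExpr.computeIfAbsent(c.expr, k -> new ArrayList<>()).add(c.column);
        }
        List<String> inferred = new ArrayList<>();
        for (List<String> columns : columnsByExpr.values()) {
            // Linking every column to the first one is sufficient;
            // the remaining pairs follow by transitivity.
            for (int i = 1; i < columns.size(); i++) {
                inferred.add(columns.get(0) + " = " + columns.get(i));
            }
        }
        return inferred;
    }

    public static void main(String[] args) {
        List<ColumnEqualsExpr> where = Arrays.asList(
            new ColumnEqualsExpr("ta.ca", "expr"),
            new ColumnEqualsExpr("tb.cb", "expr"));
        System.out.println(inferColumnEqualities(where)); // prints [ta.ca = tb.cb]
    }
}
```

A real implementation would compare expressions structurally rather than by text, and would have to reject non-deterministic expressions before grouping.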
      

      The use case for us stems from SPARQL to SQL rewriting, where SPARQL queries such as

      SELECT * {
        dbr:Leipzig a ?type .
        dbr:Leipzig dbo:mayor ?mayor
      }
      

      result in an SQL query similar to

      SELECT * FROM s.rdf a, s.rdf b WHERE a.s = 'dbr:Leipzig' AND b.s = 'dbr:Leipzig'
      

      Because the join condition is not recognized, Apache Flink cannot find an executable plan for the query.

      Self-contained example:

      package my.pkg;
      
      import org.apache.calcite.adapter.java.ReflectiveSchema;
      import org.apache.calcite.plan.RelOptUtil;
      import org.apache.calcite.rel.RelNode;
      import org.apache.calcite.rel.RelRoot;
      import org.apache.calcite.schema.SchemaPlus;
      import org.apache.calcite.sql.SqlNode;
      import org.apache.calcite.sql.parser.SqlParser;
      import org.apache.calcite.tools.FrameworkConfig;
      import org.apache.calcite.tools.Frameworks;
      import org.apache.calcite.tools.Planner;
      import org.junit.Test;
      
      
      public class TestCalciteJoin {
          public static class Triple {
              public String s;
              public String p;
              public String o;
      
              public Triple(String s, String p, String o) {
                  super();
                  this.s = s;
                  this.p = p;
                  this.o = o;
              }
      
          }
      
          public static class TestSchema {
              public final Triple[] rdf = {new Triple("s", "p", "o")};
          }
      
      
          @Test
          public void testCalciteJoin() throws Exception {
              SchemaPlus rootSchema = Frameworks.createRootSchema(true);
      
              rootSchema.add("s", new ReflectiveSchema(new TestSchema()));
      
              Frameworks.ConfigBuilder configBuilder = Frameworks.newConfigBuilder();
              configBuilder.defaultSchema(rootSchema);
              FrameworkConfig frameworkConfig = configBuilder.build();
      
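              // Note: this parser config is built after frameworkConfig and is never passed
              // to the planner, so it does not affect how the query below is parsed.
              SqlParser.ConfigBuilder parserConfig = SqlParser.configBuilder(frameworkConfig.getParserConfig());
              parserConfig
                  .setCaseSensitive(false)
                  .setConfig(parserConfig.build());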
      
              Planner planner = Frameworks.getPlanner(frameworkConfig);
      
              // SELECT * FROM s.rdf a, s.rdf b WHERE a.s = 5 AND b.s = 5
              SqlNode sqlNode = planner.parse("SELECT * FROM \"s\".\"rdf\" \"a\", \"s\".\"rdf\" \"b\" WHERE \"a\".\"s\" = 5 AND \"b\".\"s\" = 5");
              planner.validate(sqlNode);
              RelRoot relRoot = planner.rel(sqlNode);
              RelNode relNode = relRoot.project();
              System.out.println(RelOptUtil.toString(relNode));
          }
      }
      

      Actual plan:

      LogicalProject(s=[$0], p=[$1], o=[$2], s0=[$3], p0=[$4], o0=[$5])
        LogicalFilter(condition=[AND(=($0, 5), =($3, 5))])
          LogicalJoin(condition=[true], joinType=[inner])
            EnumerableTableScan(table=[[s, rdf]])
            EnumerableTableScan(table=[[s, rdf]])
      

      Expected plan fragment:

          LogicalJoin(condition=[=($0, $3)], joinType=[inner])
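
To make the step from the actual plan to the expected fragment concrete, here is a rough sketch that works on field indexes as they appear in the plan output. It assumes the filter conjuncts have the form =($i, value), that values are compared as text, and that the left join input contributes the first leftFieldCount fields (3 for the rdf table above). The class and method names are hypothetical; Calcite itself represents these conditions as RexNode trees, not strings.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class JoinConditionFromFilterSketch {
    /** A filter conjunct of the form =($fieldIndex, value) (hypothetical helper type). */
    static final class RefEqualsValue {
        final int fieldIndex;
        final String value;

        RefEqualsValue(int fieldIndex, String value) {
            this.fieldIndex = fieldIndex;
            this.value = value;
        }
    }

    /**
     * Returns equi-join conditions for every pair of fields that are equated
     * with the same value and come from opposite sides of the join.
     */
    static List<String> inferJoinConditions(List<RefEqualsValue> conjuncts, int leftFieldCount) {
        Map<String, List<Integer>> refsByValue = new LinkedHashMap<>();
        for (RefEqualsValue c : conjuncts) {
            refsByValue.computeIfAbsent(c.value, k -> new ArrayList<>()).add(c.fieldIndex);
        }
        List<String> conditions = new ArrayList<>();
        for (List<Integer> refs : refsByValue.values()) {
            for (int left : refs) {
                if (left >= leftFieldCount) {
                    continue; // not a field of the left input
                }
                for (int right : refs) {
                    if (right >= leftFieldCount) {
                        conditions.add("=($" + left + ", $" + right + ")");
                    }
                }
            }
        }
        return conditions;
    }

    public static void main(String[] args) {
        // Conjuncts of the LogicalFilter in the actual plan: AND(=($0, 5), =($3, 5))
        List<RefEqualsValue> filter = Arrays.asList(
            new RefEqualsValue(0, "5"),
            new RefEqualsValue(3, "5"));
        System.out.println(inferJoinConditions(filter, 3)); // prints [=($0, $3)]
    }
}
```

Pairs whose fields both come from the same input are skipped on purpose, since they would yield a filter on that input rather than a join condition.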
      

          People

            Assignee: Unassigned
            Reporter: Claus Stadler (Aklakan)