Uploaded image for project: 'Calcite'
  1. Calcite
  2. CALCITE-2712

Add rule to remove null-generating side of a Join

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.20.0
    • Component/s: None

      Description

      Add a rule to remove the null-generating side of a Join. Here is an example where we eliminate the "many" side of a "one-to-many" join:

      # Example 1: one-to-many
      SELECT c.id, COUNT(DISTINCT o.productId)
      FROM Customers AS c
      LEFT JOIN SupportCalls AS s ON c.id = s.customerId
      LEFT JOIN Orders AS o ON c.id = o.customerId

      We can remove SupportCalls and the join to it, so the query becomes

      SELECT c.id, COUNT(DISTINCT o.productId)
      FROM Customers AS c
      LEFT JOIN Orders AS o ON c.id = o.customerId

      Necessary conditions are:

      1. no columns from SupportCalls are used
      2. the join is LEFT, so customers will not be eliminated if they have no support calls,
      3. there is an Aggregate on top, so we don't care if there are >1 support call per customer.

      A simpler example of one-to-many:

      # Example 2: simpler one-to-many
      SELECT DISTINCT c.id
      FROM Customers AS c
      LEFT JOIN SupportCalls AS s ON c.id = s.customerId

      An example of many-to-one, where we eliminate the "one" side (Orders):

      # Example 3: many-to-one
      SELECT c.id, p.color
      FROM LineItems AS i
      LEFT JOIN Orders AS o ON o.id = i.orderId
      LEFT JOIN Products AS p ON p.id = i.orderId

      so that the query becomes

      SELECT c.id, p.color
      FROM LineItems AS i
      LEFT JOIN Products AS p ON p.id = i.orderId

      Here, necessary side-conditions are:

      1. no columns from Orders are used;
      2. unique key on Orders.id.

      We do not require aggregation, because the primary key on Orders.id ensures that Orders contributes at most one row.

      We can deal with similar cases like

      # Example 4: many-to-one, column aliasing required
      SELECT c.id, p.color
      FROM LineItems AS i
      LEFT JOIN Orders AS o ON o.id = i.orderId
      LEFT JOIN Products AS p ON p.id = o.id

      if we use aliasing (o.id = i.orderId) and a foreign key that ensures the existence of an record in Orders.

      For examples 1 and 2 (one-to-many), we would need to match Aggregate on Join, therefore make a variant of AggregateJoinTransposeRule.

      For examples 3 and 4 (many-to-one or one-to-one)), we would match Project on Join, therefore make a variant of ProjectJoinTransposeRule.

        Attachments

          Issue Links

            Activity

              People

              • Assignee:
                Chunwei Lei Chunwei Lei
                Reporter:
                julianhyde Julian Hyde
              • Votes:
                0 Vote for this issue
                Watchers:
                7 Start watching this issue

                Dates

                • Created:
                  Updated:
                  Resolved:

                  Time Tracking

                  Estimated:
                  Original Estimate - Not Specified
                  Not Specified
                  Remaining:
                  Remaining Estimate - 0h
                  0h
                  Logged:
                  Time Spent - 4h
                  4h