Uploaded image for project: 'Calcite'
  1. Calcite
  2. CALCITE-2712

Add rule to remove null-generating side of a Join

VotersWatch issueWatchersLinkCloneUpdate Comment AuthorReplace String in CommentUpdate Comment VisibilityDelete Comments
    XMLWordPrintableJSON

Details

    • Bug
    • Status: Closed
    • Major
    • Resolution: Fixed
    • None
    • 1.20.0
    • None

    Description

      Add a rule to remove the null-generating side of a Join. Here is an example where we eliminate the "many" side of a "one-to-many" join:

      # Example 1: one-to-many
      SELECT c.id, COUNT(DISTINCT o.productId)
      FROM Customers AS c
      LEFT JOIN SupportCalls AS s ON c.id = s.customerId
      LEFT JOIN Orders AS o ON c.id = o.customerId

      We can remove SupportCalls and the join to it, so the query becomes

      SELECT c.id, COUNT(DISTINCT o.productId)
      FROM Customers AS c
      LEFT JOIN Orders AS o ON c.id = o.customerId

      Necessary conditions are:

      1. no columns from SupportCalls are used
      2. the join is LEFT, so customers will not be eliminated if they have no support calls,
      3. there is an Aggregate on top, so we don't care if there are >1 support call per customer.

      A simpler example of one-to-many:

      # Example 2: simpler one-to-many
      SELECT DISTINCT c.id
      FROM Customers AS c
      LEFT JOIN SupportCalls AS s ON c.id = s.customerId

      An example of many-to-one, where we eliminate the "one" side (Orders):

      # Example 3: many-to-one
      SELECT c.id, p.color
      FROM LineItems AS i
      LEFT JOIN Orders AS o ON o.id = i.orderId
      LEFT JOIN Products AS p ON p.id = i.orderId

      so that the query becomes

      SELECT c.id, p.color
      FROM LineItems AS i
      LEFT JOIN Products AS p ON p.id = i.orderId

      Here, necessary side-conditions are:

      1. no columns from Orders are used;
      2. unique key on Orders.id.

      We do not require aggregation, because the primary key on Orders.id ensures that Orders contributes at most one row.

      We can deal with similar cases like

      # Example 4: many-to-one, column aliasing required
      SELECT c.id, p.color
      FROM LineItems AS i
      LEFT JOIN Orders AS o ON o.id = i.orderId
      LEFT JOIN Products AS p ON p.id = o.id

      if we use aliasing (o.id = i.orderId) and a foreign key that ensures the existence of an record in Orders.

      For examples 1 and 2 (one-to-many), we would need to match Aggregate on Join, therefore make a variant of AggregateJoinTransposeRule.

      For examples 3 and 4 (many-to-one or one-to-one)), we would match Project on Join, therefore make a variant of ProjectJoinTransposeRule.

      Attachments

        Activity

          This comment will be Viewable by All Users Viewable by All Users
          Cancel

          People

            Chunwei Lei Chunwei Lei
            julianhyde Julian Hyde
            Votes:
            0 Vote for this issue
            Watchers:
            7 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved:

              Time Tracking

                Estimated:
                Original Estimate - Not Specified
                Not Specified
                Remaining:
                Remaining Estimate - 0h
                0h
                Logged:
                Time Spent - 4h
                4h

                Slack

                  Issue deployment