Uploaded image for project: 'Calcite'
  1. Calcite
  2. CALCITE-838

Optimize "join to most recent" queries

    XMLWordPrintableJSON

    Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: None
    • Labels:
      None

      Description

      A common class of queries joins a fact table to an "current" record in a dimension table. What appears to be a many-to-many join becomes many-to-one because there is only one current record.

      Here the query on orders (fact) and customers (dimension):

      SELECT *
      FROM Orders
      JOIN Customers AS c USING (customerId)
      WHERE NOT EXISTS (
        SELECT 1
        FROM Customers
        WHERE customerId = c.CustomerId 
        AND effectiveDate > c.effectiveDate)

      Same query pushing WHERE into JOIN:

      SELECT *
      FROM Orders AS o
      JOIN Customers AS c
      ON o.customerId = c.customerId
      AND NOT EXISTS (
        SELECT 1
        FROM Customers
        WHERE customerId = c.CustomerId 
        AND effectiveDate > c.effectiveDate)

      The same query re-formulated using LAST_VALUE:

      SELECT *
      FROM Orders AS o
      JOIN (
        SELECT last_value(customerId) OVER w AS customerId,
          last_value(customerName) OVER w AS customerName
        FROM Customers AS c
        WINDOW w AS (PARTITION BY customerId
          ORDER BY effectiveDate))

      This formulation avoids the self-join, and is uncorrelated. Both good things.

      It could probably also be written using correlated NOT IN. (Not good, because any plan chosen for NOT IN has to watch out for null values.)

        Attachments

          Issue Links

            Activity

              People

              • Assignee:
                Unassigned
                Reporter:
                julianhyde Julian Hyde
              • Votes:
                0 Vote for this issue
                Watchers:
                3 Start watching this issue

                Dates

                • Created:
                  Updated: