Details
-
Bug
-
Status: Open
-
Major
-
Resolution: Unresolved
-
None
-
None
-
None
-
None
Description
A common class of queries joins a fact table to an "current" record in a dimension table. What appears to be a many-to-many join becomes many-to-one because there is only one current record.
Here the query on orders (fact) and customers (dimension):
SELECT * FROM Orders JOIN Customers AS c USING (customerId) WHERE NOT EXISTS ( SELECT 1 FROM Customers WHERE customerId = c.CustomerId AND effectiveDate > c.effectiveDate)
Same query pushing WHERE into JOIN:
SELECT * FROM Orders AS o JOIN Customers AS c ON o.customerId = c.customerId AND NOT EXISTS ( SELECT 1 FROM Customers WHERE customerId = c.CustomerId AND effectiveDate > c.effectiveDate)
The same query re-formulated using LAST_VALUE:
SELECT * FROM Orders AS o JOIN ( SELECT last_value(customerId) OVER w AS customerId, last_value(customerName) OVER w AS customerName FROM Customers AS c WINDOW w AS (PARTITION BY customerId ORDER BY effectiveDate))
This formulation avoids the self-join, and is uncorrelated. Both good things.
It could probably also be written using correlated NOT IN. (Not good, because any plan chosen for NOT IN has to watch out for null values.)
Attachments
Issue Links
- is related to
-
CALCITE-816 Represent sub-query as a RexNode
- Closed