Details
-
Bug
-
Status: Closed
-
Major
-
Resolution: Fixed
-
None
-
None
Description
Add a WITHIN DISTINCT clause to aggregate functions, allowing duplicate rows to be eliminated before entering the function.
This feature is non-standard, and in fact does not exist in any database I am aware of. Therefore this feature is experimental, and may evolve or be removed without notice.
It is related to DISTINCT, and is in fact a generalization of it. DISTINCT can always be rewritten in terms of WITHIN DISTINCT. For example, SUM(DISTINCT sal) is equivalent to SUM(sal) WITHIN DISTINCT (sal).
Consider the query
SELECT SUM(age), SUM(DISTINCT age), SUM(age) WITHIN DISTINCT (name) FROM Friends
where Friends has the rows
name age job ====== === ========== Julian 16 Programmer Dick 15 Anne 13 Car wash George 15 Lifeguard George 15 Dog walker Timmy 4
Note that there are two rows for George, because she has two jobs.
The values of the columns are as follows:
- SUM(age) has the value (16 + 15 + 13 + 15 + 15 + 4) = 78;
- SUM(DISTINCT age) has the value (16 + 15 + 13 + 4) = 48;
- SUM(age) WITHIN DISTINCT (name) has the value (16 + 15 + 13 + 15 + 4) = 63.
WITHIN DISTINCT has treated the two 15 values for George as one value, but has still counted the 15 for Dick separately.
The WITHIN DISTINCT clause can be useful to prevent double-counting when duplicates have been added via a many-to-one join.
Attachments
Issue Links
- depends upon
-
CALCITE-4484 Add UNIQUE_VALUE(x) aggregate function, that throws if x is not unique
- Open
- relates to
-
CALCITE-4496 Measure columns ("SELECT ... AS MEASURE")
- Closed
-
CALCITE-6217 PER clause for aggregate functions
- Open
- links to