Add a WITHIN DISTINCT clause to aggregate functions, allowing duplicate rows to be eliminated before entering the function.
This feature is non-standard, and in fact does not exist in any database I am aware of. Therefore this feature is experimental, and may evolve or be removed without notice.
It is related to DISTINCT, and is in fact a generalization of it. DISTINCT can always be rewritten in terms of WITHIN DISTINCT. For example, SUM(DISTINCT sal) is equivalent to SUM(sal) WITHIN DISTINCT (sal).
Consider the query
where Friends has the rows
Note that there are two rows for George, because she has two jobs.
The values of the columns are as follows:
- SUM(age) has the value (16 + 15 + 13 + 15 + 15 + 4) = 78;
- SUM(DISTINCT age) has the value (16 + 15 + 13 + 4) = 48;
- SUM(age) WITHIN DISTINCT (name) has the value (16 + 15 + 13 + 15 + 4) = 63.
WITHIN DISTINCT has treated the two 15 values for George as one value, but has still counted the 15 for Dick separately.
The WITHIN DISTINCT clause can be useful to prevent double-counting when duplicates have been added via a many-to-one join.