Uploaded image for project: 'Calcite'
  1. Calcite
  2. CALCITE-4483

WITHIN DISTINCT clause for aggregate functions (experimental)

    XMLWordPrintableJSON

Details

    • Bug
    • Status: Closed
    • Major
    • Resolution: Fixed
    • None
    • 1.27.0
    • None

    Description

      Add a WITHIN DISTINCT clause to aggregate functions, allowing duplicate rows to be eliminated before entering the function.

      This feature is non-standard, and in fact does not exist in any database I am aware of. Therefore this feature is experimental, and may evolve or be removed without notice.

      It is related to DISTINCT, and is in fact a generalization of it. DISTINCT can always be rewritten in terms of WITHIN DISTINCT. For example, SUM(DISTINCT sal) is equivalent to SUM(sal) WITHIN DISTINCT (sal).

      Consider the query

      SELECT SUM(age),
         SUM(DISTINCT age),
        SUM(age) WITHIN DISTINCT (name)
      FROM Friends

      where Friends has the rows

      name   age job
      ====== === ==========
      Julian  16 Programmer
      Dick    15
      Anne    13 Car wash
      George  15 Lifeguard
      George  15 Dog walker
      Timmy    4
      

      Note that there are two rows for George, because she has two jobs.

      The values of the columns are as follows:

      • SUM(age) has the value (16 + 15 + 13 + 15 + 15 + 4) = 78;
      • SUM(DISTINCT age) has the value (16 + 15 + 13 + 4) = 48;
      • SUM(age) WITHIN DISTINCT (name) has the value (16 + 15 + 13 + 15 + 4) = 63.

      WITHIN DISTINCT has treated the two 15 values for George as one value, but has still counted the 15 for Dick separately.

      The WITHIN DISTINCT clause can be useful to prevent double-counting when duplicates have been added via a many-to-one join.

      Attachments

        Issue Links

          Activity

            People

              julianhyde Julian Hyde
              julianhyde Julian Hyde
              Votes:
              0 Vote for this issue
              Watchers:
              5 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Time Tracking

                  Estimated:
                  Original Estimate - Not Specified
                  Not Specified
                  Remaining:
                  Remaining Estimate - 0h
                  0h
                  Logged:
                  Time Spent - 20m
                  20m