Apache Arrow / ARROW-12728

[C++][Compute] Implement count_distinct/distinct hash aggregate kernels

Details

    Description

      Implement a count distinct aggregate that internally reuses the hash table from hash group by.

      This adds support for SQL queries such as:
      select a, count(distinct b), count(distinct c) from t group by a

      For instance, to compute count(distinct b), the first group id mapping assigns a group id based on the value of column a. A second group id mapping is then done inside the count(distinct b) aggregate, using the key (groupid(a), b). The count(distinct c) aggregate works the same way with the key (groupid(a), c).
      After all input rows are consumed, a final processing step scans the hash table keyed on (groupid(a), b) and updates an array of counts indexed by groupid(a): each distinct key adds one to the count of its group.
      The resulting array of counts is the output of the count distinct aggregate.
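
      The two-level mapping described above can be sketched in plain C++. This is only an illustration under simplifying assumptions, not the Arrow kernel: the function name CountDistinctPerGroup and the GroupValueHash helper are hypothetical, standard-library containers stand in for the grouper's hash table, column a is assumed to hold strings and column b 64-bit integers, and nulls are ignored.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <string>
#include <unordered_map>
#include <unordered_set>
#include <utility>
#include <vector>

// Hash for the composite key (groupid(a), b); std::pair has no std::hash.
// Hypothetical helper for this sketch only.
struct GroupValueHash {
  std::size_t operator()(const std::pair<std::uint32_t, std::int64_t>& key) const {
    std::size_t h1 = std::hash<std::uint32_t>()(key.first);
    std::size_t h2 = std::hash<std::int64_t>()(key.second);
    return h1 ^ (h2 + 0x9e3779b9u + (h1 << 6) + (h1 >> 2));
  }
};

// Hypothetical function: returns, for each group id of column `a` (assigned in
// first-seen order), the number of distinct values of column `b` in that group.
std::vector<std::int64_t> CountDistinctPerGroup(const std::vector<std::string>& a,
                                                const std::vector<std::int64_t>& b) {
  assert(a.size() == b.size());

  // First mapping: value of a -> dense group id.
  std::unordered_map<std::string, std::uint32_t> group_ids;
  // Second mapping: the set of distinct (groupid(a), b) keys.
  std::unordered_set<std::pair<std::uint32_t, std::int64_t>, GroupValueHash> seen;

  for (std::size_t i = 0; i < a.size(); ++i) {
    // emplace leaves an existing entry untouched, so ids stay stable.
    auto it = group_ids.emplace(a[i], static_cast<std::uint32_t>(group_ids.size())).first;
    seen.emplace(it->second, b[i]);
  }

  // Final step: scan the second hash table and bump the count of each key's group.
  std::vector<std::int64_t> counts(group_ids.size(), 0);
  for (const auto& key : seen) {
    ++counts[key.first];
  }
  return counts;
}
```

      Computing count(distinct c) in the same query would reuse the first mapping and keep a separate set keyed on (groupid(a), c).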

            People

              lidavidm David Li
              michalno Michal Nowakiewicz

                Time Tracking

                  Estimated: Not Specified
                  Remaining: 0h
                  Logged: 2h 50m