Apache Arrow / ARROW-13993

[C++] Hash aggregate function that returns value from first row in group


Details

    Description

      It would be nice to have a hash aggregate function that returns the first value of a column within each hash group.

      If row order within groups is non-deterministic, then effectively this would return one arbitrary value. This is a very computationally cheap operation.

      This is useful when querying a non-normalized table. For example, if a table has both a country column and a country_abbr column, and you want to group by either or both of them while returning the values from both, you could write

      SELECT country, country_abbr FROM table GROUP BY country, country_abbr

      but it would be more efficient to do

      SELECT country, first(country_abbr) FROM table GROUP BY country

      because then the engine does not need to scan all the values of the country_abbr column.
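
      The intended semantics can be sketched in plain Python. This is only an illustration of "first value per group" under assumed names, not Arrow's implementation: the function `first_per_group` and the sample rows are hypothetical, and the result depends on the order in which rows are visited, so with a non-deterministic row order it amounts to one arbitrary value per group.

```python
def first_per_group(rows, key, value):
    """Return {group key: value taken from the first row seen for that key}."""
    firsts = {}
    for row in rows:
        k = row[key]
        # Record the value only for the first row of each group;
        # values from later rows in the same group are never read.
        if k not in firsts:
            firsts[k] = row[value]
    return firsts


# Hypothetical non-normalized table: country_abbr is determined by country.
rows = [
    {"country": "France", "country_abbr": "FR"},
    {"country": "Japan", "country_abbr": "JP"},
    {"country": "France", "country_abbr": "FR"},
]

result = first_per_group(rows, "country", "country_abbr")
for country, abbr in result.items():
    print(country, abbr)
```

      Only the grouping key is hashed for every row; the value column is touched once per group, which is where the cost saving described above would come from.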


            People

              dhruv9vats Dhruv Vats
              icook Ian Cook


                Time Tracking

                  Estimated: Not Specified
                  Remaining: 0h
                  Logged: 5h 50m