Apache Arrow / ARROW-4124

[C++] Abstract aggregation kernel API


Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.13.0
    • Component/s: C++

    Description

      Related to the particular details of implementing the various aggregation types, we should first invest some effort in the abstract API for aggregating data in a multi-threaded setting.
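
      One common way to make an aggregation usable from several threads is to split it into three steps: consume a chunk into a private partial state, merge partial states, and finalize the merged state into a result. The sketch below illustrates that decomposition for MEAN. The names MeanState, Consume, Merge, Finalize and ChunkedMean are hypothetical and chosen for illustration; they are not taken from this issue or from the Arrow code base. The chunks are processed in a plain loop here, on the assumption that each Consume call touches only its own state and could therefore run on a separate thread.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical partial state for MEAN (illustrative names, not Arrow's API).
// The state (sum, count) differs from the final result, which is why a
// separate Finalize step is needed.
struct MeanState {
  double sum = 0.0;
  int64_t count = 0;

  // Fold one chunk of input into this state. Touches only this state,
  // so different states can be filled concurrently without locking.
  void Consume(const double* values, std::size_t length) {
    assert(values != nullptr || length == 0);
    for (std::size_t i = 0; i < length; ++i) sum += values[i];
    count += static_cast<int64_t>(length);
  }

  // Combine a partial state produced elsewhere into this one.
  void Merge(const MeanState& other) {
    sum += other.sum;
    count += other.count;
  }

  // Produce the result. Returning 0.0 for empty input is an assumption of
  // this sketch; a real kernel would more likely return a null value.
  double Finalize() const {
    return count == 0 ? 0.0 : sum / static_cast<double>(count);
  }
};

// Aggregate each chunk into its own partial state, then merge the partials.
// The first loop is the part that could be distributed across threads.
inline double ChunkedMean(const std::vector<std::vector<double>>& chunks) {
  std::vector<MeanState> partials(chunks.size());
  for (std::size_t i = 0; i < chunks.size(); ++i) {
    partials[i].Consume(chunks[i].data(), chunks[i].size());
  }
  MeanState total;
  for (const MeanState& partial : partials) total.Merge(partial);
  return total.Finalize();
}
```

      Keeping the merge step explicit means the same state type works whether the input arrives as one chunk or many.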

      Aggregators must support both hash/group modes (e.g. "group by" in SQL or data frame libraries) and non-group modes.
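
      To make the two modes concrete, here is a minimal sketch of SUM in each of them. ScalarSum and GroupedSum are hypothetical names, and the use of std::unordered_map with int32_t keys is an assumption made for brevity rather than a proposed design.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <vector>

// Non-group mode: a single accumulator for the whole input.
inline int64_t ScalarSum(const std::vector<int64_t>& values) {
  int64_t sum = 0;
  for (int64_t v : values) sum += v;
  return sum;
}

// Hash/group mode: one accumulator per distinct key, looked up by hashing.
// keys[i] is the group that values[i] belongs to.
inline std::unordered_map<int32_t, int64_t> GroupedSum(
    const std::vector<int32_t>& keys, const std::vector<int64_t>& values) {
  assert(keys.size() == values.size());
  std::unordered_map<int32_t, int64_t> sums;
  for (std::size_t i = 0; i < values.size(); ++i) {
    // operator[] value-initializes the accumulator of a new group to 0.
    sums[keys[i]] += values[i];
  }
  return sums;
}
```

      The per-row update is the same in both functions; only the choice of accumulator differs, which is what an abstract API would need to capture.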

      Aggregations ideally should also support filter pushdown. For example:

      select $AGG($EXPR)
      from $TABLE
      where $PREDICATE
      

      Some systems materialize the post-predicate (filtered) version of $EXPR and then aggregate that; pandas does this, for example. Vectorized performance can be much improved by filtering inside the aggregation kernel instead. How the predicate's true/false values are handled may depend on the implementation details of the kernel (e.g. SUM or MEAN will be a bit different from PRODUCT).
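
      The difference between the two strategies, and between SUM, MEAN and PRODUCT, can be sketched as follows. All function names are hypothetical, and representing the predicate result as one uint8_t (0 or 1) per row is an assumption of this sketch; a real kernel might use a packed bitmap instead.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Materialize-then-aggregate: copy the accepted rows, then sum the copy.
inline int64_t SumMaterialized(const std::vector<int64_t>& values,
                               const std::vector<uint8_t>& mask) {
  assert(values.size() == mask.size());
  std::vector<int64_t> filtered;
  for (std::size_t i = 0; i < values.size(); ++i) {
    if (mask[i]) filtered.push_back(values[i]);
  }
  int64_t sum = 0;
  for (int64_t v : filtered) sum += v;
  return sum;
}

// Filter inside the kernel, SUM: a rejected row contributes 0, the additive
// identity, so multiplying by the 0/1 predicate value is enough.
inline int64_t SumFiltered(const std::vector<int64_t>& values,
                           const std::vector<uint8_t>& mask) {
  assert(values.size() == mask.size());
  int64_t sum = 0;
  for (std::size_t i = 0; i < values.size(); ++i) {
    sum += values[i] * static_cast<int64_t>(mask[i] != 0);
  }
  return sum;
}

// MEAN: same as SUM, but the accepted rows must also be counted.
// Returning 0.0 when no row is accepted is an assumption of this sketch.
inline double MeanFiltered(const std::vector<int64_t>& values,
                           const std::vector<uint8_t>& mask) {
  assert(values.size() == mask.size());
  int64_t sum = 0;
  int64_t count = 0;
  for (std::size_t i = 0; i < values.size(); ++i) {
    const int64_t keep = static_cast<int64_t>(mask[i] != 0);
    sum += values[i] * keep;
    count += keep;
  }
  return count == 0 ? 0.0 : static_cast<double>(sum) / static_cast<double>(count);
}

// PRODUCT: a rejected row must contribute 1, the multiplicative identity.
// Multiplying by the 0/1 predicate value would zero the whole result.
inline int64_t ProductFiltered(const std::vector<int64_t>& values,
                               const std::vector<uint8_t>& mask) {
  assert(values.size() == mask.size());
  int64_t product = 1;
  for (std::size_t i = 0; i < values.size(); ++i) {
    product *= mask[i] ? values[i] : 1;
  }
  return product;
}
```

      The in-kernel variants avoid the intermediate copy and make a single pass over the input.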


          People

            Assignee: Francois Saint-Jacques (fsaintjacques)
            Reporter: Wes McKinney (wesm)
            Votes: 1
            Watchers: 2

              Time Tracking

                Estimated: Not Specified
                Remaining: 0h
                Logged: 11h 50m
