[ARROW-4124] [C++] Abstract aggregation kernel API - ASF JIRA

Details

Type: Improvement
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: None
Fix Version/s: 0.13.0
Component/s: C++
Labels:
- pull-request-available

External issue URL:
https://github.com/apache/arrow/issues/20713

Description

In connection with the particular details of implementing various aggregation types, we should first put some energy into the abstract API for aggregating data in a multi-threaded setting.

Aggregators must support both hash/group (e.g. "group by" in SQL or data frame libraries) modes and non-group modes.
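To make the requirements above concrete, here is a minimal sketch of one possible shape for such an API. Every name in it (`SumState`, `SumAggregator`, `Consume`, `Merge`, `Finalize`, `GroupedState`, `ParallelAggregate`) is hypothetical and chosen for illustration; it is not the API that Arrow adopted. The assumption is that each thread folds its chunks into a private state and that states are merged afterwards, so no locking is needed while consuming. Group mode is shown as the same state keyed by group id.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <thread>
#include <unordered_map>
#include <vector>

// Hypothetical names throughout; this is not Arrow's actual kernel API.
struct SumState {
  int64_t sum = 0;
};

struct SumAggregator {
  using State = SumState;

  // Fold one chunk of input into a thread-local state.
  static void Consume(const std::vector<int64_t>& chunk, State* state) {
    for (int64_t v : chunk) state->sum += v;
  }
  // Combine a partial state produced by another thread into dst.
  static void Merge(const State& src, State* dst) { dst->sum += src.sum; }
  // Produce the final result from a fully merged state.
  static int64_t Finalize(const State& state) { return state.sum; }
};

// Group mode: one state per group key.
using GroupedState = std::unordered_map<int64_t, SumState>;

// Assumes keys.size() == values.size().
void ConsumeGrouped(const std::vector<int64_t>& keys,
                    const std::vector<int64_t>& values, GroupedState* groups) {
  for (std::size_t i = 0; i < values.size(); ++i) {
    (*groups)[keys[i]].sum += values[i];
  }
}

void MergeGrouped(const GroupedState& src, GroupedState* dst) {
  for (const auto& kv : src) SumAggregator::Merge(kv.second, &(*dst)[kv.first]);
}

// Non-group mode driver: one thread per chunk (a simplification; a real
// implementation would more likely use a thread pool), then a serial merge.
template <typename Agg>
typename Agg::State ParallelAggregate(
    const std::vector<std::vector<int64_t>>& chunks) {
  std::vector<typename Agg::State> states(chunks.size());
  std::vector<std::thread> workers;
  workers.reserve(chunks.size());
  for (std::size_t i = 0; i < chunks.size(); ++i) {
    workers.emplace_back(
        [&chunks, &states, i] { Agg::Consume(chunks[i], &states[i]); });
  }
  for (auto& w : workers) w.join();
  typename Agg::State total;
  for (const auto& s : states) Agg::Merge(s, &total);
  return total;
}
```

Under these assumptions, `SumAggregator::Finalize(ParallelAggregate<SumAggregator>(chunks))` would yield the sum over all chunks. Splitting the work into consume, merge and finalize steps is what lets the same aggregator serve both the single-threaded and the multi-threaded case.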

Aggregations ideally should also support filter pushdown. For example:

select $AGG($EXPR)
from $TABLE
where $PREDICATE

Some systems might materialize the post-predicate (filtered) version of $EXPR and then aggregate that; pandas, for example, does this. Vectorized performance can be much improved by filtering inside the aggregation kernel, which avoids building the intermediate filtered array. How the predicate true/false values are handled may depend on the implementation details of the kernel (e.g. SUM or MEAN will be a bit different from PRODUCT).
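The difference between the two strategies can be sketched as follows. This is an illustrative sketch only: the function names and the representation of the predicate as a byte mask holding exactly 0 or 1 are assumptions, not part of any existing kernel. It also shows why the kernels differ: SUM can multiply by the mask because excluded rows must contribute 0, MEAN additionally has to count the selected rows, and PRODUCT has to substitute the multiplicative identity 1 instead.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Baseline: materialize the filtered values, then aggregate them.
// Assumes mask.size() == values.size() in every function below.
int64_t SumMaterialized(const std::vector<int64_t>& values,
                        const std::vector<uint8_t>& mask) {
  std::vector<int64_t> filtered;
  for (std::size_t i = 0; i < values.size(); ++i) {
    if (mask[i]) filtered.push_back(values[i]);
  }
  int64_t sum = 0;
  for (int64_t v : filtered) sum += v;
  return sum;
}

// Filter inside the kernel: no intermediate array, and the loop body is
// branch-free because each mask entry is assumed to be exactly 0 or 1.
int64_t SumFiltered(const std::vector<int64_t>& values,
                    const std::vector<uint8_t>& mask) {
  int64_t sum = 0;
  for (std::size_t i = 0; i < values.size(); ++i) {
    sum += values[i] * static_cast<int64_t>(mask[i]);
  }
  return sum;
}

// MEAN needs the number of selected rows as well as their sum.
struct MeanState {
  int64_t sum = 0;
  int64_t count = 0;
};

MeanState MeanFiltered(const std::vector<int64_t>& values,
                       const std::vector<uint8_t>& mask) {
  MeanState state;
  for (std::size_t i = 0; i < values.size(); ++i) {
    state.sum += values[i] * static_cast<int64_t>(mask[i]);
    state.count += mask[i];
  }
  return state;
}

// PRODUCT cannot multiply by the mask (that would zero the result);
// excluded rows contribute the identity element 1 instead.
int64_t ProductFiltered(const std::vector<int64_t>& values,
                        const std::vector<uint8_t>& mask) {
  int64_t product = 1;
  for (std::size_t i = 0; i < values.size(); ++i) {
    product *= mask[i] ? values[i] : 1;
  }
  return product;
}
```

`MeanFiltered` returns the sum and count rather than a quotient so that the caller decides how to treat an empty selection.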

Issue Links

relates to
- ARROW-3120 [C++] Parallelize execution of ScalarAggregateFunction (Open)
- ARROW-3121 [C++] Mean kernel aggregate (Resolved)
- ARROW-3123 [C++] Incremental Count, Count Not Null aggregator (Resolved)
- ARROW-3122 [C++] Incremental Variance, Standard Deviation aggregators (Closed)

links to

GitHub Pull Request #3407

People

Assignee: Francois Saint-Jacques
Reporter: Wes McKinney
Votes: 1
Watchers: 3

Dates

Created: 27/Dec/18 17:53
Updated: 11/Jan/23 07:31
Resolved: 10/Feb/19 01:05

Time Tracking

Estimated: Not Specified
Remaining:
Logged: 11h 50m