[SPARK-4366] Aggregation Improvement - ASF JIRA

Details

Type: Improvement
Status: Resolved
Priority: Critical
Resolution: Incomplete
Affects Version/s: None
Fix Version/s: None
Component/s: SQL
Labels:
- bulk-closed

Description

This improvement includes a number of sub-tasks, listed below.

Attachments

aggregatefunction_v1.pdf
06/Jan/15 16:58
523 kB
Cheng Hao

Issue Links

blocks

SPARK-8641 Native Spark Window Functions

Resolved

is duplicated by

SPARK-6424 Support user-defined aggregators in AggregateFunction

Closed

links to

[Github] Pull Request #7458 (yhuai)

[Github] Pull Request #7588 (yhuai)

[Github] Pull Request #7619 (cloud-fan)

Sub-Tasks

#	Summary	Status	Assignee
1.	Simplify the Aggregation Function implementation	Resolved	Cheng Hao
2.	Partial aggregation support the DISTINCT aggregation	Resolved	Yin Huai
3.	Sort-based Aggregation	Resolved	Yin Huai
4.	Support Scala/Java UDAF	Resolved	Yin Huai
5.	HiveUDAF support for AggregateFunction2	Resolved	Wenchen Fan
6.	Hybrid aggregate operator using unsafe row	Resolved	Yin Huai
7.	Supporting multiple DISTINCT columns	Resolved	Herman van Hövell
8.	Audit both built-in aggregate function and UDAF interface before 1.5.0 release	Resolved	Reynold Xin
9.	Fix the false negative of Aggregate2Sort and FinalAndCompleteAggregate2Sort's missingInput	Resolved	Yin Huai
10.	cleanup comments, code style, naming typo for the new aggregation	Resolved	Wenchen Fan
11.	stddev_pop and stddev_samp aggregate functions	Resolved	Jihong Ma
12.	variance, var_pop, and var_samp aggregate functions	Resolved	Seth Hendrickson
13.	covar_pop and covar_samp aggregate functions	Resolved	L. C. Hsieh
14.	corr aggregate functions	Resolved	L. C. Hsieh
15.	percentile and percentile_approx aggregate functions	Resolved	Unassigned
16.	histogram_numeric aggregate function	Resolved	Unassigned
17.	collect_set and collect_list aggregate functions	Resolved	Nick Buroojy
18.	UDAF cleanup for 1.5	Resolved	Yin Huai
19.	Remove the placeholder attributes used in the aggregation buffers	Resolved	Yin Huai
20.	Refactor new aggregation code to reduce the times of checking compatibility	Resolved	L. C. Hsieh
21.	Cleanup Hybrid Aggregate Operator.	Resolved	Yin Huai
22.	Use sqlContext.udf to register UDAFs.	Resolved	Yin Huai
23.	first/last aggregate NULL behavior	Resolved	Yin Huai
24.	approx count distinct function	Resolved	Herman van Hövell
25.	TungstenAggregate should also accept InternalRow instead of just UnsafeRow	Resolved	Yin Huai
26.	Remove AggregateExpression1 and Aggregate Operator used to evaluate AggregateExpression1s	Resolved	Yin Huai
27.	The simpleString of TungstenAggregate does not show its output	Resolved	Yin Huai
28.	Eliminate hash table lookup if there is no grouping key in aggregation.	Resolved	Reynold Xin
29.	We need to explicitly use transformDown when rewrite aggregation results	Resolved	Josh Rosen
30.	MutableProjection should evaluate all expressions first and then update the mutable row	Resolved	Davies Liu
31.	use new aggregate interface for hive UDAF	Resolved	Wenchen Fan
32.	Partial Aggregation Support for Hive UDAF	Resolved	Cheng Hao
33.	Better group distinct columns in query compilation	Resolved	Unassigned
34.	Refactor AggregateFunction2 and AlgebraicAggregate interfaces to improve code clarity	Resolved	Josh Rosen
35.	Reduce duplication in Aggregate2's expression rewriting logic	Resolved	Josh Rosen
36.	Support ImperativeAggregates in TungstenAggregate	Resolved	Josh Rosen
37.	When planning queries without partial aggregation support, we should try to use TungstenAggregate.	Resolved	Unassigned
38.	Remove use of KVIterator in SortBasedAggregationIterator	Resolved	Josh Rosen
39.	Support single distinct count on multiple columns	Resolved	Herman van Hövell
40.	variance should alias var_samp instead of var_pop	Resolved	Reynold Xin
41.	Spark SQL SELECT COUNT DISTINCT optimization	Resolved	Yin Huai
42.	Restore the 1.5's behavior of planning a single distinct aggregation.	Resolved	Yin Huai
43.	Spark StdDev/Variance defaults are incompatible with Hive	Closed	Unassigned
44.	Improved multi-column counting	Resolved	Herman van Hövell

People

Assignee: Unassigned
Reporter: Cheng Hao
Votes: 2
Watchers: 23

Dates

Created: 12/Nov/14 18:11
Updated: 21/May/19 07:15
Resolved: 21/May/19 07:15
