[IMPALA-1430] Codegen all aggregate functions, including UDAs - ASF JIRA

Details

Type: Improvement
Status: Resolved
Priority: Minor
Resolution: Fixed
Affects Version/s: Impala 2.0
Fix Version/s: Impala 2.9.0
Component/s: Backend
Labels:

Target Version: Product Backlog

Description

Currently, codegen is disabled for the entire aggregation operator if even a single aggregate function can't be codegen'd. We should address this by making every aggregate function codegen-able, including UDAs. For UDAs in .so's, the codegen'd function will call into the UDA library. The same limitation affects the aggregation operator on TIMESTAMP.

This perf hit can be especially bad for COMPUTE STATS, which is heavily CPU-bound on the aggregation, because there is no easy way to exclude the TIMESTAMP columns when computing the column stats (i.e., there is no simple workaround).

Even if the portions involving TIMESTAMP cannot be codegen'd, it would still be worthwhile to come up with a solution in which codegen stays enabled for the other types.
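
The difference between the behaviour described above and the proposed one can be illustrated with a toy model. This is only a sketch, not Impala's implementation: `AggFn`, `has_ir`, and the two planning functions are hypothetical names, and the mode strings are made up for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class AggFn:
    name: str
    # Hypothetical flag: True if the function can be inlined into the
    # codegen'd loop, False if it is only available as a compiled
    # library function (e.g. a UDA in a .so).
    has_ir: bool


def plan_all_or_nothing(fns: List[AggFn]) -> Dict[str, str]:
    """Behaviour described in this issue: one function that cannot be
    codegen'd disables codegen for the whole aggregation operator."""
    mode = "codegen" if all(fn.has_ir for fn in fns) else "interpreted"
    return {fn.name: mode for fn in fns}


def plan_per_function(fns: List[AggFn]) -> Dict[str, str]:
    """Proposed behaviour: the operator is always codegen'd, and a function
    that cannot be inlined is reached by a call from the codegen'd code."""
    return {
        fn.name: "codegen-inlined" if fn.has_ir else "codegen-call-into-library"
        for fn in fns
    }


if __name__ == "__main__":
    fns = [AggFn("sum_int", True), AggFn("my_uda", False)]
    print(plan_all_or_nothing(fns))
    print(plan_per_function(fns))
```

In the toy model, a single function without inlinable code pushes every function in the operator onto the interpreted path under the first plan, while the second plan keeps the rest of the operator on the codegen'd path.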

Workaround
If you are experiencing very slow COMPUTE STATS due to this issue, you may be able to temporarily ALTER the TIMESTAMP columns to STRING or INT type before running COMPUTE STATS. After the command completes, the columns can be altered back to TIMESTAMP.
Note that the workaround applies only to text data, not Parquet data. Parquet requires compatible data types: TIMESTAMP is stored as INT96, which is not compatible with STRING or BIGINT.
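
As a rough sketch of the workaround, the helper below builds the sequence of statements for a text-format table. The helper name, the table name `sales`, and the column names are hypothetical; the statements use the `ALTER TABLE ... CHANGE old_name new_name type` and `COMPUTE STATS` forms. It only produces strings, so how they are submitted (impala-shell or a client library) is left open.

```python
from typing import List


def compute_stats_workaround(
    table: str, timestamp_cols: List[str], temp_type: str = "STRING"
) -> List[str]:
    """Build the statements for the workaround: retype TIMESTAMP columns,
    run COMPUTE STATS, then restore the original type.

    Intended for text-format tables only, per the note above.
    """
    stmts = []
    # 1. Temporarily change each TIMESTAMP column to the substitute type.
    for col in timestamp_cols:
        stmts.append(f"ALTER TABLE {table} CHANGE {col} {col} {temp_type};")
    # 2. Compute stats while no TIMESTAMP columns are present.
    stmts.append(f"COMPUTE STATS {table};")
    # 3. Restore the columns to TIMESTAMP.
    for col in timestamp_cols:
        stmts.append(f"ALTER TABLE {table} CHANGE {col} {col} TIMESTAMP;")
    return stmts


if __name__ == "__main__":
    # 'sales', 'created_at' and 'updated_at' are made-up names.
    for stmt in compute_stats_workaround("sales", ["created_at", "updated_at"]):
        print(stmt)
```

Queries that run against the table between the first and last ALTER will see the substitute type, so the sequence is best run while the table is otherwise idle.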

Issue Links

is duplicated by

IMPALA-4165 Enable codegen for all UDAs in aggregations

Resolved

is related to

IMPALA-3884 Enable codegen for TIMESTAMP in hash table.

Resolved

People

Assignee: Tim Armstrong

Reporter: Skye Wanderman-Milne

Votes: 1

Watchers: 4

Dates

Created: 28/Oct/14 22:43

Updated: 30/Aug/18 18:46

Resolved: 15/Feb/17 06:06