Details
- Improvement
- Status: Resolved
- Minor
- Resolution: Fixed
- Impala 2.0
Description
Currently, codegen is disabled for the entire aggregation operator if even a single aggregate function can't be codegen'd. We should address this by making it possible to codegen all aggregate functions, including UDAs. For UDAs in .so files, the codegen'd function will call into the UDA library. The same limitation affects aggregation operators over TIMESTAMP columns.
This perf hit can be especially bad for COMPUTE STATS, which is heavily CPU-bound on the aggregation, and there is no easy way to exclude TIMESTAMP columns when computing column stats (i.e., there is no simple workaround).
Even if the portions involving TIMESTAMP cannot be codegen'd, it would still be worthwhile to come up with a workaround that keeps codegen enabled for the other types.
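The difference between the current all-or-nothing behavior and a per-function fallback can be illustrated with a toy model. This is only a sketch, not Impala's implementation: the `AggFn` class, the `codegen_ok` flag, and both planning functions are hypothetical names invented for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class AggFn:
    """Hypothetical description of one aggregate function in an operator."""
    name: str         # e.g. "count", "min", or a UDA name
    arg_type: str     # e.g. "BIGINT", "TIMESTAMP"
    codegen_ok: bool  # toy flag: True if a codegen'd version is available


def plan_all_or_nothing(fns: List[AggFn]) -> List[str]:
    """Behavior described in this issue: a single function that cannot be
    codegen'd forces every function in the operator onto the interpreted path."""
    use_codegen = all(f.codegen_ok for f in fns)
    return ["codegen" if use_codegen else "interpreted" for _ in fns]


def plan_per_function(fns: List[AggFn]) -> List[str]:
    """Fallback direction: decide per function, so supported types keep
    codegen even when another function in the operator is unsupported."""
    return ["codegen" if f.codegen_ok else "interpreted" for f in fns]
```

With one supported function and one unsupported TIMESTAMP function, the first planner marks both as interpreted, while the second keeps codegen for the supported one.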
Workaround
If you are experiencing very slow COMPUTE STATS due to this issue, you may be able to temporarily ALTER the TIMESTAMP columns to STRING or INT type before running COMPUTE STATS. After the command completes, the columns can be altered back to TIMESTAMP.
Note that the workaround applies only to text data, not Parquet data. Parquet requires compatible data types: TIMESTAMP is stored as INT96, which is not compatible with STRING or BIGINT.
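As a rough sketch of the workaround, the helper below builds the sequence of statements to run against a text-format table. The function name `workaround_statements` and the table and column names in the usage note are hypothetical. The statement shapes assume Impala's `ALTER TABLE ... CHANGE col new_name new_type` and `COMPUTE STATS table` syntax. Identifiers are not quoted or validated, so this is suitable only for trusted names.

```python
from typing import List


def workaround_statements(table: str,
                          timestamp_columns: List[str],
                          temp_type: str = "STRING") -> List[str]:
    """Build the statements for the temporary-type workaround.

    Order: change each TIMESTAMP column to temp_type, run COMPUTE STATS,
    then change each column back to TIMESTAMP.
    """
    stmts = []
    # Step 1: temporarily change the declared type (column name is kept).
    for col in timestamp_columns:
        stmts.append(f"ALTER TABLE {table} CHANGE {col} {col} {temp_type}")
    # Step 2: compute stats while no TIMESTAMP columns are declared.
    stmts.append(f"COMPUTE STATS {table}")
    # Step 3: restore the original TIMESTAMP type.
    for col in timestamp_columns:
        stmts.append(f"ALTER TABLE {table} CHANGE {col} {col} TIMESTAMP")
    return stmts
```

For example, `workaround_statements("events", ["created_at"])` yields three statements: the change to STRING, the COMPUTE STATS, and the change back to TIMESTAMP. Column stats gathered this way describe the temporary type, so whether they remain useful after the column is restored is something to verify on your own data.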
Issue Links
- is duplicated by IMPALA-4165 Enable codegen for all UDAs in aggregations (Resolved)
- is related to IMPALA-3884 Enable codegen for TIMESTAMP in hash table. (Resolved)