Details
Type: Bug
Status: Resolved
Priority: Blocker
Resolution: Fixed
Version: Impala 2.5.0
Description
Varlen data (e.g., strings) produced by aggregations is freed after the aggregation node passes its output up to its parent (https://github.com/cloudera/Impala/blob/cdh5-trunk/be/src/exec/partitioned-aggregation-node.cc#L1353). This is fine for streaming operators and for blocking operators that copy their input. It causes memory corruption, however, when the output reaches a blocking operator that does not copy its input and therefore keeps pointers into the freed memory.
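The lifetime problem can be sketched in C++, the language of Impala's backend. Every class below (StringValue, MemPool, RowBatch, MinStringAgg, BlockingConsumer) is a hypothetical simplification, loosely modeled on Impala's concepts, not Impala's actual code. The sketch hands the aggregation's buffers over to the output batch instead of freeing them. That is one assumed way to keep a non-copying blocking parent safe, not necessarily the change that resolved this issue. The buggy pattern from the report would call `FreeAll()` at the marked spot, leaving the emitted row dangling.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-ins loosely modeled on Impala's StringValue, MemPool
// and RowBatch; these are NOT the real Impala classes.
struct StringValue {
  const char* ptr;
  std::size_t len;
  std::string ToString() const { return std::string(ptr, len); }
};

class MemPool {
 public:
  char* Allocate(std::size_t n) {
    chunks_.push_back(std::make_unique<char[]>(n));
    return chunks_.back().get();
  }
  // Moves every chunk into 'dst'; afterwards this pool owns nothing.
  void TransferTo(MemPool* dst) {
    for (auto& c : chunks_) dst->chunks_.push_back(std::move(c));
    chunks_.clear();
  }
  // Releases all chunks; any StringValue pointing into them now dangles.
  void FreeAll() { chunks_.clear(); }
  std::size_t num_chunks() const { return chunks_.size(); }

 private:
  std::vector<std::unique_ptr<char[]>> chunks_;
};

struct RowBatch {
  std::vector<StringValue> rows;
  MemPool tuple_data_pool;  // owns the bytes that 'rows' point into
};

// Hypothetical min() aggregation over strings that emits one output row.
class MinStringAgg {
 public:
  void Update(const std::string& s) {
    if (!has_value_ || s < min_) min_ = s;
    has_value_ = true;
  }

  // Copies the result into the node's own pool, then hands that memory to
  // the output batch so it lives exactly as long as the batch does.
  void GetNext(RowBatch* out) {
    char* buf = pool_.Allocate(min_.size());
    std::memcpy(buf, min_.data(), min_.size());
    out->rows.push_back({buf, min_.size()});
    // Buggy pattern described in the report: pool_.FreeAll() here.
    pool_.TransferTo(&out->tuple_data_pool);
  }

  // Frees whatever the node still owns (nothing after a transfer).
  void Close() { pool_.FreeAll(); }
  std::size_t owned_chunks() const { return pool_.num_chunks(); }

 private:
  MemPool pool_;
  std::string min_;
  bool has_value_ = false;
};

// A non-copying blocking consumer: it keeps whole batches and reads the
// string data only later, after the child node has moved on or closed.
class BlockingConsumer {
 public:
  void Add(RowBatch&& batch) { batches_.push_back(std::move(batch)); }
  std::string Row(std::size_t i) const {
    return batches_.at(i).rows.at(0).ToString();
  }

 private:
  std::vector<RowBatch> batches_;
};
```

Because the consumer stores the batch together with the pool that owns the string bytes, closing the aggregation afterwards does not invalidate the row. With `FreeAll()` in place of the transfer, `consumer.Row(0)` would read freed memory, which is the kind of access ASAN reports.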
Repro
Build Impala with ASAN, start an impalad with the -disable_mem_pools flag, and run the following query:
select id, m from functional_parquet.complextypestbl t, (select min(cast(item as string)) m from t.int_array) v
I've attached the ASAN output from running this query (asan_output.txt).
Symptoms
If the query plan contains an aggregation node that produces string values anywhere within a subplan (i.e., if the aggregate function appears in an inline view over a collection column in the SQL statement), the results of the aggregation may be incorrect.