Varlen data (e.g. strings) produced by aggregations is freed after the output has been passed up to the parent node (https://github.com/cloudera/Impala/blob/cdh5-trunk/be/src/exec/partitioned-aggregation-node.cc#L1353). This is safe for streaming operators and for blocking operators that copy their input, but it causes memory corruption when the output reaches a blocking operator that does not copy its input, because that operator keeps referencing memory that has already been freed.
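A minimal, self-contained model of the lifetime problem may help. It is a sketch under stated assumptions, not Impala code: `VarlenPool`, `StringRef`, `ProduceBatch` and `RunScenario` are hypothetical names. In the real bug the memory is freed, which ASAN reports as a use-after-free. Here the producer instead reuses the same buffer for its next output, so the example stays well-defined. A consumer that deep-copies its input keeps the right values, while one that only holds references ends up reading whatever the producer wrote next.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for the memory that backs varlen (string) results.
// Clearing and refilling `buffer` models memory that was freed after the
// previous output was passed up and then reused for the next output.
struct VarlenPool {
  std::string buffer;
};

// Hypothetical non-owning string value: it stores only a location in the
// pool, so its contents depend on what the pool holds when it is read.
struct StringRef {
  const VarlenPool* pool;
  std::size_t offset;
  std::size_t len;
  std::string Read() const { return pool->buffer.substr(offset, len); }
};

// Producer: writes one output batch into the pool. Clearing the pool first
// models releasing the previous batch's varlen data after it was handed off.
std::vector<StringRef> ProduceBatch(VarlenPool& pool,
                                    const std::vector<std::string>& values) {
  pool.buffer.clear();
  std::vector<StringRef> batch;
  for (const std::string& v : values) {
    batch.push_back(StringRef{&pool, pool.buffer.size(), v.size()});
    pool.buffer += v;
  }
  return batch;
}

struct Results {
  std::vector<std::string> copied;      // what a copying consumer retains
  std::vector<std::string> referenced;  // what a non-copying consumer sees
};

// Feeds two batches to two blocking consumers and reads their state only
// after all input has arrived, as a blocking operator would.
Results RunScenario() {
  const std::vector<std::vector<std::string>> inputs = {{"apple", "pear"},
                                                        {"kiwi", "plum"}};
  VarlenPool pool;
  Results results;
  std::vector<StringRef> retained;  // non-copying consumer keeps references
  for (const auto& input : inputs) {
    for (const StringRef& s : ProduceBatch(pool, input)) {
      results.copied.push_back(s.Read());  // deep copy: owns its bytes
      retained.push_back(s);               // no copy: points into the pool
    }
  }
  for (const StringRef& s : retained) results.referenced.push_back(s.Read());
  return results;
}
```

Calling `RunScenario()` leaves `copied` as `{"apple", "pear", "kiwi", "plum"}`. The first two entries of `referenced` come back as `"kiwip"` and `"lum"`: the first batch's values were silently replaced by bytes belonging to the second batch. That matches the "results may be incorrect" symptom in the sketch's simplified setting.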
Build with ASAN, start an impalad with the -disable_mem_pools flag, and run the following query:
I've attached the ASAN output from running this query (asan_output.txt).
If the query plan contains an aggregation node that produces string values anywhere within a subplan (i.e. if the aggregate function in the SQL statement appears within an inline view over a collection column), the results of the aggregation may be incorrect.