Author: Joe McDonnell <email@example.com>
Date: Mon Feb 27 16:13:38 2017 -0800
IMPALA-4899: Fix parquet table writer dictionary leak
Currently, in HdfsTableSink, OutputPartitions are added to the RuntimeState
object pool to be freed at the end of the query. However, for clustered inserts
into a partitioned table, the OutputPartitions are only used one at a time.
They can be immediately freed once done writing to that partition.
In addition, the HdfsParquetTableWriter's ColumnWriters are also added to
this object pool. These constitute a significant amount of memory, as they
contain the dictionaries for Parquet encoding.
This change makes HdfsParquetTableWriter's ColumnWriters use unique_ptrs so
that they are cleaned up when the HdfsParquetTableWriter is deleted. It also
uses a unique_ptr on the PartitionPair for the OutputPartition.
The table writers maintain a pointer to the OutputPartition. This remains a
raw pointer. This is safe, because OutputPartition has a scoped_ptr to the
table writer. The table writer will never outlive the OutputPartition.
Reviewed-by: Marcel Kornacker <firstname.lastname@example.org>
Tested-by: Impala Public Jenkins