Details
Type: Improvement
Status: Resolved
Priority: Blocker
Resolution: Fixed
Impala 0.7
None
None
Description
The Parquet table writer uses a lot of memory, and its usage grows linearly with the number of output partitions. We'd like to write large files (~512MB-1GB), and each one needs to be buffered per partition. With 100 output partitions, that requires 50GB+ of RAM.
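The estimate above can be checked with a few lines of arithmetic. This is only a back-of-the-envelope sketch: `buffered_bytes` is a hypothetical helper, not part of Impala, and it assumes exactly one full-size file is buffered per partition.

```python
MB = 1024 ** 2
GB = 1024 ** 3


def buffered_bytes(num_partitions: int, file_size_bytes: int) -> int:
    """Memory needed if one full output file is buffered per partition."""
    return num_partitions * file_size_bytes


# 100 partitions at the low and high end of the target file size.
low_gb = buffered_bytes(100, 512 * MB) / GB    # 50.0
high_gb = buffered_bytes(100, 1024 * MB) / GB  # 100.0
print(f"{low_gb:.0f}GB to {high_gb:.0f}GB")
```

So the 50GB figure is the lower bound; with 1GB files the same 100 partitions would need about 100GB.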
The buffering problem goes away if we can write the columns to different HDFS files, in which case we don't need to buffer at all.
An alternative solution is to write the columns to local disk and stitch the files together at the end.
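A rough sketch of the local-disk alternative, for illustration only. It is not Impala's writer and does not produce a real Parquet file: `write_stitched` is a hypothetical function, values are written as newline-terminated text, and the "stitch" step is plain concatenation of the per-column files in column order. The point is that each column streams to its own temp file, so nothing has to be held in memory.

```python
import os
import shutil
import tempfile


def write_stitched(rows, num_cols, out_path):
    """Spill each column to a local temp file, then concatenate them."""
    tmp_dir = tempfile.mkdtemp()
    try:
        col_paths = [os.path.join(tmp_dir, f"col{i}.tmp") for i in range(num_cols)]
        col_files = [open(p, "wb") for p in col_paths]
        try:
            # Stream every value straight to its column's file.
            for row in rows:
                for f, value in zip(col_files, row):
                    f.write(str(value).encode("utf-8") + b"\n")
        finally:
            for f in col_files:
                f.close()

        # Stitch: copy the column files into the output, one after another.
        with open(out_path, "wb") as out:
            for p in col_paths:
                with open(p, "rb") as src:
                    shutil.copyfileobj(src, out)
    finally:
        shutil.rmtree(tmp_dir)
```

The trade-off is extra local disk I/O: every byte is written once to the temp file and copied once more during stitching.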