Details
-
Improvement
-
Status: Open
-
Major
-
Resolution: Unresolved
-
Impala 2.10.0
-
None
Description
We should modify the Parquet table sink so that it reserves memory upfront and allocates the bulk of its memory (column buffers, etc.) from the buffer pool.
The reservation calculation should be fairly straightforward for clustered inserts or single-partition inserts, but currently the memory consumption of dynamically partitioned non-clustered inserts is not bounded. In that case we would need to flush partitions to disk to free memory.
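To make the reservation arithmetic concrete, here is a minimal sketch under stated assumptions: one 2MB column buffer per column per open partition writer, and exactly one open writer for clustered or single-partition inserts. The function names and the sizing model are hypothetical and are not taken from the Impala code base.

```python
# Hypothetical sizing model, not Impala's actual implementation.
DEFAULT_BUFFER_BYTES = 2 * 1024 * 1024  # the "regular 2MB buffer size" mentioned in this issue


def min_reservation_bytes(num_columns, open_partitions=1,
                          buffer_bytes=DEFAULT_BUFFER_BYTES):
    """Bytes needed if every open partition writer holds one buffer per column.

    Assumption: clustered and single-partition inserts keep one writer open
    at a time, so open_partitions defaults to 1.
    """
    if num_columns < 0 or open_partitions < 0:
        raise ValueError("counts must be non-negative")
    return num_columns * open_partitions * buffer_bytes


def partitions_to_flush(open_partitions, num_columns, reservation_bytes,
                        buffer_bytes=DEFAULT_BUFFER_BYTES):
    """Number of open partitions to flush so the remainder fits the reservation."""
    per_partition = num_columns * buffer_bytes
    if per_partition == 0:
        return 0
    max_open = reservation_bytes // per_partition
    return max(0, open_partitions - max_open)
```

Under this model a 10-column clustered insert would reserve 20MB, while a non-clustered insert with 5 open partitions and a 60MB reservation would have to flush 2 of them.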
One possibly tricky edge case is inserting large string values, since a single value could be larger than the regular 2MB buffer size.
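One possible way to handle the large-string case is to size the buffer up for oversized values. The sketch below assumes buffers come in power-of-two multiples of the regular 2MB size; that assumption and the helper name are illustrative only and do not describe an agreed design.

```python
# Hypothetical helper: pick a buffer size large enough for a single value.
DEFAULT_BUFFER_BYTES = 2 * 1024 * 1024  # regular buffer size from this issue


def buffer_bytes_for_value(value_len, default_bytes=DEFAULT_BUFFER_BYTES):
    """Smallest power-of-two multiple of default_bytes that holds value_len bytes."""
    if value_len < 0:
        raise ValueError("value_len must be non-negative")
    size = default_bytes
    while size < value_len:
        size *= 2  # assumed: buffers are handed out in doubling sizes
    return size
```

With this rule a value one byte over 2MB would need a 4MB buffer, so the upfront reservation would have to account for the largest value the sink is willing to accept.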
Issue Links
- is related to IMPALA-5293 Turn insert clustering on by default (Resolved)