Details
-
Improvement
-
Status: Open
-
Major
-
Resolution: Unresolved
-
Impala 4.1.0, Impala 4.2.0, Impala 4.1.1, Impala 4.1.2, Impala 4.3.0, Impala 4.4.0, Impala 4.4.1
-
None
-
None
-
Description
After the change for 'IMPALA-10502: Handle CREATE/DROP events correctly' was applied, the alterTableRecoverPartitions method no longer batches the add_partitions calls; it now invokes addHmsPartitions once for all partitions. For tables with a very large number of partitions, this builds a very large temporary List<Partition> and can lead to an OutOfMemoryError.
In my test environment, the catalogd JVM Xmx was set to 2GB. Running the end-to-end test custom_cluster/test_wide_table_operations.py against a table with 2000 columns and 50,000 partitions caused catalogd to hit a Java heap space OutOfMemoryError during the recover partitions operation.
Analysis of the heap dump with MemoryAnalyzer showed that the temporary object held a massive number of FieldSchema objects (2000 columns * 50,000 partitions = 100,000,000), which exhausted the heap.
To resolve this issue, we propose batching the addHmsPartitions calls so that the temporary objects can be released after each batch. This change was tested and verified to resolve the OutOfMemoryError when handling a large number of partitions.
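As a rough illustration of the batching idea (not the actual patch), the sketch below builds the heavy per-partition objects one batch at a time and hands each batch to a callback. The class name BatchedPartitionAdder, the method addInBatches, the batch size, and both callbacks are hypothetical stand-ins; in catalogd the callback would correspond to the addHmsPartitions call, whose real signature is not shown here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Function;

/** Hypothetical helper; names and batch size are assumptions, not Impala code. */
public class BatchedPartitionAdder {
  // Assumed default; a real value would be tuned or read from configuration.
  static final int DEFAULT_BATCH_SIZE = 500;

  /**
   * Builds heavy partition objects from lightweight specs one batch at a time
   * and passes each batch to addBatch. Returns the number of batches sent.
   */
  static <S, P> int addInBatches(List<S> specs, int batchSize,
      Function<S, P> buildPartition, Consumer<List<P>> addBatch) {
    if (batchSize <= 0) {
      throw new IllegalArgumentException("batchSize must be positive");
    }
    int calls = 0;
    for (int start = 0; start < specs.size(); start += batchSize) {
      int end = Math.min(start + batchSize, specs.size());
      // Only this batch's heavy objects are referenced from here at any time.
      List<P> batch = new ArrayList<>(end - start);
      for (S spec : specs.subList(start, end)) {
        batch.add(buildPartition.apply(spec));
      }
      addBatch.accept(batch);
      calls++;
      // 'batch' goes out of scope here, so it becomes eligible for GC
      // unless the callback kept a reference to it.
    }
    return calls;
  }
}
```

With this shape, the peak number of temporary FieldSchema-like objects is bounded by roughly batchSize * columns rather than partitions * columns, provided the callback does not retain the batch.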