Details

Type: Improvement
Status: Resolved
Priority: Major
Resolution: Fixed
Labels: None
Description
Similar to HIVE-4248, Parquet tries to write very large "row groups". This causes Hive to run out of memory during dynamic-partition inserts, when a single reducer may have many Parquet files open at a given time.
As such, we should implement a memory manager that ensures we don't run out of memory from buffering too many open row groups within a single JVM.
Attachments
Issue Links

blocks
- HIVE-8120 Umbrella JIRA tracking Parquet improvements (Open)

is related to
- HIVE-11598 Document Configuration for Parquet Files (Open)

relates to
- PARQUET-164 Warn when parquet memory manager kicks in (Resolved)
- HIVE-10149 Shuffle Hive data before storing in Parquet (Resolved)
- PARQUET-108 Parquet Memory Management in Java (Resolved)
- PARQUET-177 MemoryManager ensure minimum Column Chunk size (Resolved)