- Type: Improvement
- Status: Open
- Priority: Major
- Resolution: Unresolved
- Affects Version/s: 1.6.0
- Fix Version/s: None
- Component/s: parquet-mr
- Labels: None
The current code uses direct off-heap buffers for decompression. If many decompressors are instantiated across multiple threads, and/or the objects being decompressed are large, this can lead to a huge amount of off-heap allocation by the JVM. The problem is exacerbated when there is little heap contention overall, since no GC will then be performed to reclaim the space used by these buffers.
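To make the growth described above visible, here is a minimal sketch that reads the JVM's "direct" buffer pool through the standard `BufferPoolMXBean`. The class name `DirectBufferUsage` and the 1 MiB buffer size are illustrative assumptions and are not taken from parquet-mr.

```java
import java.lang.management.BufferPoolMXBean;
import java.lang.management.ManagementFactory;
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

// Illustrative helper (hypothetical name), not part of parquet-mr.
class DirectBufferUsage {
    // Returns the JVM's estimate of bytes held by the "direct" buffer pool,
    // or -1 if no such pool is reported.
    static long directMemoryUsed() {
        for (BufferPoolMXBean pool : ManagementFactory.getPlatformMXBeans(BufferPoolMXBean.class)) {
            if ("direct".equals(pool.getName())) {
                return pool.getMemoryUsed();
            }
        }
        return -1L;
    }

    public static void main(String[] args) {
        long before = directMemoryUsed();
        // Keep references so the buffers stay reachable while we measure.
        List<ByteBuffer> held = new ArrayList<>();
        for (int i = 0; i < 8; i++) {
            held.add(ByteBuffer.allocateDirect(1 << 20)); // 1 MiB off-heap each
        }
        long after = directMemoryUsed();
        System.out.println("buffers held: " + held.size()
                + ", direct pool grew by " + (after - before) + " bytes");
    }
}
```

The off-heap memory is only released after the owning `ByteBuffer` objects are garbage collected, which is why a quiet heap can leave this number high for a long time.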
It would be nice if there were a flag we could use to simply allocate on-heap buffers here:
We ran into an issue today where these buffers totaled a very large amount of storage and caused our Java processes (running within containers) to be terminated by the kernel OOM-killer.
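As a rough illustration of the requested flag, the sketch below chooses between direct and on-heap allocation based on a system property. The class `DecompressionBuffers` and the property name `example.decompressor.useDirectBuffers` are hypothetical, invented for this example; neither is an existing parquet-mr setting. Whether the decompressor's native code path can work with heap buffers is not addressed here.

```java
import java.nio.ByteBuffer;

// Illustrative sketch (hypothetical names), not the parquet-mr implementation.
class DecompressionBuffers {
    // Hypothetical property name, invented for this example.
    static final String USE_DIRECT_PROPERTY = "example.decompressor.useDirectBuffers";

    // Allocates off-heap when useDirect is true, otherwise on the Java heap.
    static ByteBuffer allocate(int capacity, boolean useDirect) {
        return useDirect ? ByteBuffer.allocateDirect(capacity) : ByteBuffer.allocate(capacity);
    }

    // Reads the flag from a system property; defaults to direct buffers,
    // mirroring the behavior described in this issue.
    static ByteBuffer allocate(int capacity) {
        boolean useDirect = Boolean.parseBoolean(System.getProperty(USE_DIRECT_PROPERTY, "true"));
        return allocate(capacity, useDirect);
    }

    public static void main(String[] args) {
        System.setProperty(USE_DIRECT_PROPERTY, "false");
        ByteBuffer buf = allocate(64 * 1024);
        System.out.println("direct=" + buf.isDirect() + ", capacity=" + buf.capacity());
    }
}
```

With on-heap buffers the memory counts against `-Xmx` and is reclaimed by ordinary GC, so container memory limits become easier to reason about.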
- breaks: SPARK-4073 Parquet+Snappy can cause significant off-heap memory usage (Resolved)