Details
Type: Improvement
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: 3.4.1
Labels: Patch
Description
I observed high on-heap memory usage during Parquet file reading even when off-heap memory mode is enabled. The cause is that the memory mode of the column vectors used by the vectorized reader is configured by a separate flag (spark.sql.columnVector.offheap.enabled), whose default is always On-Heap.
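To make the behavior concrete, here is a hedged, simplified sketch of the selection logic (not the exact Spark source; the object name MemoryModeSelection and the parameter offHeapColumnVectorEnabled are illustrative): the reader's column-vector memory mode follows its own flag rather than spark.memory.offHeap.enabled.

import org.apache.spark.memory.MemoryMode

object MemoryModeSelection {
  // Simplified illustration: the vectorized reader picks its memory mode
  // from the column-vector flag, independent of the executor off-heap settings.
  def columnVectorMemoryMode(offHeapColumnVectorEnabled: Boolean): MemoryMode =
    if (offHeapColumnVectorEnabled) MemoryMode.OFF_HEAP // spark.sql.columnVector.offheap.enabled = true
    else MemoryMode.ON_HEAP // the default, even when spark.memory.offHeap.enabled = true
}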
Configuration to reproduce the issue:
spark.memory.offHeap.size 1000000
spark.memory.offHeap.enabled true
Enabling only these configurations does not switch the memory mode used by the vectorized Parquet reader to Off-Heap.
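A minimal sketch of the workaround, assuming the separate flag is spark.sql.columnVector.offheap.enabled (the column-vector flag referenced above) and using a hypothetical input path /tmp/example.parquet: the flag has to be set explicitly alongside the off-heap settings.

import org.apache.spark.sql.SparkSession

object OffHeapParquetRead {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .config("spark.memory.offHeap.enabled", "true")
      .config("spark.memory.offHeap.size", "1000000")
      // Without this separate flag the vectorized reader still allocates
      // on-heap column vectors, which is the behavior described above.
      .config("spark.sql.columnVector.offheap.enabled", "true")
      .getOrCreate()

    // Read any Parquet file; column vectors should now be allocated off-heap.
    spark.read.parquet("/tmp/example.parquet").show()
    spark.stop()
  }
}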
Proposed PR: https://github.com/apache/spark/pull/42394