SPARK-44718

High on-heap memory usage detected during Parquet file reading with off-heap memory mode enabled in Spark


Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 3.4.1
    • Fix Version/s: 4.0.0
    • Component/s: Spark Core, SQL
    • Labels: None
    • Patch

    Description

I see high on-heap memory usage while reading Parquet files when off-heap memory mode is enabled. This happens because the memory mode for the column vectors used by the vectorized reader is controlled by a different flag, whose default value is always on-heap.

Configuration to reproduce the issue:

      spark.memory.offHeap.size 1000000
      spark.memory.offHeap.enabled true

Enabling only these configurations does not switch the memory mode used by the vectorized Parquet reader to off-heap.

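As a sketch of the workaround, the column-vector memory mode can be set explicitly alongside the off-heap settings. This assumes the separate flag referred to above is `spark.sql.columnVector.offheap.enabled` (an internal SQL config; verify the name against your Spark version):

```properties
# spark-defaults.conf
spark.memory.offHeap.enabled            true
spark.memory.offHeap.size               1000000
# Without the line below, the vectorized reader's column vectors
# stay on-heap even when off-heap memory mode is enabled
# (assumed flag name; not stated in the issue text itself).
spark.sql.columnVector.offheap.enabled  true
```

The linked PR proposes deriving the column-vector memory mode from the off-heap settings instead, so the extra flag would no longer be needed.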

      Proposed PR: https://github.com/apache/spark/pull/42394


People

    Assignee: Zamil Majdy
    Reporter: Zamil Majdy
    Votes: 0
    Watchers: 4
