[SPARK-44239] Free memory allocated by large vectors when vectors are reset - ASF JIRA

Details

Type: Improvement
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: 3.5.0
Fix Version/s: 4.0.0
Component/s: SQL
Labels: None

Description

When Spark reads a data file into a WritableColumnVector, the memory allocated by the WritableColumnVector is not freed until the VectorizedColumnReader completes.

Reusing the allocated array objects saves memory allocation time, but it also keeps a large amount of unused memory occupied after a large vector batch has been read.

Add a memory reserve policy for this scenario: reuse the allocated array objects for small column vectors, and free the memory held by huge column vectors.
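The reserve policy described above can be sketched in Java as follows. This is a simplified, hypothetical illustration: the `IntVector` class, its `reserve`/`reset` methods, and the `hugeThreshold` parameter are assumptions made for this example and are not Spark's actual `WritableColumnVector` API or configuration. On reset, a small backing array is kept for reuse, while a huge one is replaced so the old array becomes unreachable and can be garbage-collected.

```java
import java.util.Arrays;

/**
 * Hypothetical sketch of a "reuse small, free huge" reset policy.
 * Not Spark's actual WritableColumnVector implementation.
 */
public class ReservePolicySketch {

    /** Simplified stand-in for an on-heap int column vector. */
    static final class IntVector {
        // Hypothetical threshold (in elements) above which a buffer counts as "huge".
        private final int hugeThreshold;
        private final int initialCapacity;
        private int[] data;
        private int count;

        IntVector(int initialCapacity, int hugeThreshold) {
            this.initialCapacity = initialCapacity;
            this.hugeThreshold = hugeThreshold;
            this.data = new int[initialCapacity];
        }

        /** Grows the backing array (at least doubling) if needed; existing values are kept. */
        void reserve(int required) {
            if (required > data.length) {
                int newCapacity = Math.max(required, data.length * 2);
                data = Arrays.copyOf(data, newCapacity);
            }
        }

        void append(int value) {
            reserve(count + 1);
            data[count++] = value;
        }

        /**
         * Called before reading the next batch. A small backing array is kept
         * for reuse; a huge one is replaced, so the old array becomes
         * unreachable and eligible for garbage collection.
         */
        void reset() {
            count = 0;
            if (data.length > hugeThreshold) {
                data = new int[initialCapacity];
            }
        }

        int capacity() {
            return data.length;
        }
    }

    public static void main(String[] args) {
        IntVector v = new IntVector(16, 1024);

        for (int i = 0; i < 10; i++) v.append(i);
        v.reset();
        System.out.println("after small batch: " + v.capacity()); // prints "after small batch: 16"

        for (int i = 0; i < 5000; i++) v.append(i);
        System.out.println("before reset: " + v.capacity());      // prints "before reset: 8192"
        v.reset();
        System.out.println("after huge batch: " + v.capacity());  // prints "after huge batch: 16"
    }
}
```

The trade-off is the one the description names: keeping small buffers avoids repeated allocation for typical batches, while dropping oversized buffers bounds how much idle memory a reader holds after an unusually large batch.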

Attachments

image-2023-06-29-12-58-12-256.png, 257 kB, 29/Jun/23 04:58, Wan Kun
image-2023-06-29-13-03-15-470.png, 124 kB, 29/Jun/23 05:03, Wan Kun

Issue Links

links to

[Github] Pull Request #41782 (wankunde)

People

Assignee: Wan Kun

Reporter: Wan Kun

Votes: 0

Watchers: 4

Dates

Created: 29/Jun/23 04:13

Updated: 30/Aug/23 14:37

Resolved: 30/Aug/23 14:37