[HADOOP-18296] Memory fragmentation in ChecksumFileSystem Vectored IO implementation. - ASF JIRA

Details

Type: Sub-task
Status: Resolved
Priority: Minor
Resolution: Won't Fix
Affects Version/s: 3.4.0
Fix Version/s: None
Component/s: common
Labels:
- fs

Description

Because we merge ranges in the ChecksumFSInputChecker implementation of the vectored IO API, memory fragmentation can occur. Let me explain with an example.

Suppose a client requests three ranges: 0-500, 700-1000 and 1200-1500.

Because of merging, all three ranges are combined into one: we allocate a single large byte buffer covering 0-1500, but return sliced byte buffers for the requested ranges.

Once the client is done reading all the ranges, it can free only the memory for the requested ranges; the memory for the gaps (here 500-700 and 1000-1200) is never released.

Note that this only happens with direct byte buffers.
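
The scenario above can be sketched with plain java.nio buffers. This is only an illustration, not the Hadoop implementation: the class name MergedRangeSlices and the helpers sliceOf and gapBytes are hypothetical. It assumes the merged read is backed by one direct buffer of 1500 bytes and that each requested range is handed back as a slice, which is a view onto the parent's memory rather than a copy.

```java
import java.nio.ByteBuffer;

// Hypothetical illustration; none of these names come from Hadoop.
final class MergedRangeSlices {

    /** Returns a view of [offset, offset + length) that shares memory with merged. */
    static ByteBuffer sliceOf(ByteBuffer merged, int offset, int length) {
        ByteBuffer dup = merged.duplicate(); // independent position/limit, same memory
        dup.position(offset);
        dup.limit(offset + length);
        return dup.slice();
    }

    /** Bytes of the merged buffer that no requested range covers (assumes no overlap). */
    static int gapBytes(int mergedSize, int[][] ranges) {
        int requested = 0;
        for (int[] r : ranges) {
            requested += r[1] - r[0];
        }
        return mergedSize - requested;
    }

    public static void main(String[] args) {
        // The three ranges from the description, as {start, end} pairs.
        int[][] ranges = {{0, 500}, {700, 1000}, {1200, 1500}};

        // One large direct buffer for the merged range 0-1500.
        ByteBuffer merged = ByteBuffer.allocateDirect(1500);

        for (int[] r : ranges) {
            ByteBuffer slice = sliceOf(merged, r[0], r[1] - r[0]);
            System.out.println("range " + r[0] + "-" + r[1]
                + " -> slice capacity " + slice.capacity()
                + ", direct=" + slice.isDirect());
        }

        // 500-700 and 1000-1200 belong to no slice handed to the client.
        System.out.println("gap bytes: " + gapBytes(merged.capacity(), ranges));
    }
}
```

In this sketch the client only ever sees the three slices, which together cover 1100 of the 1500 allocated bytes; the remaining 400 bytes belong to the merged buffer and are not reachable through anything the client holds.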

Issue Links

is related to: SPARK-44116 Utilize Hadoop vectorized APIs (Open)

People

Assignee: Unassigned

Reporter: Mukund Thakur

Votes: 0

Watchers: 3

Dates

Created: 16/Jun/22 21:07

Updated: 15/Apr/24 19:09

Resolved: 14/Jul/22 20:35