[HIVE-10617] LLAP: allocator occasionally has a spurious failure to allocate due to "partitioned" locking and has to retry - ASF JIRA


Details

Type: Sub-task
Status: Open
Priority: Major
Resolution: Unresolved
Affects Version/s: None
Fix Version/s: None
Component/s: llap
Labels: None

Description

See HIVE-10482 and the comment in the code. For now, this is worked around by retrying the allocation.
Simple case: a thread reserves memory from the memory manager, then bounces between checking arena 1 and arena 2 for a free block while other threads allocate and deallocate from those arenas in the reverse order. The thread finds nothing in either arena, so it looks as if there is no memory even though there is. More importantly, the same thing can happen when buddy blocks are being split while a lot of memory is allocated.
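To make the failure mode concrete, here is a minimal Java sketch of partitioned locking with the retry workaround. It is not Hive's allocator: the class `PartitionedAllocator`, its `Arena` type, and the `maxRetries` bound are hypothetical, and blocks are modeled as plain counters rather than buddy blocks. Each arena has its own lock, and a scan releases one arena's lock before taking the next, so a block freed in an arena the scan has already passed is missed. A failed pass therefore does not prove that memory is exhausted, which is why `allocate` retries.

```java
import java.util.concurrent.locks.ReentrantLock;

/**
 * Hypothetical sketch (not Hive's BuddyAllocator): memory is split into arenas,
 * each guarded by its own lock, and blocks are simple counters.
 */
public class PartitionedAllocator {
    /** One arena: a number of equal-sized free blocks guarded by its own lock. */
    static final class Arena {
        final ReentrantLock lock = new ReentrantLock();
        int freeBlocks;

        Arena(int freeBlocks) { this.freeBlocks = freeBlocks; }

        /** Takes one block if available; only this arena's lock is held. */
        boolean tryTake() {
            lock.lock();
            try {
                if (freeBlocks == 0) return false;
                freeBlocks--;
                return true;
            } finally {
                lock.unlock();
            }
        }

        void release() {
            lock.lock();
            try { freeBlocks++; } finally { lock.unlock(); }
        }
    }

    private final Arena[] arenas;
    private final int maxRetries;

    PartitionedAllocator(int arenaCount, int blocksPerArena, int maxRetries) {
        arenas = new Arena[arenaCount];
        for (int i = 0; i < arenaCount; i++) arenas[i] = new Arena(blocksPerArena);
        this.maxRetries = maxRetries;
    }

    /** One pass over all arenas, locking each in turn; returns the arena index or -1. */
    private int scanOnce() {
        for (int i = 0; i < arenas.length; i++) {
            if (arenas[i].tryTake()) return i;
        }
        return -1;
    }

    /**
     * A single pass can miss a block freed in an arena it already checked,
     * so -1 from one pass is not proof of exhaustion: retry a bounded number of times.
     */
    int allocate() {
        for (int attempt = 0; attempt <= maxRetries; attempt++) {
            int arena = scanOnce();
            if (arena >= 0) return arena;
            Thread.yield(); // let concurrent deallocators run before the next pass
        }
        return -1;
    }

    void free(int arenaIndex) { arenas[arenaIndex].release(); }

    public static void main(String[] args) {
        PartitionedAllocator a = new PartitionedAllocator(2, 1, 3);
        int first = a.allocate();
        int second = a.allocate();
        int third = a.allocate();  // both arenas genuinely full
        a.free(first);
        int fourth = a.allocate();
        System.out.println(first + " " + second + " " + third + " " + fourth);
    }
}
```

The `main` method is single-threaded, so the `-1` it produces reflects genuine exhaustion; the spurious case requires concurrent frees and allocations interleaved with the scan as described in the ticket.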

This can be solved either with some form of helping (especially for the split case) or by making the allocator an "actor" (or a set of actors, each owning 1-N arenas) that satisfies allocation requests more deterministically (and also removes most of the synchronization).
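The actor option can be sketched in Java as well, under stated assumptions: `ArenaActor` is a hypothetical name, a single-threaded `ExecutorService` stands in for an actor mailbox, blocks are plain counters, and one actor owns all arenas (the ticket also allows several actors, each owning 1-N arenas). Because every request runs on the owner thread in submission order, the arena counters need no locks, and an allocation that returns `-1` has checked every arena in one consistent state.

```java
import java.util.Arrays;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

/**
 * Hypothetical sketch of the "actor" idea: one thread owns a group of arenas and
 * serves alloc/free requests in order, so no per-arena locks are needed.
 */
public class ArenaActor implements AutoCloseable {
    // Only read or written by tasks running on the mailbox thread after construction.
    private final int[] freeBlocks;
    private final ExecutorService mailbox = Executors.newSingleThreadExecutor();

    ArenaActor(int arenaCount, int blocksPerArena) {
        freeBlocks = new int[arenaCount];
        Arrays.fill(freeBlocks, blocksPerArena);
    }

    /** Enqueues an allocation; completes with an arena index, or -1 if every arena is full. */
    CompletableFuture<Integer> allocate() {
        return CompletableFuture.supplyAsync(() -> {
            for (int i = 0; i < freeBlocks.length; i++) {
                if (freeBlocks[i] > 0) {
                    freeBlocks[i]--;
                    return i;
                }
            }
            // No other request can interleave with this loop, so -1 is a consistent answer.
            return -1;
        }, mailbox);
    }

    /** Enqueues a free; it is applied after all previously submitted requests. */
    CompletableFuture<Void> free(int arenaIndex) {
        return CompletableFuture.runAsync(() -> freeBlocks[arenaIndex]++, mailbox);
    }

    @Override
    public void close() { mailbox.shutdown(); }

    public static void main(String[] args) {
        try (ArenaActor actor = new ArenaActor(2, 1)) {
            int a = actor.allocate().join();
            int b = actor.allocate().join();
            int c = actor.allocate().join();
            actor.free(a).join();
            int d = actor.allocate().join();
            System.out.println(a + " " + b + " " + c + " " + d);
        }
    }
}
```

The trade-off in this sketch is a queue hop per request; in exchange, a failed allocation reflects the actual arena state rather than an unlucky interleaving of per-arena locks.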


People

Assignee: Sergey Shelukhin

Reporter: Sergey Shelukhin

Votes: 0

Watchers: 1

Dates

Created: 06/May/15 00:44

Updated: 18/Nov/15 20:05