Uploaded image for project: 'Hive'
  1. Hive
  2. HIVE-7926 long-lived daemons for query fragment execution, I/O and caching
  3. HIVE-10617

LLAP: allocator occasionally has a spurious failure to allocate due to "partitioned" locking and has to retry

    XMLWordPrintableJSON

    Details

    • Type: Sub-task
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: llap
    • Labels:
      None

      Description

      See HIVE-10482 and the comment in code. Right now this is worked around by retrying.
      Simple case - thread can reserve memory from manager and bounce between checking arena 1 and arena 2 for memory as other threads allocate and deallocate from respective arenas in reverse order, making it look like there's no memory. More importantly this can happen when buddy blocks are split when lots of stuff is allocated.

      This can be solved either with some form of helping (esp. for split case) or by making allocator an "actor" (or set of actors, one per 1-N arenas that they would own), to satisfy alloc requests more deterministically (and also get rid of most sync).

        Attachments

          Activity

            People

            • Assignee:
              sershe Sergey Shelukhin
              Reporter:
              sershe Sergey Shelukhin
            • Votes:
              0 Vote for this issue
              Watchers:
              1 Start watching this issue

              Dates

              • Created:
                Updated: