Apache Tez / TEZ-2214

FetcherOrderedGrouped can get stuck indefinitely when MergeManager misses memToDiskMerging


    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.5.4, 0.6.1
    • Component/s: None
    • Labels:
      None

      Description

      Scenario:

      • commitMemory and usedMemory are beyond their allowed thresholds.
      • InMemoryMerge kicks off and starts flushing the in-memory segments to disk.
      • As it progresses, it releases memory segments, but the merge has not finished yet.
      • Fetchers that need less memory than maxSingleShuffleLimit get scheduled.
      • If the fetchers are fast, their outputs quickly push commitMemory and usedMemory back up. Because an InMemoryMerge is already in progress, this does not trigger another merge().
      • Soon all fetchers stall in the following state:
      Thread 9351: (state = BLOCKED)
       - java.lang.Object.wait(long) @bci=0 (Compiled frame; information may be imprecise)
       - java.lang.Object.wait() @bci=2, line=502 (Compiled frame)
       - org.apache.tez.runtime.library.common.shuffle.orderedgrouped.MergeManager.waitForShuffleToMergeMemory() @bci=17, line=337 (Interpreted frame)
       - org.apache.tez.runtime.library.common.shuffle.orderedgrouped.FetcherOrderedGrouped.run() @bci=34, line=157 (Interpreted frame)
      
      • Even after the InMemoryMerger completes, commitMemory and usedMemory are still beyond their thresholds, and no fetcher thread is left to release memory because all of them are stalled. The MergeManager never starts another mem-to-disk merge, so the fetchers wait indefinitely.
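      The scenario above can be replayed in a deliberately simplified, single-threaded model. This is not Tez code: the class ShuffleMemoryModel, its methods, the merged reserve-and-commit step, the wait condition usedMemory > memoryLimit (assumed, modeled on waitForShuffleToMergeMemory), the sizes (memoryLimit = 100, mergeThreshold = 90, 15-byte fetches) and the recheckAfterMerge toggle are all hypothetical. The toggle re-evaluates the merge trigger when a merge finishes; it illustrates one way to break the cycle and is not necessarily what the attached patches do.

```java
/**
 * Hypothetical, single-threaded model of the memory accounting described in
 * TEZ-2214. This is NOT Tez code: names, sizes and the fix toggle are illustrative.
 */
final class ShuffleMemoryModel {
    final long memoryLimit;          // fetchers block while usedMemory exceeds this
    final long mergeThreshold;       // commitMemory level that starts an in-memory merge
    final boolean recheckAfterMerge; // hypothetical fix: re-evaluate the trigger when a merge ends

    long usedMemory;        // memory reserved by fetched outputs
    long commitMemory;      // memory of committed in-memory outputs
    boolean mergeInProgress;
    long bytesInMerge;      // bytes captured by the running merge and not yet freed
    int mergesStarted;

    ShuffleMemoryModel(long memoryLimit, long mergeThreshold, boolean recheckAfterMerge) {
        this.memoryLimit = memoryLimit;
        this.mergeThreshold = mergeThreshold;
        this.recheckAfterMerge = recheckAfterMerge;
    }

    /** Assumed wait condition: a fetcher blocks while usedMemory > memoryLimit. */
    boolean fetcherMustWait() {
        return usedMemory > memoryLimit;
    }

    /** Stuck: fetchers must wait, yet no merge is running that could free memory. */
    boolean isDeadlocked() {
        return fetcherMustWait() && !mergeInProgress;
    }

    /** Simplification: reserve and commit one output at once; false if the fetcher would block. */
    boolean fetch(long size) {
        if (fetcherMustWait()) {
            return false;
        }
        usedMemory += size;
        commitMemory += size;
        // On commit, a merge starts only if none is already running.
        if (commitMemory >= mergeThreshold && !mergeInProgress) {
            startMerge();
        }
        return true;
    }

    void startMerge() {
        mergeInProgress = true;
        bytesInMerge = commitMemory; // snapshot: later commits are not part of this merge
        mergesStarted++;
    }

    /** The running merge frees part of its snapshot before it finishes. */
    void releaseDuringMerge(long bytes) {
        bytesInMerge -= bytes;
        usedMemory -= bytes;
        commitMemory -= bytes;
    }

    /** The running merge frees the rest of its snapshot and ends. */
    void finishMerge() {
        releaseDuringMerge(bytesInMerge);
        mergeInProgress = false;
        if (recheckAfterMerge && commitMemory >= mergeThreshold) {
            startMerge(); // without this, nothing re-triggers a merge once all fetchers block
        }
    }

    /** Replays the scenario from the description with illustrative numbers. */
    static ShuffleMemoryModel runScenario(boolean recheckAfterMerge) {
        ShuffleMemoryModel m = new ShuffleMemoryModel(100, 90, recheckAfterMerge);
        // 1. Six 15-byte outputs reach the threshold (90) and start a merge.
        while (!m.mergeInProgress) {
            m.fetch(15);
        }
        // 2. The merge frees 80 of its 90 bytes while still running.
        m.releaseDuringMerge(80);
        // 3. Fast fetchers refill memory; the running merge suppresses a new trigger.
        while (m.fetch(15)) {
            // keep fetching until the wait condition holds
        }
        // 4. The merge ends, freeing its last 10 bytes; usedMemory stays at 105.
        m.finishMerge();
        return m;
    }
}
```

      With recheckAfterMerge = false, runScenario ends with usedMemory = 105, above the limit of 100, and no merge running: the indefinite wait shown in the stack trace. With it set to true, a second merge starts as soon as the first one ends, and once that merge finishes the memory the fetchers are waiting on is freed.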

        Attachments

        1. TEZ-2214.3.patch (2 kB, Rajesh Balamohan)
        2. TEZ-2214.2.patch (3 kB, Rajesh Balamohan)
        3. TEZ-2214.1.patch (2 kB, Rajesh Balamohan)


            People

            • Assignee: Rajesh Balamohan
            • Reporter: Rajesh Balamohan
            • Votes: 0
            • Watchers: 4
