  1. Hadoop YARN
  2. YARN-2946

DeadLocks in RMStateStore<->ZKRMStateStore

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: 2.7.0
    • Fix Version/s: None
    • Component/s: resourcemanager
    • Labels:
      None
    • Target Version/s:
    • Hadoop Flags:
      Reviewed

      Description

      Found a deadlock in ZKRMStateStore.

      1. Initially, zkClient is null because of a ZooKeeper Disconnected event.
      2. While ZKRMStateStore#runWithCheck() is in wait(zkSessionTimeout), waiting for zkClient to re-establish the ZooKeeper connection via either a SyncConnected or an Expired event, it is highly possible that another thread obtains the lock on ZKRMStateStore.this from state machine transition events, because wait() releases that lock. This causes a deadlock in ZKRMStateStore.
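
The precondition in step 2 can be sketched in isolation. This is a minimal illustration, not the Hadoop code: StoreSketch, onTransition(), onConnected() and demo() are hypothetical names, and only the wait-with-timeout pattern inside a synchronized method is taken from the description. It shows that a second thread can enter a synchronized method on the same object while the first thread is parked in wait(). It does not reproduce the full deadlock cycle, which also involves the RMStateStore side.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

class WaitReleasesMonitor {
    // Hypothetical stand-in for ZKRMStateStore; not the real Hadoop class.
    static class StoreSketch {
        private Object zkClient = null;             // null models the disconnected state
        private boolean waiting = false;
        private boolean enteredDuringWait = false;
        final CountDownLatch waitStarted = new CountDownLatch(1);

        // Models runWithCheck(): waits on this monitor until a client appears or the timeout passes.
        synchronized boolean runWithCheck(long timeoutMs) throws InterruptedException {
            long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMs);
            waiting = true;
            waitStarted.countDown();
            while (zkClient == null) {
                long remainingMs = TimeUnit.NANOSECONDS.toMillis(deadline - System.nanoTime());
                if (remainingMs <= 0) {
                    break;                          // timed out; wait(0) would block forever
                }
                wait(remainingMs);                  // releases the monitor while parked
            }
            waiting = false;
            return zkClient != null;
        }

        // Models a state machine transition that needs the same monitor.
        synchronized void onTransition() {
            if (waiting) {
                enteredDuringWait = true;           // we got the lock while the waiter was parked
            }
        }

        // Models a SyncConnected event: installs a client and wakes the waiter.
        synchronized void onConnected() {
            zkClient = new Object();
            notifyAll();
        }

        synchronized boolean enteredDuringWait() {
            return enteredDuringWait;
        }
    }

    static boolean demo() {
        StoreSketch store = new StoreSketch();
        Thread waiter = new Thread(() -> {
            try {
                store.runWithCheck(5000);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        waiter.start();
        try {
            store.waitStarted.await();              // waiter holds the monitor and is about to wait
            store.onTransition();                   // blocks until the waiter is inside wait()
            store.onConnected();
            waiter.join();
        } catch (InterruptedException e) {
            throw new IllegalStateException(e);
        }
        return store.enteredDuringWait();
    }

    public static void main(String[] args) {
        System.out.println("entered during wait: " + demo());
    }
}
```

If the thread that enters during the wait then blocks on a second lock held further up the waiter's call stack, neither side can make progress, which is the shape of cycle the attached screenshots appear to show.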

        Attachments

        1. TestYARN2946.java
          3 kB
          Rohith Sharma K S
        2. RM_BeforeFix_Deadlock_cycle_2.png
          41 kB
          Rohith Sharma K S
        3. RM_BeforeFix_Deadlock_cycle_1.png
          54 kB
          Rohith Sharma K S
        4. 0004-YARN-2946.patch
          38 kB
          Rohith Sharma K S
        5. 0003-YARN-2946.patch
          32 kB
          Rohith Sharma K S
        6. 0003-YARN-2946.patch
          35 kB
          Rohith Sharma K S
        7. 0002-YARN-2946.patch
          3 kB
          Rohith Sharma K S
        8. 0001-YARN-2946.patch
          1 kB
          Rohith Sharma K S
        9. 0001-YARN-2946.patch
          31 kB
          Rohith Sharma K S

            People

            • Assignee:
              Rohith Sharma K S (rohithsharma)
            • Reporter:
              Rohith Sharma K S (rohithsharma)
            • Votes:
              0
            • Watchers:
              9
