YARN-2946 (Hadoop YARN): Deadlocks in RMStateStore<->ZKRMStateStore


Details

    • Type: Bug
    • Status: Closed
    • Priority: Blocker
    • Resolution: Fixed
    • 2.7.0
    • None
    • Component/s: resourcemanager
    • None
    • Reviewed

    Description

      Found a deadlock in ZKRMStateStore.

      1. Initially, zkClient is null because of a ZooKeeper Disconnected event.
      2. While ZKRMStateStore#runWithCheck() calls wait(zkSessionTimeout) for zkClient to re-establish the ZooKeeper connection (via either a SyncConnected or an Expired event), it is highly likely that another thread obtains the lock on ZKRMStateStore.this from state machine transition events. This causes a deadlock in ZKRMStateStore.
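
      The second step relies on a general Java property: Object.wait(timeout) releases the monitor it was called on, so another thread can enter a synchronized section on the same object while the first thread is still waiting. The sketch below illustrates only that property. It is not the ZKRMStateStore code and does not reproduce the deadlock itself; the class, its fields, and its methods are hypothetical stand-ins (client plays the role of zkClient, and the store object plays the role of ZKRMStateStore.this).

```java
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in for a store guarded by its own monitor; not Hadoop code.
class WaitReleasesMonitorSketch {
    // Assumed stand-in for zkClient; null models the disconnected state.
    private Object client = null;
    // Set by the second thread once it has acquired the monitor on this object.
    private boolean otherThreadEntered = false;

    // Waits up to timeoutMs for client to become non-null.
    // Returns true if a client was installed before the timeout.
    synchronized boolean waitForClient(long timeoutMs) throws InterruptedException {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMs);
        while (client == null) {
            long remainingMs = TimeUnit.NANOSECONDS.toMillis(deadline - System.nanoTime());
            if (remainingMs <= 0) {
                return false;
            }
            wait(remainingMs); // releases the monitor on this object while waiting
        }
        return true;
    }

    // Models any other synchronized entry point on the same object.
    // It also installs a client, purely so that the demo finishes quickly.
    synchronized void enterWhileOwnerWaits() {
        otherThreadEntered = true;
        client = new Object();
        notifyAll();
    }

    // Returns true if the second thread held the monitor during the wait.
    static boolean demo() {
        WaitReleasesMonitorSketch store = new WaitReleasesMonitorSketch();
        Thread other = new Thread(store::enterWhileOwnerWaits, "other-thread");
        boolean enteredDuringWait;
        try {
            synchronized (store) {
                // The monitor is held here, so the other thread blocks on entry
                // until wait() inside waitForClient() gives the monitor up.
                other.start();
                boolean reconnected = store.waitForClient(5_000);
                // The monitor is held again, so the flag reflects what happened
                // strictly inside the wait.
                enteredDuringWait = reconnected && store.otherThreadEntered;
            }
            other.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
        return enteredDuringWait;
    }

    public static void main(String[] args) {
        System.out.println("other thread entered during wait: " + demo());
    }
}
```

      Because the waiting thread gives up the monitor, any invariant it assumed on entry may no longer hold when wait() returns, and the thread that slipped in may itself block on something the waiter still owns. That second half is the part specific to this issue and is not modeled here.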

      Attachments

        1. 0004-YARN-2946.patch
          38 kB
          Rohith Sharma K S
        2. 0003-YARN-2946.patch
          35 kB
          Rohith Sharma K S
        3. 0003-YARN-2946.patch
          32 kB
          Rohith Sharma K S
        4. 0001-YARN-2946.patch
          31 kB
          Rohith Sharma K S
        5. RM_BeforeFix_Deadlock_cycle_2.png
          41 kB
          Rohith Sharma K S
        6. RM_BeforeFix_Deadlock_cycle_1.png
          54 kB
          Rohith Sharma K S
        7. 0002-YARN-2946.patch
          3 kB
          Rohith Sharma K S
        8. 0001-YARN-2946.patch
          1 kB
          Rohith Sharma K S
        9. TestYARN2946.java
          3 kB
          Rohith Sharma K S

          People

            Assignee: Rohith Sharma K S (rohithsharma)
            Reporter: Rohith Sharma K S (rohithsharma)