Hadoop YARN / YARN-5694

ZKRMStateStore can prevent the transition to standby in branch-2.7 if the ZK node is unreachable


Details

    • Reviewed

    Description

      ZKRMStateStore.doStoreMultiWithRetries() holds the lock while it talks to ZK. If the connection fails, it retries while still holding the lock. The retries are intended to be strictly time-limited, but when the ZK node is unreachable the time limit is not enforced, and the thread holds the lock for over an hour. Transitioning the RM to standby requires that same lock, so in exactly the case where the RM should transition to standby, the VerifyActiveStatusThread blocks it from happening.
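
      The locking hazard described above can be illustrated with a minimal, generic sketch. This is not the ZKRMStateStore code and not the change made by any of the attached patches; the class BoundedRetry and its methods runWithRetries and tryTransition are hypothetical names invented for illustration. It shows one possible way to keep a retry loop from starving another lock user: enforce a wall-clock deadline across all attempts and release the lock between attempts. It assumes each individual attempt returns or throws promptly, which would need its own timeout in a real client.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

/** Hypothetical illustration only; not taken from Hadoop or the YARN-5694 patches. */
class BoundedRetry {
    private final ReentrantLock lock = new ReentrantLock();

    /**
     * Runs op under the lock, retrying on failure until deadlineMs has elapsed.
     * The lock is released after every attempt, so a thread that needs it
     * (for example one performing a transition to standby) can get in
     * between retries instead of waiting for the whole retry loop.
     */
    <T> T runWithRetries(Callable<T> op, long deadlineMs, long pauseMs)
            throws Exception {
        final long deadline =
            System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(deadlineMs);
        Exception last;
        do {
            lock.lock();
            try {
                return op.call();
            } catch (InterruptedException e) {
                // Preserve the interrupt and stop retrying.
                Thread.currentThread().interrupt();
                throw e;
            } catch (Exception e) {
                last = e; // remember the failure and retry if time remains
            } finally {
                lock.unlock();
            }
            Thread.sleep(pauseMs); // back off without holding the lock
        } while (System.nanoTime() - deadline < 0);
        throw last; // deadline reached: surface the most recent failure
    }

    /** Stand-in for a state transition that needs the same lock. */
    boolean tryTransition(long waitMs) throws InterruptedException {
        if (!lock.tryLock(waitMs, TimeUnit.MILLISECONDS)) {
            return false;
        }
        try {
            return true; // a real transition would do its work here
        } finally {
            lock.unlock();
        }
    }
}
```

      In this sketch the deadline is checked only between attempts, so the total time can exceed deadlineMs by the duration of one attempt plus one pause.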

      Attachments

        1. YARN-5694.001.patch
          1 kB
          Daniel Templeton
        2. YARN-5694.002.patch
          2 kB
          Daniel Templeton
        3. YARN-5694.003.patch
          2 kB
          Daniel Templeton
        4. YARN-5694.004.patch
          12 kB
          Daniel Templeton
        5. YARN-5694.004.patch
          12 kB
          Daniel Templeton
        6. YARN-5694.005.patch
          12 kB
          Daniel Templeton
        7. YARN-5694.006.patch
          12 kB
          Daniel Templeton
        8. YARN-5694.007.patch
          12 kB
          Daniel Templeton
        9. YARN-5694.008.patch
          1 kB
          Daniel Templeton
        10. YARN-5694.branch-2.6.001.patch
          7 kB
          Daniel Templeton
        11. YARN-5694.branch-2.6.002.patch
          7 kB
          Daniel Templeton
        12. YARN-5694.branch-2.7.001.patch
          1 kB
          Daniel Templeton
        13. YARN-5694.branch-2.7.002.patch
          14 kB
          Daniel Templeton
        14. YARN-5694.branch-2.7.004.patch
          8 kB
          Daniel Templeton
        15. YARN-5694.branch-2.7.005.patch
          7 kB
          Daniel Templeton


          People

            templedf Daniel Templeton
            templedf Daniel Templeton
            Votes: 0
            Watchers: 10
