HDFS-6229 (Hadoop HDFS)

Race condition in failover can cause RetryCache to fail to work


Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.1.0-beta
    • Fix Version/s: 2.4.1
    • Component/s: ha
    • Labels: None
    • Hadoop Flags: Reviewed

    Description

      Currently, when a NameNode (NN) failover happens, the old standby NameNode (SBN) first sets its state to active and only then starts the active services, which includes tailing the remaining edit log and building a complete retry cache from it. If a retry of a request that already succeeded on the old active NameNode (ANN), but whose response the client never received, arrives between these two steps, the new ANN may serve the retry without finding it in the retry cache.
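
The window described above can be replayed deterministically with a toy model. The sketch below is only an illustration: `ToyNameNode`, `FailoverRaceSketch`, the call ID `call-42`, and the outcome strings are hypothetical names, not Hadoop classes or APIs, and the retry cache is reduced to a plain map. The second ordering (rebuild the cache before becoming active) is included purely for contrast and is not a claim about how the attached patches resolve the issue.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of the race described above. Every name here is hypothetical;
// none of this is actual Hadoop code.
class ToyNameNode {
    enum State { STANDBY, ACTIVE }

    private State state = State.STANDBY;
    // Calls recorded in the edit log by the old active NN but not yet tailed here.
    private final Map<String, String> untailedEdits = new HashMap<>();
    // Retry cache: call ID -> result of a call that already completed.
    private final Map<String, String> retryCache = new HashMap<>();

    ToyNameNode(Map<String, String> untailedEdits) {
        this.untailedEdits.putAll(untailedEdits);
    }

    // Step 1 of the transition as described: flip the state to active.
    void setActive() {
        state = State.ACTIVE;
    }

    // Step 2: tail the remaining edits and rebuild the retry cache from them.
    void startActiveServices() {
        retryCache.putAll(untailedEdits);
        untailedEdits.clear();
    }

    // Serve a (possibly retried) call and report how it was handled.
    String handle(String callId) {
        if (state != State.ACTIVE) {
            return "REJECTED_STANDBY";
        }
        if (retryCache.containsKey(callId)) {
            return "CACHE_HIT";
        }
        // Cache miss: the call is executed again as if it were new.
        retryCache.put(callId, "done");
        return "RE_EXECUTED";
    }
}

class FailoverRaceSketch {
    // Replays one retry of "call-42", which already succeeded on the old active NN.
    static String retryOutcome(boolean activeBeforeCacheRebuild) {
        Map<String, String> edits = new HashMap<>();
        edits.put("call-42", "done");
        ToyNameNode nn = new ToyNameNode(edits);

        if (activeBeforeCacheRebuild) {
            nn.setActive();
            String outcome = nn.handle("call-42"); // retry lands inside the window
            nn.startActiveServices();
            return outcome;
        }
        // Contrast ordering: the cache is complete before any call is served.
        nn.startActiveServices();
        nn.setActive();
        return nn.handle("call-42");
    }

    public static void main(String[] args) {
        System.out.println(retryOutcome(true));
        System.out.println(retryOutcome(false));
    }
}
```

In this model, `retryOutcome(true)` reports that the retry was executed again because the cache had not been rebuilt yet, while `retryOutcome(false)` reports a cache hit.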

      Attachments

        1. HDFS-6229.000.patch
          5 kB
          Jing Zhao
        2. retrycache-race.patch
          2 kB
          Jing Zhao


          People

            Assignee: Jing Zhao (jingzhao)
            Reporter: Jing Zhao (jingzhao)
            Votes: 0
            Watchers: 4
