  1. Hadoop HDFS
  2. HDFS-1073 Simpler model for Namenode's fs Image and edit Logs
  3. HDFS-1994

Fix race conditions when running two rapidly checkpointing 2NNs

Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Edit log branch (HDFS-1073)
    • namenode
    • None
    • Reviewed

    Description

      HDFS-1984 added the ability to run two secondary namenodes (2NNs) at the same time. However, I found two races when stress testing this (by running two 2NNs, each checkpointing in a tight loop with no sleep):
      1) The seen_txid file was not written atomically, so a reader could occasionally see an empty file.
      2) Two checkpointers could try to take a checkpoint at the same transaction ID, which caused the two image downloads to collide and fail.
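
      A minimal sketch of how the two races described above are commonly avoided, not the patch attached to this issue. For race 1, the value is written to a temporary file in the same directory and then renamed over the target, so a reader sees either the old contents or the new contents but never an empty file. For race 2, a checkpointer claims a transaction ID before starting, and a second claim for the same ID is refused. The class name CheckpointSketch, its method names, and the in-process claimedTxIds set are all hypothetical and used only for illustration; only the file name seen_txid comes from the description.

      ```java
      import java.io.IOException;
      import java.nio.charset.StandardCharsets;
      import java.nio.file.Files;
      import java.nio.file.Path;
      import java.nio.file.StandardCopyOption;
      import java.util.Set;
      import java.util.concurrent.ConcurrentHashMap;

      // Hypothetical illustration class; not part of Hadoop.
      class CheckpointSketch {
          // Assumption: a single process tracks which txids already have a checkpoint in flight.
          private static final Set<Long> claimedTxIds = ConcurrentHashMap.newKeySet();

          // Race 1: write to a temp file in the same directory, then rename it over the target.
          // Durability (fsync of the file and directory) is deliberately left out of this sketch.
          static void writeTxIdAtomically(Path target, long txid) throws IOException {
              Path dir = target.toAbsolutePath().getParent();
              Path tmp = Files.createTempFile(dir, "seen_txid", ".tmp");
              try {
                  Files.write(tmp, (txid + "\n").getBytes(StandardCharsets.UTF_8));
                  // With ATOMIC_MOVE the other options are ignored; replacing an existing
                  // target is implementation specific, and POSIX rename() replaces it.
                  Files.move(tmp, target,
                          StandardCopyOption.ATOMIC_MOVE,
                          StandardCopyOption.REPLACE_EXISTING);
              } finally {
                  // No-op after a successful move; cleans up the temp file on failure.
                  Files.deleteIfExists(tmp);
              }
          }

          // Reads the value back; throws NumberFormatException if the file is empty.
          static long readTxId(Path target) throws IOException {
              String text = new String(Files.readAllBytes(target), StandardCharsets.UTF_8);
              return Long.parseLong(text.trim());
          }

          // Race 2: returns true only for the first caller that claims a given txid.
          static boolean tryClaimCheckpoint(long txid) {
              return claimedTxIds.add(txid);
          }

          // Releases a claim once the checkpoint has finished or failed.
          static void releaseCheckpoint(long txid) {
              claimedTxIds.remove(txid);
          }
      }
      ```

      A caller would invoke tryClaimCheckpoint(txid) before downloading an image and skip the checkpoint when it returns false. In a real deployment the two checkpointers are separate processes, so the claim would have to be arbitrated by the namenode rather than by an in-memory set.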

          People

            Assignee: Todd Lipcon (tlipcon)
            Reporter: Todd Lipcon (tlipcon)