  1. Hadoop HDFS
  2. HDFS-1623 High Availability Framework for HDFS NN
  3. HDFS-2603

HA: don't initialize replication queues until entering Active mode


Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Critical
    • Resolution: Duplicate
    • Affects Version/s: HA branch (HDFS-1623)
    • Fix Version/s: None
    • Component/s: ha
    • Labels: None

    Description

      As described in the comments of HDFS-1975:

      1) The active NN receives a setReplication call that lowers a file's replication from 3 to 1.
      2) It writes OP_SET_REPLICATION to its edit log, invalidates two replicas, and returns.
      3) The DNs report BLOCK_INVALIDATED to both the active NN and the standby NN (SBNN).
      4) The SBNN has not yet seen the OP_SET_REPLICATION, so it marks the block as under-replicated.

      In the case of raising replication (e.g., from 1 to 3), we get the opposite problem: the SBNN marks the block as over-replicated and adds two of the replicas to its invalidation list.
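The race described above can be reproduced with a toy model. This is only an illustrative sketch in Python, not Hadoop code: `ToyBlockView`, `replay_race`, and their methods are hypothetical names invented here, and the model tracks a single block with a simple replica counter.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class ToyBlockView:
    """One NameNode's view of a single block (illustrative only, not Hadoop code)."""
    expected_replication: int
    live_replicas: int

    def apply_set_replication(self, new_replication: int) -> None:
        # Models applying OP_SET_REPLICATION from the edit log.
        self.expected_replication = new_replication

    def on_replica_delta(self, delta: int) -> None:
        # Models DataNode reports: negative for invalidated replicas,
        # positive for newly received ones.
        self.live_replicas += delta

    def classify(self) -> str:
        if self.live_replicas < self.expected_replication:
            return "under-replicated"
        if self.live_replicas > self.expected_replication:
            return "over-replicated"
        return "ok"


def replay_race(old: int, new: int) -> Tuple[str, str, str]:
    """Return (active view, stale standby view, standby view after catching up)."""
    active = ToyBlockView(old, old)
    standby = ToyBlockView(old, old)
    active.apply_set_replication(new)      # the active applies the change immediately
    for nn in (active, standby):           # DataNodes report to both NameNodes
        nn.on_replica_delta(new - old)
    stale = standby.classify()             # standby has not applied the edit yet
    standby.apply_set_replication(new)     # standby finally reads the edit
    return active.classify(), stale, standby.classify()


lowering = replay_race(3, 1)   # replication dropped from 3 to 1
raising = replay_race(1, 3)    # replication raised from 1 to 3
```

In this model the stale standby classifies the block as under-replicated when replication is lowered and as over-replicated when it is raised, while the active NN sees a healthy block in both cases. The misclassification disappears once the standby applies the edit, which is why acting on the stale view is the dangerous part.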

      Generation stamps don't help here, because changing the replication level of a block doesn't change its generation stamp (and it shouldn't). One possible answer is to modify FSNamesystem.isPopulatingReplQueues to return false on the standby and, when the NN switches from standby to active, to initialize the replication queues only after it has read the latest edits. I think that will solve the SET_REPLICATION issue, but I'm not certain it will solve every issue in this general class.
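A minimal sketch of the proposed gating, again as a toy Python model rather than actual FSNamesystem code. Everything here is an assumption made for illustration: the class `ToyStandbyNN`, the `pending_edits` list standing in for unread edit-log entries, and the method names. Only the idea of an `isPopulatingReplQueues`-style check that returns false on the standby comes from the description.

```python
class ToyStandbyNN:
    """Toy NameNode that defers replication queues until it becomes active."""

    def __init__(self, blocks):
        # block id -> [expected replication, live replica count]
        self.blocks = {blk: list(state) for blk, state in blocks.items()}
        self.active = False
        self.pending_edits = []        # (block id, new replication) not yet applied
        self.under_replicated = set()
        self.over_replicated = set()

    def is_populating_repl_queues(self) -> bool:
        # Hypothetical analogue of the proposed check: false while standby.
        return self.active

    def _update_queues(self, blk) -> None:
        expected, live = self.blocks[blk]
        self.under_replicated.discard(blk)
        self.over_replicated.discard(blk)
        if live < expected:
            self.under_replicated.add(blk)
        elif live > expected:
            self.over_replicated.add(blk)

    def on_replica_delta(self, blk, delta: int) -> None:
        # Always track replica counts, but touch the queues only when allowed.
        self.blocks[blk][1] += delta
        if self.is_populating_repl_queues():
            self._update_queues(blk)

    def transition_to_active(self) -> None:
        # Read the latest edits first, then build the queues from scratch.
        for blk, new_replication in self.pending_edits:
            self.blocks[blk][0] = new_replication
        self.pending_edits.clear()
        self.active = True
        for blk in self.blocks:
            self._update_queues(blk)


sbnn = ToyStandbyNN({"blk_1": (3, 3)})
sbnn.pending_edits.append(("blk_1", 1))   # OP_SET_REPLICATION not yet read
sbnn.on_replica_delta("blk_1", -2)        # two BLOCK_INVALIDATED reports arrive
queues_while_standby = (set(sbnn.under_replicated), set(sbnn.over_replicated))
sbnn.transition_to_active()
queues_after_failover = (set(sbnn.under_replicated), set(sbnn.over_replicated))
```

In this sketch the standby keeps its replica counts current but never queues work based on a stale replication factor, and the queues it builds at failover reflect the applied edits. Whether the same gating covers other operations in this class is left open, as in the description.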


          People

            Assignee: Todd Lipcon
            Reporter: Todd Lipcon
            Votes: 0
            Watchers: 3
