Hadoop HDFS · HDFS-16064

Determine when to invalidate corrupt replicas based on number of usable replicas


Description

      It seems that https://issues.apache.org/jira/browse/HDFS-721 was resolved as a non-issue under the assumption that if the namenode and a datanode get into an inconsistent state for a given block, there will always be another datanode available to replicate the block to.

      While testing datanode decommissioning via an exclude file ("dfs.hosts.exclude"), I encountered a scenario where the decommissioning gets stuck indefinitely.
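
      For reference, decommissioning is driven by an exclude file that the namenode re-reads on refresh. A minimal sketch of the scale-down trigger, assuming "dfs.hosts.exclude" in hdfs-site.xml points at /etc/hadoop/conf/dfs.exclude (the path is deployment-specific):

      $ echo "DN1" >> /etc/hadoop/conf/dfs.exclude
      $ echo "DN2" >> /etc/hadoop/conf/dfs.exclude
      $ hdfs dfsadmin -refreshNodes   # namenode begins decommissioning DN1 & DN2
      $ hdfs dfsadmin -report         # shows per-datanode decommission status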

      Below is the progression of events:

      • there are initially 4 datanodes: DN1, DN2, DN3, DN4
      • scale-down is started by adding DN1 & DN2 to the exclude file ("dfs.hosts.exclude")
      • blocks stored on DN1 & DN2 must now be replicated to DN3 & DN4 in order to satisfy their replication factor of 2
      • during this replication process https://issues.apache.org/jira/browse/HDFS-721 is encountered, which causes the following inconsistent state (a minimal model of the failing datanode check follows this event list):
        • DN3 thinks it has the block in FINALIZED state
        • the namenode does not think DN3 has the block
      2021-06-06 10:38:23,604 INFO org.apache.hadoop.hdfs.server.datanode.DataNode (DataXceiver for client  at /DN2:45654 [Receiving block BP-YYY:blk_XXX]): DN3:9866:DataXceiver error processing WRITE_BLOCK operation  src: /DN2:45654 dst: /DN3:9866; org.apache.hadoop.hdfs.server.datanode.ReplicaAlreadyExistsException: Block BP-YYY:blk_XXX already exists in state FINALIZED and thus cannot be created.
      
      • the replication is attempted again, but:
        • DN4 has the block
        • DN1 and/or DN2 have the block, but they don't count towards the replication factor because they are being decommissioned
        • DN3 does not have the block & cannot have the block replicated to it because of HDFS-721
      • the namenode repeatedly tries to replicate the block to DN3 & repeatedly fails; this continues indefinitely
      • therefore DN4 is the only live datanode with the block, & the replication factor of 2 cannot be satisfied
      • because the replication factor cannot be satisfied for the block(s) being moved off DN1 & DN2, the datanode decommissioning can never complete
      2021-06-06 10:39:10,106 INFO BlockStateChange (DatanodeAdminMonitor-0): Block: blk_XXX, Expected Replicas: 2, live replicas: 1, corrupt replicas: 0, decommissioned replicas: 0, decommissioning replicas: 2, maintenance replicas: 0, live entering maintenance replicas: 0, excess replicas: 0, Is Open File: false, Datanodes having this block: DN1:9866 DN2:9866 DN4:9866 , Current Datanode: DN1:9866, Is current datanode decommissioning: true, Is current datanode entering maintenance: false
      ...
      2021-06-06 10:57:10,105 INFO BlockStateChange (DatanodeAdminMonitor-0): Block: blk_XXX, Expected Replicas: 2, live replicas: 1, corrupt replicas: 0, decommissioned replicas: 0, decommissioning replicas: 2, maintenance replicas: 0, live entering maintenance replicas: 0, excess replicas: 0, Is Open File: false, Datanodes having this block: DN1:9866 DN2:9866 DN4:9866 , Current Datanode: DN2:9866, Is current datanode decommissioning: true, Is current datanode entering maintenance: false
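
      To make the failure on DN3 concrete, below is a minimal model of the datanode-side check that produces the ReplicaAlreadyExistsException shown above. The names here are simplified stand-ins, not the actual FsDatasetImpl code:

      import java.util.Map;
      import java.util.concurrent.ConcurrentHashMap;

      class ReplicaAlreadyExistsException extends Exception {
          ReplicaAlreadyExistsException(String msg) { super(msg); }
      }

      enum ReplicaState { RBW, FINALIZED }

      // Simplified stand-in for a datanode's replica map: block id -> state.
      class DataNodeReplicaModel {
          private final Map<Long, ReplicaState> replicaMap = new ConcurrentHashMap<>();

          // Called when a WRITE_BLOCK request arrives to create a new replica.
          void createRbw(long blockId) throws ReplicaAlreadyExistsException {
              ReplicaState existing = replicaMap.get(blockId);
              if (existing != null) {
                  // DN3 takes this branch: it still holds blk_XXX in FINALIZED
                  // state even though the namenode no longer counts the replica,
                  // so every re-replication attempt to DN3 fails here.
                  throw new ReplicaAlreadyExistsException("Block blk_" + blockId
                      + " already exists in state " + existing
                      + " and thus cannot be created.");
              }
              replicaMap.put(blockId, ReplicaState.RBW); // replica being written
          }

          // Called when the write pipeline completes successfully.
          void finalizeReplica(long blockId) {
              replicaMap.put(blockId, ReplicaState.FINALIZED);
          }
      }

      Because the namenode keeps choosing DN3 as the only viable target and DN3 keeps rejecting the transfer, the DatanodeAdminMonitor log above simply repeats unchanged for as long as the cluster runs.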
      

      Being stuck in the decommissioning state forever is not an intended behavior of datanode decommissioning.

      A few potential solutions:

      • Address the root cause of the problem, which is the inconsistent state between the namenode & datanode: https://issues.apache.org/jira/browse/HDFS-721
      • Detect when datanode decommissioning is stuck due to a lack of datanodes available to satisfy the replication factor, then recover by re-enabling the datanodes being decommissioned (a sketch of the replica-invalidation check suggested by this issue's title follows this list)
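
      The issue title points at a third option: invalidate the stale/corrupt replica on DN3 only once enough usable replicas exist elsewhere, so re-replication can proceed without risking data loss. A hedged sketch of such a check, with hypothetical names rather than the real BlockManager API:

      // Hypothetical policy (illustrative only): a corrupt or stale replica may
      // be invalidated when the remaining usable replicas still cover the
      // expected replication.
      class CorruptReplicaInvalidationPolicy {

          static boolean shouldInvalidate(int expectedReplicas,
                                          int liveReplicas,
                                          int decommissioningReplicas) {
              // Decommissioning replicas remain readable, so they count as
              // usable until their hosts are actually removed.
              int usableReplicas = liveReplicas + decommissioningReplicas;
              return usableReplicas >= expectedReplicas;
          }

          public static void main(String[] args) {
              // Numbers from the BlockStateChange log above: expected=2,
              // live=1 (DN4), decommissioning=2 (DN1, DN2) => 3 usable, so
              // DN3's stale FINALIZED replica could safely be invalidated,
              // unblocking re-replication and the decommission.
              System.out.println(shouldInvalidate(2, 1, 2)); // true
          }
      }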

       

          People

            Assignee: Kevin Wikant
            Reporter: Kevin Wikant
            Votes: 0
            Watchers: 10


          Time Tracking

            Original Estimate: Not Specified
            Remaining Estimate: 0h
            Time Spent: 2h
