Hadoop HDFS / HDFS-11960

Successfully closed files can stay under-replicated.


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.9.0, 3.0.0-alpha4, 2.8.2
    • Component/s: None
    • Labels: None
    • Hadoop Flags: Reviewed

    Description

      If a certain set of conditions holds when a file is created, a block of the file can stay under-replicated. The block is mistakenly removed from the under-replicated block queue and is never re-evaluated.

      Re-evaluation can be triggered if:

      • a node containing a replica dies
      • setrep is called
      • the NN replication queues are reinitialized (NN failover or restart)

      If none of these happens, the block stays under-replicated.

      Here is how it happens:
      1) A replica is finalized, but the ACK does not reach the upstream node in time. The IBR (incremental block report) is also delayed.
      2) A close recovery happens, which updates the gen stamp of the "healthy" replicas.
      3) The file is closed with the healthy replicas. The block is added to the replication queue.
      4) A replication is scheduled, so the block is moved to the pending replication list. The failed node from 1) is picked as the replication target.
      5) The old IBR is finally received for the failed/excluded node. In the meantime, the replication fails because the node already has a finalized replica (with the older gen stamp).
      6) The IBR processing removes the block from the pending list, adds the replica to the corrupt replicas list, and then issues an invalidation. Since the block is now in neither the replication queue nor the pending list, it stays under-replicated.
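
      The sequence above can be sketched with a toy model of the bookkeeping. This is not Hadoop code: the class, its fields, and the identifiers "blk_1" and "dn3" are hypothetical stand-ins chosen only to show how a block can end up under-replicated yet tracked by neither the replication queue nor the pending list, and how a re-evaluation trigger (for example setrep or an NN failover) would put it back.

```java
import java.util.HashSet;
import java.util.Set;

/** Toy model of the bookkeeping in steps 1-6; none of this is Hadoop code. */
public class LostBlockSketch {
    // Hypothetical stand-ins for the NameNode's under-replicated queue,
    // pending replication list, and corrupt replicas list.
    private final Set<String> neededReplication = new HashSet<>();
    private final Set<String> pendingReplication = new HashSet<>();
    private final Set<String> corruptReplicas = new HashSet<>();
    private final int expectedReplicas;
    private int liveReplicas;

    public LostBlockSketch(int expectedReplicas) {
        this.expectedReplicas = expectedReplicas;
    }

    /** Step 3: the file is closed with only the healthy replicas. */
    public void closeFile(String block, int healthyReplicas) {
        liveReplicas = healthyReplicas;
        reevaluate(block);
    }

    /** Step 4: scheduling moves the block from the queue to the pending list. */
    public void scheduleReplication(String block) {
        if (neededReplication.remove(block)) {
            pendingReplication.add(block);
        }
    }

    /** Steps 5-6: the delayed IBR reports a replica with an old gen stamp. */
    public void processStaleIbr(String block, String node) {
        pendingReplication.remove(block);        // dropped from the pending list
        corruptReplicas.add(block + "@" + node); // stale replica marked corrupt
        // Nothing puts the block back into neededReplication here.
    }

    /** Models a re-evaluation trigger such as setrep or an NN failover. */
    public void reevaluate(String block) {
        if (liveReplicas < expectedReplicas && !pendingReplication.contains(block)) {
            neededReplication.add(block);
        }
    }

    /** True when the block is under-replicated but tracked by neither structure. */
    public boolean isLost(String block) {
        return liveReplicas < expectedReplicas
            && !neededReplication.contains(block)
            && !pendingReplication.contains(block);
    }

    /** Runs the scenario; returns {lost after step 6, lost after re-evaluation}. */
    public static boolean[] runScenario() {
        LostBlockSketch nn = new LostBlockSketch(3);
        nn.closeFile("blk_1", 2);           // closed with 2 of 3 replicas
        nn.scheduleReplication("blk_1");    // target is the failed node
        nn.processStaleIbr("blk_1", "dn3"); // old IBR finally arrives
        boolean lostAfterIbr = nn.isLost("blk_1");
        nn.reevaluate("blk_1");             // e.g. setrep is called
        return new boolean[] {lostAfterIbr, nn.isLost("blk_1")};
    }

    public static void main(String[] args) {
        boolean[] r = runScenario();
        System.out.println("lost after stale IBR: " + r[0]);
        System.out.println("lost after re-evaluation: " + r[1]);
    }
}
```

      In this model the block is "lost" right after the stale IBR is processed and is tracked again once a re-evaluation runs. The model does not attempt to reproduce the change made in the attached patches.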

      Attachments

        1. HDFS-11960-v2.branch-2.txt
          4 kB
          Kihwal Lee
        2. HDFS-11960-v2.trunk.txt
          4 kB
          Kihwal Lee
        3. HDFS-11960.patch
          1.0 kB
          Kihwal Lee


          People

            Assignee: Kihwal Lee
            Reporter: Kihwal Lee
            Votes: 0
            Watchers: 14
