  1. Hadoop HDFS
  2. HDFS-4799

Corrupt replica can be prematurely removed from corruptReplicas map

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: 2.0.4-alpha
    • Fix Version/s: 2.1.0-beta
    • Component/s: namenode
    • Labels: None
    • Hadoop Flags: Reviewed

      Description

      In a cluster, we saw the following sequence of events cause the most recent genstamp of a block to be lost:

      • A client is writing to a pipeline of 3 datanodes.
      • Nodes in the pipeline fail over some period of time. This leaves 3 old-genstamp replicas on the original three nodes, while the pipeline recruits 3 new replicas with a later genstamp.
        • So there are 6 replicas in the cluster in total: 3 with old genstamps on the downed nodes, and 3 with the latest genstamp.
      • The cluster reboots, and the nodes with old genstamps send their block reports first. Their replicas are correctly added to the corrupt replicas map, since their genstamps are too old.
      • The nodes with the new genstamp then send their block reports. When the last of them reports, chooseExcessReplicates is called and incorrectly decides to remove the three good replicas, leaving only the old-genstamp replicas.
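
The sequence above can be sketched as a toy model. This is not BlockManager code. The class GenstampScenario, its Replica type, and the helpers reports, markCorrupt and chooseExcess are all hypothetical names invented for illustration. The sketch assumes, as the issue title suggests, that the old-genstamp entries have already dropped out of the corrupt replicas map by the time excess replicas are chosen. The selection policy (drop the most recently reported candidates) is an arbitrary stand-in chosen to make the failure visible. It is not the policy the real chooseExcessReplicates uses.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class GenstampScenario {
    /** Toy replica: the node holding it and the generation stamp it carries. */
    static final class Replica {
        final String node;
        final long genstamp;
        Replica(String node, long genstamp) {
            this.node = node;
            this.genstamp = genstamp;
        }
    }

    /** Block reports in arrival order: old-genstamp nodes first, then the new ones. */
    static List<Replica> reports(long oldGs, long newGs) {
        List<Replica> all = new ArrayList<>();
        for (int i = 1; i <= 3; i++) all.add(new Replica("old" + i, oldGs));
        for (int i = 1; i <= 3; i++) all.add(new Replica("new" + i, newGs));
        return all;
    }

    /** Marks every replica whose genstamp is older than the latest one as corrupt. */
    static Set<Replica> markCorrupt(List<Replica> reported, long latestGs) {
        Set<Replica> corrupt = new HashSet<>();
        for (Replica r : reported) {
            if (r.genstamp < latestGs) corrupt.add(r);
        }
        return corrupt;
    }

    /**
     * Hypothetical excess chooser (an assumption, not the real policy):
     * ignores replicas listed as corrupt, then drops the most recently
     * reported candidates until only `replication` of them remain.
     */
    static List<Replica> chooseExcess(List<Replica> reported, Set<Replica> corrupt,
                                      int replication) {
        List<Replica> candidates = new ArrayList<>();
        for (Replica r : reported) {
            if (!corrupt.contains(r)) candidates.add(r);
        }
        int keep = Math.min(replication, candidates.size());
        return new ArrayList<>(candidates.subList(keep, candidates.size()));
    }

    public static void main(String[] args) {
        List<Replica> reported = reports(1L, 2L);
        Set<Replica> corrupt = markCorrupt(reported, 2L);

        // Corrupt map intact: only the 3 good replicas are candidates, none is excess.
        System.out.println("excess with intact map: "
                + chooseExcess(reported, corrupt, 3).size());

        // Assumed failure mode: the corrupt entries are gone, so all 6 replicas
        // look live and 3 of them are treated as excess.
        for (Replica r : chooseExcess(reported, new HashSet<>(), 3)) {
            System.out.println("removed " + r.node + " genstamp=" + r.genstamp);
        }
    }
}
```

In this model the chooser is only as safe as the corrupt set it is given. Once the old-genstamp replicas are no longer excluded, nothing in the selection distinguishes them from the good ones.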

        Attachments

        1. hdfs-4799-unittest.txt
          9 kB
          Todd Lipcon
        2. hdfs-4799.txt
          10 kB
          Todd Lipcon
        3. hdfs-4799.txt
          10 kB
          Todd Lipcon

            People

            • Assignee: Todd Lipcon (tlipcon)
            • Reporter: Todd Lipcon (tlipcon)
            • Votes: 0
            • Watchers: 11