Hadoop HDFS / HDFS-2791

If block report races with closing of file, replica is incorrectly marked corrupt

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.22.0, 0.23.0
    • Fix Version/s: 0.23.1
    • Component/s: datanode, namenode
    • Labels:
      None
    • Target Version/s:
    • Hadoop Flags:
      Reviewed

      Description

      The following sequence of events results in a replica being mistakenly marked corrupt:
      1. A write pipeline is open with 2 replicas.
      2. DN1 generates a block report but is slow to send it to the NN (e.g., because of a flaky network). It gets "stuck" right before the block report RPC.
      3. The client closes the file.
      4. DN2 is fast and sends blockReceived to the NN. The NN marks the block as COMPLETE.
      5. DN1's block report proceeds and includes the block in the RBW state, because the report was generated before the file was closed.
      6. The NN incorrectly marks the replica as corrupt, since it is an RBW replica of a COMPLETE block.
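
The sequence above can be replayed in a toy model. The following is a minimal sketch in Python, not HDFS code: `ToyNameNode`, `replay_race`, and the `tolerate_stale_rbw` flag are hypothetical names introduced only for illustration, and the "tolerant" branch is an assumed mitigation (ignore a late RBW report for a COMPLETE block and wait for a later FINALIZED report), not a description of the attached patch. Generation stamps and all other real NameNode state are deliberately left out.

```python
from dataclasses import dataclass, field
from enum import Enum


class BlockState(Enum):
    UNDER_CONSTRUCTION = "UNDER_CONSTRUCTION"
    COMPLETE = "COMPLETE"


class ReplicaState(Enum):
    RBW = "RBW"              # replica being written
    FINALIZED = "FINALIZED"


@dataclass
class ToyNameNode:
    """Hypothetical, heavily simplified NN view of a single block."""
    block_state: BlockState = BlockState.UNDER_CONSTRUCTION
    corrupt: set = field(default_factory=set)

    def block_received(self, dn: str) -> None:
        # Step 4: a DN reports its finalized replica; the NN completes the block.
        self.block_state = BlockState.COMPLETE

    def block_report(self, dn: str, reported: ReplicaState,
                     tolerate_stale_rbw: bool = False) -> None:
        # Steps 5-6: an RBW replica reported against a COMPLETE block.
        if self.block_state is BlockState.COMPLETE and reported is ReplicaState.RBW:
            if tolerate_stale_rbw:
                # Assumed mitigation: treat the report as merely delayed.
                return
            self.corrupt.add(dn)


def replay_race(tolerate_stale_rbw: bool) -> set:
    """Replay steps 1-6 and return the set of DNs marked corrupt."""
    nn = ToyNameNode()                 # step 1: pipeline open, block under construction
    stale_report = ReplicaState.RBW    # step 2: DN1 snapshots its state, then stalls
    # step 3: client closes the file (DN-side finalization is not modeled)
    nn.block_received("DN2")           # step 4
    nn.block_report("DN1", stale_report, tolerate_stale_rbw)  # steps 5-6
    return nn.corrupt
```

Calling `replay_race(False)` reproduces the reported behavior in the model (DN1 ends up in the corrupt set), while `replay_race(True)` shows the effect of ignoring the stale report. A real check would presumably also need to compare generation stamps so that genuinely stale replicas are still caught; that is omitted here.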

        Attachments

        1. hdfs-2791-test.txt
          8 kB
          Todd Lipcon
        2. hdfs-2791.txt
          11 kB
          Todd Lipcon
        3. hdfs-2791.txt
          11 kB
          Todd Lipcon
        4. hdfs-2791.txt
          11 kB
          Todd Lipcon
        5. hdfs-2791.txt
          10 kB
          Eli Collins

              People

              • Assignee:
                tlipcon Todd Lipcon
                Reporter:
                tlipcon Todd Lipcon
              • Votes:
                0
                Watchers:
                13
