Hadoop HDFS / HDFS-2791

If block report races with closing of file, replica is incorrectly marked corrupt

Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.22.0, 0.23.0
    • Fix Version/s: 0.23.1
    • Component/s: datanode, namenode
    • Labels: None
    • Hadoop Flags: Reviewed

    Description

      The following sequence of events results in a replica being mistakenly marked corrupt:
      1. Pipeline is open with 2 replicas.
      2. DN1 generates a block report but is slow in sending it to the NN (e.g. because of a flaky network). It gets "stuck" right before the block report RPC.
      3. Client closes the file.
      4. DN2 is fast and sends blockReceived to the NN. NN marks the block as COMPLETE.
      5. DN1's block report proceeds, and includes the block in an RBW state.
      6. NN incorrectly marks the replica as corrupt, since it is an RBW replica on a COMPLETE block.
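
      The ordering above can be replayed in a small, self-contained Java sketch. This is a toy model, not HDFS code: ToyNameNode, processReportedReplica and replayRace are hypothetical names invented for illustration, and the check shown is only an assumed simplification of the rule "an RBW replica on a COMPLETE block is corrupt" described in step 6.

```java
import java.util.ArrayList;
import java.util.List;

public class BlockReportRaceSketch {
    enum BlockState { UNDER_CONSTRUCTION, COMPLETE }
    enum ReplicaState { RBW, FINALIZED }

    /** Toy stand-in for the NameNode's view of a single block (hypothetical). */
    static class ToyNameNode {
        BlockState blockState = BlockState.UNDER_CONSTRUCTION;
        final List<String> corruptReplicas = new ArrayList<>();

        /** Step 4: a blockReceived from any DataNode completes the block in this toy model. */
        void blockReceived(String dataNode) {
            blockState = BlockState.COMPLETE;
        }

        /** Steps 5-6: naive rule that treats an RBW replica on a COMPLETE block as corrupt. */
        void processReportedReplica(String dataNode, ReplicaState reported) {
            if (blockState == BlockState.COMPLETE && reported == ReplicaState.RBW) {
                corruptReplicas.add(dataNode);
            }
        }
    }

    /** Replays the sequence from the description and returns the replicas marked corrupt. */
    static List<String> replayRace() {
        ToyNameNode nn = new ToyNameNode();

        // Steps 1-2: DN1 builds its block report while the pipeline is still open,
        // so the report captures the replica as RBW. The RPC is then delayed.
        ReplicaState dn1StaleSnapshot = ReplicaState.RBW;

        // Step 3: the client closes the file (the stale snapshot is not updated).
        // Step 4: DN2 reports first, and the block becomes COMPLETE.
        nn.blockReceived("DN2");

        // Steps 5-6: DN1's delayed report finally arrives with the stale RBW state.
        nn.processReportedReplica("DN1", dn1StaleSnapshot);
        return nn.corruptReplicas;
    }

    public static void main(String[] args) {
        System.out.println("Replicas marked corrupt: " + replayRace());
    }
}
```

      Running main prints "Replicas marked corrupt: [DN1]". DN1's replica is healthy; only the report describing it is out of date.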

      Attachments

        1. hdfs-2791-test.txt
          8 kB
          Todd Lipcon
        2. hdfs-2791.txt
          11 kB
          Todd Lipcon
        3. hdfs-2791.txt
          11 kB
          Todd Lipcon
        4. hdfs-2791.txt
          11 kB
          Todd Lipcon
        5. hdfs-2791.txt
          10 kB
          Eli Collins

            People

              Assignee: Todd Lipcon (tlipcon)
              Reporter: Todd Lipcon (tlipcon)
              Votes: 0
              Watchers: 13
