Uploaded image for project: 'Hadoop HDFS'
  1. Hadoop HDFS
  2. HDFS-3931

TestDatanodeBlockScanner#testBlockCorruptionPolicy2 is broken

    XMLWordPrintableJSON

Details

    • Bug
    • Status: Closed
    • Minor
    • Resolution: Fixed
    • 2.0.0-alpha
    • 2.0.3-alpha
    • test
    • None
    • Reviewed

    Description

      Per Andy's comment on HDFS-3902:

      TestDatanodeBlockScanner still fails about 1/5 runs in testBlockCorruptionRecoveryPolicy2. That's due to a separate test issue also uncovered by HDFS-3828.
      The failure scenario for this one is a bit more tricky. I think I've captured the scenario below:

      • The test corrupts 2/3 replicas.
      • client reports a bad block.
      • NN asks a DN to re-replicate, and randomly picks the other corrupt replica.
      • DN notices the incoming replica is corrupt and reports it as a bad block, but does not inform the NN that re-replication failed.
      • NN keeps the block on pendingReplications.
      • BP scanner wakes up on both DNs with corrupt blocks, both report corruption. NN reports both as duplicates, one from the client and one from the DN report above.
        since block is on pendingReplications, NN does not schedule another replication.

      Attachments

        1. hdfs3931-3.txt
          5 kB
          Andy Isaacson
        2. hdfs3931-2.txt
          3 kB
          Andy Isaacson
        3. hdfs3931-1.txt
          5 kB
          Andy Isaacson
        4. hdfs3931.txt
          1 kB
          Andy Isaacson

        Issue Links

          Activity

            People

              adi2 Andy Isaacson
              eli Eli Collins
              Votes:
              0 Vote for this issue
              Watchers:
              7 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: