[HDFS-3931] TestDatanodeBlockScanner#testBlockCorruptionPolicy2 is broken - ASF JIRA


Details

Type: Bug
Status: Closed
Priority: Minor
Resolution: Fixed
Affects Version/s: 2.0.0-alpha
Fix Version/s: 2.0.3-alpha
Component/s: test
Labels: None

Hadoop Flags: Reviewed

Description

Per Andy's comment on HDFS-3902:

TestDatanodeBlockScanner still fails in about 1 of 5 runs of testBlockCorruptionRecoveryPolicy2. This is due to a separate test issue, also uncovered by HDFS-3828.
The failure scenario for this one is trickier. I think I've captured it below:

1. The test corrupts 2 of the 3 replicas.
2. The client reports a bad block.
3. The NN asks a DN to re-replicate the block and randomly picks the other corrupt replica as the source.
4. The DN notices that the incoming replica is corrupt and reports it as a bad block, but does not inform the NN that re-replication failed.
5. The NN keeps the block on pendingReplications.
6. The block pool (BP) scanner wakes up on both DNs holding corrupt replicas, and both report the corruption. The NN treats both reports as duplicates: one of the client's report and one of the DN report above.
7. Since the block is still on pendingReplications, the NN does not schedule another replication.
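To make the stuck state easier to see, here is a minimal toy model of the scenario, assuming a NameNode that (a) ignores a corruption report it has already seen for the same replica and (b) never schedules a second replication while the block is on its pending list. The class `ToyNameNode`, its `reportBadBlock` method, and the block/DN names are hypothetical illustrations, not the actual HDFS NameNode code, and there is no pending-replication timeout in the model.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical toy model of the scenario above; not the real HDFS code. */
class ToyNameNode {
    // Blocks with a re-replication scheduled but never confirmed or failed.
    private final Set<String> pendingReplications = new HashSet<>();
    // Corruption reports already seen, keyed as "block@datanode".
    private final Set<String> corruptReports = new HashSet<>();
    private int replicationsScheduled = 0;

    /** Handles a bad-block report; returns true only if a new replication is scheduled. */
    boolean reportBadBlock(String block, String datanode) {
        if (!corruptReports.add(block + "@" + datanode)) {
            return false; // duplicate report for this replica: ignored
        }
        if (pendingReplications.contains(block)) {
            return false; // replication assumed in flight: no new one scheduled
        }
        pendingReplications.add(block);
        replicationsScheduled++;
        return true;
    }

    int replicationsScheduled() {
        return replicationsScheduled;
    }

    public static void main(String[] args) {
        ToyNameNode nn = new ToyNameNode();
        // Replicas on dn1 and dn2 are corrupt; dn3 holds the only good copy.
        boolean client = nn.reportBadBlock("blk_1", "dn1");   // step 2: client report
        // Steps 3-4: the NN picks corrupt dn2 as the source; the target DN
        // reports dn2's replica as bad but never reports the failed replication.
        boolean target = nn.reportBadBlock("blk_1", "dn2");
        // Step 6: block scanners on dn1 and dn2 report again (duplicates).
        boolean scan1 = nn.reportBadBlock("blk_1", "dn1");
        boolean scan2 = nn.reportBadBlock("blk_1", "dn2");
        System.out.println(client + " " + target + " " + scan1 + " " + scan2);
        System.out.println("replications scheduled: " + nn.replicationsScheduled());
    }
}
```

Running `main` prints `true false false false` followed by `replications scheduled: 1`: only the client's report triggers a replication, and because that replication silently failed and the block stays pending, every later report is dropped.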

Attachments


- hdfs3931.txt (1 kB), Andy Isaacson, 14/Sep/12 23:55
- hdfs3931-1.txt (5 kB), Andy Isaacson, 19/Sep/12 02:08
- hdfs3931-2.txt (3 kB), Andy Isaacson, 19/Sep/12 17:39
- hdfs3931-3.txt (5 kB), Andy Isaacson, 19/Sep/12 23:31

Issue Links

Relates to:
- HDFS-3660 TestDatanodeBlockScanner#testBlockCorruptionRecoveryPolicy2 times out (Resolved)
- HDFS-3902 TestDatanodeBlockScanner#testBlockCorruptionPolicy is broken (Closed)


People

Assignee: Andy Isaacson
Reporter: Eli Collins
Votes: 0
Watchers: 7

Dates

Created: 12/Sep/12 18:38
Updated: 15/Feb/13 13:11
Resolved: 27/Sep/12 04:11