Hadoop HDFS
HDFS-86

Corrupted blocks get deleted but not replicated


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Invalid

    Description

      When I test the patch to HADOOP-1345 on a two-node dfs cluster, I see that dfs correctly deletes the corrupted replica and successfully retries reading from the other, correct replica, but the block does not get re-replicated. The block remains with only one replica until the next block report comes in.

      In my testcase, since the dfs cluster has only 2 datanodes, the target of replication is the same as the target of block invalidation. After examining the logs, I found that the namenode sent the replication request before the block invalidation request.

      This is because the namenode does not handle block invalidation correctly. In FSNamesystem.invalidateBlock, it first puts the invalidate request in a queue and then immediately removes the replica from its state, which triggers choosing a replication target for the block. When requests are sent back to the target datanode as a reply to a heartbeat message, replication requests have higher priority than invalidate requests.

      This problem could be solved if the namenode removed an invalidated replica from its state only after the invalidate request has been sent to the datanode.
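      The ordering problem above can be sketched in a small, self-contained simulation. This is not Hadoop's actual code: the class NameNodeSketch, the REPLICATE/INVALIDATE command strings, and the queue names are all hypothetical, chosen only to illustrate why removing the replica from namenode state before dispatching the invalidation lets a higher-priority replication command reach the same datanode first, and how deferring the state removal avoids that.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

// Hypothetical model of the namenode behavior described in this issue.
public class NameNodeSketch {
    // Pending per-datanode work, drained on the next heartbeat reply.
    private final Queue<String> replicationQueue = new ArrayDeque<>();
    private final Queue<String> invalidateQueue = new ArrayDeque<>();
    private int liveReplicas;
    private final int targetReplication;

    NameNodeSketch(int liveReplicas, int targetReplication) {
        this.liveReplicas = liveReplicas;
        this.targetReplication = targetReplication;
    }

    // Current order: queue the invalidation, then immediately remove the
    // replica from namenode state, which schedules re-replication right away.
    void invalidateBlockCurrent(String block) {
        invalidateQueue.add("INVALIDATE " + block);
        removeReplicaFromState(block);
    }

    // Proposed order: queue the invalidation but keep the replica in
    // namenode state; removal (and hence re-replication) is deferred until
    // the invalidate request has actually been sent to the datanode.
    void invalidateBlockProposed(String block) {
        invalidateQueue.add("INVALIDATE " + block);
    }

    private void removeReplicaFromState(String block) {
        liveReplicas--;
        if (liveReplicas < targetReplication) {
            replicationQueue.add("REPLICATE " + block);
        }
    }

    // Heartbeat reply: replication commands outrank invalidate commands.
    List<String> heartbeatReply() {
        List<String> commands = new ArrayList<>(replicationQueue);
        replicationQueue.clear();
        commands.addAll(invalidateQueue);
        invalidateQueue.clear();
        return commands;
    }

    public static void main(String[] args) {
        NameNodeSketch current = new NameNodeSketch(2, 2);
        current.invalidateBlockCurrent("blk_1");
        // The datanode sees REPLICATE before INVALIDATE for the same block,
        // so it is asked to replicate a block it is about to delete.
        System.out.println("current:  " + current.heartbeatReply());

        NameNodeSketch proposed = new NameNodeSketch(2, 2);
        proposed.invalidateBlockProposed("blk_1");
        // Only the invalidation goes out; re-replication is scheduled later,
        // after the replica has been removed from namenode state.
        System.out.println("proposed: " + proposed.heartbeatReply());
    }
}
```

      In the two-datanode case from the testcase, the "current" ordering hands the datanode a REPLICATE command ahead of the INVALIDATE for the same block, which matches the log observation above; the "proposed" ordering sends only the invalidation in that heartbeat reply.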

      Attachments

        1. blockInvalidate.patch
          3 kB
          Hairong Kuang

        Activity

          People

            Assignee: hairong (Hairong Kuang)
            Reporter: hairong (Hairong Kuang)
            Votes: 0
            Watchers: 1

            Dates

              Created:
              Updated:
              Resolved: