[HDFS-2791] If block report races with closing of file, replica is incorrectly marked corrupt - ASF JIRA


Details

Type: Bug
Status: Closed
Priority: Major
Resolution: Fixed
Affects Version/s: 0.22.0, 0.23.0
Fix Version/s: 0.23.1
Component/s: datanode, namenode
Labels:
None

Target Version/s: 0.23.1
Hadoop Flags: Reviewed

Description

The following sequence of events results in a replica mistakenly marked corrupt:
1. The write pipeline is open with 2 replicas.
2. DN1 generates a block report but is slow to send it to the NN (e.g., because of a flaky network). It gets "stuck" right before the block report RPC.
3. The client closes the file.
4. DN2 is fast and sends blockReceived to the NN. The NN marks the block as COMPLETE.
5. DN1's block report proceeds and includes the block in the RBW state.
6. The NN incorrectly marks the replica as corrupt, since it is an RBW replica on a COMPLETE block.
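The race above can be reproduced with a toy model of the NameNode's replica check. This is a minimal sketch under stated assumptions, not HDFS code: `ReplicaCorruptCheck`, `BlockUCState`, `Verdict`, `isCorruptNaive`, and `checkTolerant` are hypothetical names, and the tolerant rule (ignore an RBW report for a COMPLETE block when the generation stamps match, and wait for the DataNode's blockReceived) is one plausible mitigation assumed for illustration, not necessarily the committed fix.

```java
/**
 * Toy model of the NameNode-side replica check described in HDFS-2791.
 * All names here are hypothetical; this is not the HDFS BlockManager code.
 */
class ReplicaCorruptCheck {

    /** Simplified NN-side block state. */
    enum BlockUCState { UNDER_CONSTRUCTION, COMPLETE }

    /** Simplified replica state as carried in a DataNode block report. */
    enum ReplicaState { RBW, FINALIZED }

    /** Outcome of the race-tolerant check (hypothetical). */
    enum Verdict { OK, IGNORE_UNTIL_BLOCK_RECEIVED, CORRUPT }

    /** Naive rule that exhibits the bug: any RBW replica on a COMPLETE block is corrupt. */
    static boolean isCorruptNaive(BlockUCState stored, ReplicaState reported) {
        return stored == BlockUCState.COMPLETE && reported == ReplicaState.RBW;
    }

    /**
     * Race-tolerant rule (assumed mitigation): an RBW report for a COMPLETE block with a
     * matching generation stamp is treated as a stale snapshot taken before the file was
     * closed, so it is ignored instead of being marked corrupt.
     */
    static Verdict checkTolerant(BlockUCState stored, long storedGenStamp,
                                 ReplicaState reported, long reportedGenStamp) {
        if (stored != BlockUCState.COMPLETE) {
            // Block still being written: RBW replicas are expected. This sketch
            // does not validate under-construction blocks any further.
            return Verdict.OK;
        }
        if (storedGenStamp != reportedGenStamp) {
            // Replica belongs to an older pipeline, so it really is stale.
            return Verdict.CORRUPT;
        }
        if (reported == ReplicaState.RBW) {
            // Same generation stamp: the report most likely raced with the close.
            return Verdict.IGNORE_UNTIL_BLOCK_RECEIVED;
        }
        return Verdict.OK;
    }

    public static void main(String[] args) {
        long genStamp = 1001L; // arbitrary generation stamp for the example

        // Step 1: pipeline open; the NN tracks the block as under construction.
        BlockUCState nnState = BlockUCState.UNDER_CONSTRUCTION;

        // Step 2: DN1 builds its block report while its replica is still RBW,
        // then stalls before sending it.
        ReplicaState dn1Snapshot = ReplicaState.RBW;

        // Steps 3-4: the client closes the file and DN2's blockReceived lets
        // the NN mark the block COMPLETE.
        nnState = BlockUCState.COMPLETE;

        // Steps 5-6: DN1's delayed report finally arrives with the stale RBW state.
        System.out.println("naive rule marks corrupt: "
                + isCorruptNaive(nnState, dn1Snapshot));
        System.out.println("tolerant rule verdict: "
                + checkTolerant(nnState, genStamp, dn1Snapshot, genStamp));
    }
}
```

Running `main` prints `true` for the naive rule and `IGNORE_UNTIL_BLOCK_RECEIVED` for the tolerant one. In this sketch the generation-stamp comparison is what keeps the relaxed rule from hiding genuinely stale replicas left over from an earlier pipeline.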

Attachments


- hdfs-2791.txt (10 kB), Eli Collins, 25/Jan/12 22:58
- hdfs-2791.txt (11 kB), Todd Lipcon, 24/Jan/12 02:05
- hdfs-2791.txt (11 kB), Todd Lipcon, 23/Jan/12 21:44
- hdfs-2791.txt (11 kB), Todd Lipcon, 23/Jan/12 21:12
- hdfs-2791-test.txt (8 kB), Todd Lipcon, 15/Jan/12 05:38

Issue Links

is related to: HDFS-2691 HA: Tests and fixes for pipeline targets and replica recovery (Resolved)


People

Assignee: Todd Lipcon
Reporter: Todd Lipcon
Votes: 0
Watchers: 13

Dates

Created: 15/Jan/12 04:49
Updated: 10/Mar/15 02:12
Resolved: 28/Jan/12 00:47