[HDFS-7235] DataNode#transferBlock should report blocks that don't exist using reportBadBlock - ASF JIRA

Details

Type: Bug
Status: Closed
Priority: Major
Resolution: Fixed
Affects Version/s: 2.6.0
Fix Version/s: 2.7.0, 2.6.1, 3.0.0-alpha1
Component/s: datanode, namenode
Labels: 2.6.1-candidate
Target Version/s: 2.7.0

Description

When decommissioning a DN, the process hangs.

What happens is this: when the NN chooses a source replica for copying the data on the to-be-decommissioned DN to other DNs, it favors the to-be-decommissioned DN itself as the source of the transfer (see BlockManager.java). However, because of a bad disk, the DN detects the source block to be transferred as an invalid block, using the following logic in FsDatasetImpl.java:

  /** Does the block exist and have the given state? */
  private boolean isValid(final ExtendedBlock b, final ReplicaState state) {
    final ReplicaInfo replicaInfo = volumeMap.get(b.getBlockPoolId(),
        b.getLocalBlock());
    return replicaInfo != null
        && replicaInfo.getState() == state
        && replicaInfo.getBlockFile().exists();
  }

This method returns false (that is, it detects an invalid block) because, in this case, the block file doesn't exist due to the bad disk.

The key issue we found is that, after the DN detects an invalid block for the above reason, it doesn't report the invalid block back to the NN. The NN therefore doesn't know that the block is corrupted and keeps sending the data transfer request to the same to-be-decommissioned DN, again and again. This causes an infinite loop, so the decommission process hangs.
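The retry loop described above can be illustrated with a small toy model. This is only a sketch under stated assumptions, not the actual HDFS code or the attached patch: ToyDataNode, ToyNameNode, candidates and replicate are hypothetical names invented for illustration, and the source selection is reduced to "pick the first candidate not known to hold a corrupt replica", with the decommissioning DN listed first to mimic the NN's preference for it.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Toy model of the decommission retry loop. None of these types are real HDFS classes. */
public class DecommissionLoopSketch {

    /** Hypothetical stand-in for a DN holding one replica of a single block. */
    static final class ToyDataNode {
        final String name;
        final boolean blockFileExists;   // false models a replica lost to a bad disk
        final boolean reportsBadBlocks;  // true models the behavior this issue asks for

        ToyDataNode(String name, boolean blockFileExists, boolean reportsBadBlocks) {
            this.name = name;
            this.blockFileExists = blockFileExists;
            this.reportsBadBlocks = reportsBadBlocks;
        }

        /** Returns true if the transfer succeeded; on failure, optionally tells the NN. */
        boolean transferBlock(ToyNameNode nn) {
            if (blockFileExists) {
                return true;
            }
            if (reportsBadBlocks) {
                nn.reportBadBlock(name);
            }
            return false;
        }
    }

    /** Hypothetical stand-in for the NN's view of which replicas are corrupt. */
    static final class ToyNameNode {
        private final Set<String> corruptReplicaHolders = new HashSet<>();

        void reportBadBlock(String dataNodeName) {
            corruptReplicaHolders.add(dataNodeName);
        }

        /** Picks the first candidate not known to be corrupt, or null if none is left. */
        ToyDataNode chooseSource(List<ToyDataNode> candidates) {
            for (ToyDataNode dn : candidates) {
                if (!corruptReplicaHolders.contains(dn.name)) {
                    return dn;
                }
            }
            return null;
        }
    }

    /** The decommissioning DN (block file missing) comes first, then a healthy DN. */
    static List<ToyDataNode> candidates(boolean reportsBadBlocks) {
        return Arrays.asList(
            new ToyDataNode("decommissioning-dn", false, reportsBadBlocks),
            new ToyDataNode("healthy-dn", true, reportsBadBlocks));
    }

    /** Returns the attempt number on which replication succeeded, or -1 if it never did. */
    static int replicate(List<ToyDataNode> candidates, int maxAttempts) {
        ToyNameNode nn = new ToyNameNode();
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            ToyDataNode source = nn.chooseSource(candidates);
            if (source == null) {
                return -1;
            }
            if (source.transferBlock(nn)) {
                return attempt;
            }
        }
        return -1; // every attempt went to the same broken source
    }

    public static void main(String[] args) {
        System.out.println("without report: " + replicate(candidates(false), 100));
        System.out.println("with report:    " + replicate(candidates(true), 100));
    }
}
```

In this model, the run without reporting exhausts all 100 attempts on the same DN and returns -1, while the run with reporting fails once, marks the replica as corrupt, and succeeds from the healthy DN on attempt 2. The attempt cap exists only so the toy terminates; the real decommission process has no such cap, which is why it hangs.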

Thanks qwertymaniac for reporting the issue and initial analysis.

Attachments

- HDFS-7235.001.patch, 11/Oct/14 04:34, 11 kB, Yongjun Zhang
- HDFS-7235.002.patch, 13/Oct/14 23:32, 8 kB, Yongjun Zhang
- HDFS-7235.003.patch, 20/Oct/14 23:00, 8 kB, Yongjun Zhang
- HDFS-7235.004.patch, 23/Oct/14 05:44, 21 kB, Yongjun Zhang
- HDFS-7235.005.patch, 24/Oct/14 04:28, 20 kB, Yongjun Zhang
- HDFS-7235.006.patch, 24/Oct/14 22:24, 20 kB, Yongjun Zhang
- HDFS-7235.007.patch, 28/Oct/14 14:21, 20 kB, Yongjun Zhang
- HDFS-7235.007.patch, 27/Oct/14 22:49, 20 kB, Yongjun Zhang

Activity

People

Assignee: Yongjun Zhang
Reporter: Yongjun Zhang
Votes: 0
Watchers: 9

Dates

Created: 11/Oct/14 04:20
Updated: 30/Aug/16 01:40
Resolved: 28/Oct/14 23:42