[HDFS-11817] A faulty node can cause a lease leak and NPE on accessing data - ASF JIRA

Details

Type: Bug
Status: Resolved
Priority: Critical
Resolution: Fixed
Affects Version/s: 2.8.0
Fix Version/s: 2.9.0, 3.0.0-alpha4, 2.8.2
Component/s: None
Labels: None
Target Version/s: 2.8.1
Hadoop Flags: Reviewed

Description

When the namenode performs lease recovery for a failed write, commitBlockSynchronization() fails if none of the new targets has sent a received IBR (an incremental block report marking the block as received). At this point the data is inaccessible, because the namenode throws a NullPointerException from getBlockLocations().
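A small model can make this first failure concrete. It is a sketch under stated assumptions, not NameNode code: `UCBlock`, `receive_ibr`, `commit_block_synchronization`, and `get_block_locations` are hypothetical Python stand-ins. The model assumes, for illustration only, that the NullPointerException comes from dereferencing a storage entry that was never populated because the new target has not reported the block. Python raises `AttributeError` where Java would throw a NullPointerException.

```python
from dataclasses import dataclass, field
from typing import List, Optional


class CommitFailed(Exception):
    """Hypothetical stand-in for the commitBlockSynchronization() failure."""


@dataclass
class UCBlock:
    """Last block of a file under lease recovery (hypothetical model)."""
    block_id: int
    new_targets: List[str]
    # One entry per new target; None until that target sends a received IBR.
    # (Modeling the missing report as a None entry is an assumption.)
    storages: List[Optional[str]] = field(init=False)

    def __post_init__(self) -> None:
        self.storages = [None] * len(self.new_targets)


def receive_ibr(block: UCBlock, target: str) -> None:
    """Record a received IBR from `target` for this block."""
    block.storages[block.new_targets.index(target)] = f"storage@{target}"


def commit_block_synchronization(block: UCBlock) -> None:
    """Fail when none of the new targets has sent a received IBR."""
    if all(s is None for s in block.storages):
        raise CommitFailed(f"no new target has reported blk_{block.block_id}")


def get_block_locations(block: UCBlock) -> List[str]:
    """Naive lookup that dereferences every storage entry.

    A None entry raises AttributeError, the Python analogue of the
    NullPointerException described in the issue.
    """
    return [s.split("@", 1)[1] for s in block.storages]


if __name__ == "__main__":
    blk = UCBlock(block_id=1, new_targets=["dn3"])
    try:
        commit_block_synchronization(blk)
    except CommitFailed as e:
        print("commit failed:", e)
    try:
        get_block_locations(blk)
    except AttributeError:
        print("lookup failed on a null storage entry")
    receive_ibr(blk, "dn3")            # the faulty target finally reports
    commit_block_synchronization(blk)  # now succeeds
    print(get_block_locations(blk))    # ['dn3']
```

With a single new target, one missing report is enough to make both the commit and the lookup fail, which matches the description's note that the problem usually appears when there is only one new target.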

The namenode retries the lease recovery after about an hour. If the new target nodes are faulty (the problem usually arises when there is only one new target), they may still not have sent a block report by then. In that case, lease recovery throws an AlreadyBeingCreatedException, which causes LeaseManager to simply remove the lease without finalizing the inode.

This leaves the lease state inconsistent: the inode stays under construction, but no further lease recovery is attempted, and a manual lease recovery is not allowed either.
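The lease leak can be modeled as a small state machine. This is a hedged sketch, not LeaseManager or FSNamesystem code: `NameNodeModel`, `retry_lease_recovery`, `check_leases`, and `manual_recover_lease` are hypothetical names, and only the exception name comes from the description. The rule that manual recovery is accepted only while a lease is registered is an illustrative assumption; the issue states only that manual recovery is not allowed.

```python
from dataclasses import dataclass, field
from typing import Dict, List


class AlreadyBeingCreatedException(Exception):
    """Python stand-in named after the HDFS exception in the description."""


class RecoveryRefused(Exception):
    """Hypothetical error for a rejected manual lease recovery."""


@dataclass
class FileState:
    under_construction: bool = True
    # True once any new target has sent a received IBR for the last block.
    block_reported: bool = False


@dataclass
class NameNodeModel:
    files: Dict[str, FileState] = field(default_factory=dict)
    leases: Dict[str, str] = field(default_factory=dict)  # path -> holder

    def retry_lease_recovery(self, path: str) -> None:
        """Hourly retry of a lease recovery that did not complete."""
        f = self.files[path]
        if not f.block_reported:
            # The faulty new target still has not sent a block report.
            raise AlreadyBeingCreatedException(path)
        f.under_construction = False  # block committed, inode finalized
        del self.leases[path]

    def check_leases(self) -> List[str]:
        """Periodic lease check with the handling the issue describes.

        Returns the paths whose leases were dropped without finalization.
        """
        dropped = []
        for path in list(self.leases):
            try:
                self.retry_lease_recovery(path)
            except AlreadyBeingCreatedException:
                # The leak: the lease is removed, the inode stays under
                # construction, and no further retry is ever scheduled.
                del self.leases[path]
                dropped.append(path)
        return dropped

    def manual_recover_lease(self, path: str) -> None:
        """Assumption: manual recovery is accepted only while a lease exists."""
        if path not in self.leases:
            raise RecoveryRefused(path)
        self.retry_lease_recovery(path)

    def is_inconsistent(self, path: str) -> bool:
        """Under construction, yet no lease remains to drive recovery."""
        return self.files[path].under_construction and path not in self.leases


if __name__ == "__main__":
    p = "/data/part-0"
    nn = NameNodeModel(files={p: FileState()}, leases={p: "DFSClient_1"})
    print(nn.check_leases())      # ['/data/part-0']
    print(nn.is_inconsistent(p))  # True
    try:
        nn.manual_recover_lease(p)
    except RecoveryRefused:
        print("manual recovery refused")
```

If the target reports before the retry (`FileState(block_reported=True)`), the same `check_leases()` call finalizes the file and drops nothing; the inconsistent state appears only when the report is still missing at retry time.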

Attachments

- hdfs-11817_supplement.txt (18/May/17 14:39, 7 kB, Kihwal Lee)
- HDFS-11817.branch-2.7.001.patch (19/Apr/18 20:12, 15 kB, Wei-Chiu Chuang)
- HDFS-11817.branch-2.patch (19/May/17 22:35, 17 kB, Kihwal Lee)
- HDFS-11817.v2.branch-2.8.patch (25/May/17 22:40, 16 kB, Kihwal Lee)
- HDFS-11817.v2.branch-2.patch (22/May/17 22:54, 16 kB, Kihwal Lee)
- HDFS-11817.v2.trunk.patch (22/May/17 22:54, 13 kB, Kihwal Lee)
Kihwal Lee

Issue Links

is duplicated by: HDFS-8406 Lease recovery continually failed (Resolved)

is related to: HDFS-13486 Backport HDFS-11817 (A faulty node can cause a lease leak and NPE on accessing data) to branch-2.7 (Resolved)

People

Assignee: Kihwal Lee
Reporter: Kihwal Lee

Votes: 0
Watchers: 12

Dates

Created: 12/May/17 14:29
Updated: 12/May/18 06:42
Resolved: 25/May/17 22:37