[HDFS-1262] Failed pipeline creation during append leaves lease hanging on NN - ASF JIRA

Details

Type: Bug
Status: Resolved
Priority: Critical
Resolution: Won't Fix
Affects Version/s: 0.20-append
Fix Version/s: None
Component/s: hdfs-client, namenode
Labels: None

Description

Ryan Rawson came upon this nasty bug in HBase cluster testing. What happened was the following:
1) File's original writer died
2) Recovery client tried to open the file for append. It looped for a minute or so until the soft lease expired, and then the append call initiated recovery
3) Recovery completed successfully
4) Recovery client called append again, which succeeded on the NN
5) For some reason, the block recovery that happens at the start of append pipeline creation failed on all datanodes 6 times, causing the append() call to throw an exception back to the HBase master. HBase assumed the file wasn't open and put it back on a queue to try later
6) Some time later, it tried append again, but the lease was still assigned to the same DFS client, so it wasn't able to recover.

The recovery failure in step 5 is a separate issue. The problem for this JIRA is that the client can think it failed to open a file for append while the NN thinks the writer holds a lease on it. Since the writer keeps renewing its lease, lease recovery never happens, and no one can open or recover the file until the DFS client shuts down.
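To make the stuck state in steps 4 to 6 concrete, here is a minimal toy model in Java. It is only a sketch under stated assumptions: `ToyNameNode`, `append`, `abandon`, and `retryAfterFailedPipeline` are hypothetical names invented for illustration and are not HDFS APIs, lease expiry is left out because the client keeps renewing, and the "release on failure" variant is shown purely for contrast, not as a description of what the attached patches do.

```java
import java.util.HashMap;
import java.util.Map;

public class LeaseHangSketch {

    /** Toy stand-in for the NN lease table: path -> lease holder. Not real HDFS code. */
    static class ToyNameNode {
        private final Map<String, String> leaseHolder = new HashMap<>();

        /** Grants the lease, or rejects the call if any client (even the same one) already holds it. */
        boolean append(String path, String client) {
            if (leaseHolder.containsKey(path)) {
                return false; // the file is still considered open for write
            }
            leaseHolder.put(path, client);
            return true;
        }

        /** Hypothetical cleanup: drop the lease only if this client is the holder. */
        void abandon(String path, String client) {
            leaseHolder.remove(path, client);
        }

        String holder(String path) {
            return leaseHolder.get(path);
        }
    }

    /** Replays steps 4-6 and reports whether the retried append is accepted. */
    static boolean retryAfterFailedPipeline(boolean releaseOnFailure) {
        ToyNameNode nn = new ToyNameNode();
        String path = "/example/file";   // assumed path, for illustration only
        String client = "DFSClient_1";   // assumed client name

        nn.append(path, client);         // step 4: append succeeds on the NN

        // Step 5: pipeline creation fails on the client side. The caller sees an
        // exception and assumes the file is not open, but the NN still records the lease.
        if (releaseOnFailure) {
            nn.abandon(path, client);
        }

        return nn.append(path, client);  // step 6: retry from the same DFS client
    }

    public static void main(String[] args) {
        System.out.println("retry without cleanup accepted: " + retryAfterFailedPipeline(false));
        System.out.println("retry with cleanup accepted:    " + retryAfterFailedPipeline(true));
    }
}
```

In this toy model the retry is rejected when nothing releases the lease after the failed pipeline setup, which mirrors the hang described above; the real NN and DFS client behavior is more involved.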

Attachments

hdfs-1262-1.txt, 29/Jun/10 02:06, 21 kB, sam rash
hdfs-1262-2.txt, 30/Jun/10 09:33, 20 kB, sam rash
hdfs-1262-3.txt, 22/Jul/10 20:05, 20 kB, sam rash
hdfs-1262-4.txt, 22/Jul/10 23:37, 21 kB, sam rash
hdfs-1262-5.txt, 22/Aug/10 16:59, 22 kB, sam rash

People

Assignee: sam rash

Reporter: Todd Lipcon

Votes: 0

Watchers: 11

Dates

Created: 23/Jun/10 07:46

Updated: 10/Mar/15 03:01

Resolved: 10/Mar/15 03:01