[HDFS-8344] NameNode doesn't recover lease for files with missing blocks - ASF JIRA

Details

Type: Bug
Status: Patch Available
Priority: Major
Resolution: Unresolved
Affects Version/s: 2.7.0
Fix Version/s: None
Component/s: namenode
Labels: None

Release Note:
Allow a configuration to specify the maximum number of recovery attempts for blocks under construction.

Description

I found another(?) instance in which the lease is not recovered. This is easily reproducible on a pseudo-distributed single-node cluster.

Before you start, it helps to set the following. It is not necessary; it simply reduces how long you have to wait:

      public static final long LEASE_SOFTLIMIT_PERIOD = 30 * 1000;
      public static final long LEASE_HARDLIMIT_PERIOD = 2 * LEASE_SOFTLIMIT_PERIOD;
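
With these values the soft limit works out to 30 seconds and the hard limit to 60 seconds, so the wait after killing the client should be on the order of a minute rather than the stock timeouts. A minimal sketch of that arithmetic follows. The class name LeaseLimitSketch and the helper hardLimitExpired are hypothetical, and the expiry comparison is an assumption about how the limit is applied, not code copied from the Hadoop tree.

```java
import java.util.concurrent.TimeUnit;

/** Sketch only: mirrors the two constants above; not code from the Hadoop tree. */
final class LeaseLimitSketch {
    // Shortened limits from the description, in milliseconds.
    static final long LEASE_SOFTLIMIT_PERIOD = 30 * 1000;
    static final long LEASE_HARDLIMIT_PERIOD = 2 * LEASE_SOFTLIMIT_PERIOD;

    /**
     * Hypothetical helper: true once a lease has gone unrenewed for longer
     * than the hard limit (assumed comparison, for illustration).
     */
    static boolean hardLimitExpired(long lastRenewalMs, long nowMs) {
        return nowMs - lastRenewalMs > LEASE_HARDLIMIT_PERIOD;
    }

    public static void main(String[] args) {
        System.out.println("soft limit (s): "
                + TimeUnit.MILLISECONDS.toSeconds(LEASE_SOFTLIMIT_PERIOD));
        System.out.println("hard limit (s): "
                + TimeUnit.MILLISECONDS.toSeconds(LEASE_HARDLIMIT_PERIOD));
        // Client last renewed at t = 0 and was then killed.
        System.out.println("expired at 59 s: " + hardLimitExpired(0, 59_000));
        System.out.println("expired at 61 s: " + hardLimitExpired(0, 61_000));
    }
}
```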

1. The client starts to write a file. (It could be less than 1 block, but the client has hflushed, so some of the data has landed on the datanodes.) The client code I am using is attached as TestHadoop.java. I generate a jar and run it using $ hadoop jar TestHadoop.jar
2. The client crashes. (I simulate this with kill -9 on the $(hadoop jar TestHadoop.jar) process after it has printed "Wrote to the bufferedWriter".)
3. Shoot the datanode. (Since I ran on a pseudo-distributed cluster, there was only 1.)

I believe the lease should be recovered and the block should be marked missing. However, this is not happening: the lease is never recovered.
The effect of this bug for us was that nodes could not be decommissioned cleanly. Although we knew that the client had crashed, the NameNode never released the leases, even after restarting the NameNode and even months afterwards. There are actually several other cases where we don't consider what happens if ALL the datanodes die while the file is being written, but I am going to punt on those for another time.
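
The expected behaviour, combined with the release note's idea of capping recovery attempts for blocks under construction, can be sketched as a toy model. This is not the NameNode's LeaseManager and not the attached patch: the class BoundedLeaseRecoverySketch, the BlockRecoveryAttempt interface, the Outcome enum, and the maxAttempts parameter are all hypothetical names chosen for illustration.

```java
/** Toy model only: not NameNode code and not the attached patch. */
final class BoundedLeaseRecoverySketch {

    /** Possible results of the simulated lease recovery for one file. */
    enum Outcome { RECOVERED, RELEASED_WITH_MISSING_BLOCK }

    /** Stand-in for asking datanodes to recover the last block. */
    interface BlockRecoveryAttempt {
        /** Returns true if some replica responded on this attempt (1-based). */
        boolean tryRecover(int attemptNumber);
    }

    /**
     * Retries block recovery at most maxAttempts times. If no datanode ever
     * responds, the lease is released anyway and the block is treated as
     * missing, instead of the lease being held forever.
     */
    static Outcome recoverLease(BlockRecoveryAttempt attempt, int maxAttempts) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        for (int i = 1; i <= maxAttempts; i++) {
            if (attempt.tryRecover(i)) {
                return Outcome.RECOVERED;
            }
        }
        return Outcome.RELEASED_WITH_MISSING_BLOCK;
    }

    public static void main(String[] args) {
        // Scenario from the description: the only datanode is dead,
        // so every attempt fails.
        System.out.println(recoverLease(n -> false, 3));
        // A datanode comes back in time for the second attempt.
        System.out.println(recoverLease(n -> n >= 2, 3));
    }
}
```

The point of the cap is that the first scenario terminates: after the last failed attempt the lease is given up and the block is reported missing, which is the outcome asked for above.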

Attachments

- TestHadoop.java (18/Oct/15 15:33, 1 kB, Ravi Prakash)
- HDFS-8344.10.patch (05/Jan/16 01:03, 10 kB, Ravi Prakash)
- HDFS-8344.09.patch (14/Aug/15 23:31, 11 kB, Ravi Prakash)
- HDFS-8344.08.patch (22/Jul/15 14:50, 8 kB, Ravi Prakash)
- HDFS-8344.07.patch (20/Jul/15 20:28, 12 kB, Ravi Prakash)
- HDFS-8344.06.patch (16/Jul/15 18:22, 11 kB, Ravi Prakash)
- HDFS-8344.05.patch (06/Jul/15 16:53, 8 kB, Ravi Prakash)
- HDFS-8344.04.patch (21/May/15 20:42, 8 kB, Ravi Prakash)
- HDFS-8344.03.patch (21/May/15 01:27, 8 kB, Ravi Prakash)
- HDFS-8344.02.patch (11/May/15 20:30, 7 kB, Ravi Prakash)
- HDFS-8344.01.patch (07/May/15 17:31, 7 kB, Ravi Prakash)

Issue Links

is related to

- HDFS-8406 Lease recovery continually failed (Resolved)

relates to

- HDFS-8498 Blocks can be committed with wrong size (Resolved)
- HDFS-7342 Lease Recovery doesn't happen some times (Resolved)
- HDFS-8999 Allow a file to be closed with COMMITTED but not yet COMPLETE blocks. (Resolved)
- HDFS-4882 Prevent the Namenode's LeaseManager from looping forever in checkLeases (Closed)
- HDFS-9232 Shouldn't start block recovery if block has no enough replicas (Patch Available)

People

Assignee: Unassigned

Reporter: Ravi Prakash

Votes: 0

Watchers: 31

Dates

Created: 07/May/15 17:17

Updated: 02/Oct/19 17:14