[HDFS-4882] Prevent the Namenode's LeaseManager from looping forever in checkLeases - ASF JIRA

Details

Type: Bug
Status: Closed
Priority: Critical
Resolution: Fixed
Affects Version/s: 2.0.0-alpha, 2.5.1
Fix Version/s: 2.6.1, 3.0.0-alpha1
Component/s: hdfs-client, namenode
Labels: 2.6.1-candidate
Target Version/s: 2.6.1

Description

Scenario:
1. cluster with 4 DNs
2. the size of the file to be written is a little more than one block
3. write the first block to 3 DNs, DN1->DN2->DN3
4. all the data packets of the first block are successfully acked and the client sets the pipeline stage to PIPELINE_CLOSE, but the last packet isn't sent out
5. DN2 and DN3 are down
6. client recovers the pipeline, but no new DN is added to the pipeline because the current pipeline stage is PIPELINE_CLOSE
7. client continues writing the last block and tries to close the file after all the data has been written
8. NN finds that the penultimate block doesn't have enough replicas (our dfs.namenode.replication.min=2), so the client's close runs into an indefinite loop (HDFS-2936); at the same time, NN sets the last block's state to COMPLETE
9. shut down the client
10. the file's lease exceeds the hard limit
11. LeaseManager detects this and begins lease recovery by calling fsnamesystem.internalReleaseLease()
12. but the last block's state is COMPLETE, which triggers an infinite loop in LeaseManager and prints massive logs like this:

2013-06-05,17:42:25,695 INFO org.apache.hadoop.hdfs.server.namenode.LeaseManager: Lease [Lease.  Holder: DFSClient_NONMAPREDUCE_-1252656407_1, pendingcreates: 1] has expired hard limit
2013-06-05,17:42:25,695 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Recovering lease=[Lease.  Holder: DFSClient_NONMAPREDUCE_-1252656407_1, pendingcreates: 1], src=/user/h_wuzesheng/test.dat
2013-06-05,17:42:25,695 WARN org.apache.hadoop.hdfs.StateChange: DIR* NameSystem.internalReleaseLease: File = /user/h_wuzesheng/test.dat, block blk_-7028017402720175688_1202597, lastBLockState=COMPLETE
2013-06-05,17:42:25,695 INFO org.apache.hadoop.hdfs.server.namenode.LeaseManager: Started block recovery for file /user/h_wuzesheng/test.dat lease [Lease.  Holder: DFSClient_NONMAPREDUCE_-1252656407_1, pendingcreates: 1]

(the third log line is a debug log that we added)
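
The failure mode in steps 10 to 12 can be illustrated with a small, self-contained model. This is only a sketch and not the Hadoop code or the committed patch: the names LeaseLoopSketch, Lease, tryRelease, naiveCheck, guardedCheck and HARD_LIMIT are hypothetical, and the `recoverable` flag merely stands in for "internalReleaseLease() could actually release this lease". It assumes the checker repeatedly takes the oldest lease from a sorted set, so a lease that is never removed is picked again on every pass.

```java
import java.util.TreeSet;

class LeaseLoopSketch {
    /** Hypothetical stand-in for a lease, ordered by last renewal time, then holder. */
    static final class Lease implements Comparable<Lease> {
        final String holder;
        final long lastRenewed;
        final boolean recoverable; // false models the stuck file from the scenario

        Lease(String holder, long lastRenewed, boolean recoverable) {
            this.holder = holder;
            this.lastRenewed = lastRenewed;
            this.recoverable = recoverable;
        }

        @Override
        public int compareTo(Lease other) {
            int byTime = Long.compare(lastRenewed, other.lastRenewed);
            return byTime != 0 ? byTime : holder.compareTo(other.holder);
        }
    }

    static final long HARD_LIMIT = 100; // assumed value, for illustration only

    /** Models the release call: removes the lease only if recovery can finish. */
    static boolean tryRelease(TreeSet<Lease> leases, Lease lease) {
        if (lease.recoverable) {
            leases.remove(lease);
            return true;
        }
        return false;
    }

    /** Always re-reads the oldest lease; maxIterations only keeps the demo finite. */
    static int naiveCheck(TreeSet<Lease> leases, long now, int maxIterations) {
        int iterations = 0;
        while (!leases.isEmpty() && iterations < maxIterations) {
            Lease oldest = leases.first();
            if (now - oldest.lastRenewed <= HARD_LIMIT) {
                break; // oldest lease has not expired, so nothing else has either
            }
            tryRelease(leases, oldest); // a stuck lease stays first in the set
            iterations++;
        }
        return iterations;
    }

    /** Visits each expired lease once, moving on even if the release failed. */
    static int guardedCheck(TreeSet<Lease> leases, long now) {
        int iterations = 0;
        Lease current = leases.isEmpty() ? null : leases.first();
        while (current != null && now - current.lastRenewed > HARD_LIMIT) {
            Lease next = leases.higher(current); // look ahead before any removal
            tryRelease(leases, current);
            current = next;
            iterations++;
        }
        return iterations;
    }

    public static void main(String[] args) {
        TreeSet<Lease> leases = new TreeSet<>();
        leases.add(new Lease("stuck-client", 0, false));
        leases.add(new Lease("ok-client", 10, true));
        leases.add(new Lease("fresh-client", 950, true));
        System.out.println("naive iterations: " + naiveCheck(leases, 1000, 1000));
        System.out.println("guarded iterations: " + guardedCheck(leases, 1000));
        System.out.println("leases left: " + leases.size());
    }
}
```

In this model the naive loop spends its entire iteration budget on the one lease it cannot release, while the guarded loop visits each expired lease once and leaves the stuck lease for a later pass.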

Attachments


- 4882.1.patch, 06/Jun/13 11:40, 3 kB, Zesheng Wu
- 4882.patch, 07/Jun/13 11:30, 1 kB, Zesheng Wu
- 4882.patch, 07/Jun/13 06:18, 1 kB, Zesheng Wu
- HDFS-4882.1.patch, 03/Nov/14 19:13, 6 kB, Ravi Prakash
- HDFS-4882.2.patch, 12/Nov/14 19:04, 9 kB, Ravi Prakash
- HDFS-4882.3.patch, 14/Nov/14 03:35, 9 kB, Ravi Prakash
- HDFS-4882.4.patch, 15/Nov/14 00:21, 9 kB, Ravi Prakash
- HDFS-4882.5.patch, 20/Nov/14 18:00, 9 kB, Ravi Prakash
- HDFS-4882.6.patch, 20/Nov/14 22:12, 9 kB, Ravi Prakash
- HDFS-4882.7.patch, 21/Nov/14 17:21, 9 kB, Ravi Prakash
- HDFS-4882.patch, 30/Oct/14 22:20, 4 kB, Ravi Prakash

Issue Links

incorporates: HDFS-7342 Lease Recovery doesn't happen some times (Resolved)
is related to: HDFS-8344 NameNode doesn't recover lease for files with missing blocks (Patch Available)
relates to: HDFS-7307 Need 'force close' (Resolved)

People

Assignee: Ravi Prakash
Reporter: Zesheng Wu
Votes: 1
Watchers: 27

Dates

Created: 05/Jun/13 11:27
Updated: 30/Aug/16 01:42
Resolved: 24/Nov/14 18:56