Details
- Type: Bug
- Status: Resolved
- Priority: Major
- Resolution: Fixed
- Flags: Reviewed
Description
The attached script simulates a process opening ~50 files via WebHDFS and performing random reads: at most 50 reads are in flight at any time, all WebHDFS sessions are kept open, and each read fetches ~64k from a random position.
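The load pattern above can be sketched in Python. The host, port, and file paths below are placeholders, and the helpers only build the WebHDFS `OPEN` request URLs such a script would issue (a real run would GET each URL over a persistent session, ~50 at a time):

```python
import random

def webhdfs_open_url(host, path, offset, length, port=50070):
    """Build a WebHDFS ranged-read URL.

    WebHDFS exposes reads as GET /webhdfs/v1/<path>?op=OPEN with
    optional offset and length parameters.
    """
    return (f"http://{host}:{port}/webhdfs/v1{path}"
            f"?op=OPEN&offset={offset}&length={length}")

def random_read_urls(host, paths, file_size, n_reads, read_len=64 * 1024):
    """One ~64k read at a random position per request, spread across the files."""
    urls = []
    for _ in range(n_reads):
        path = random.choice(paths)
        offset = random.randrange(max(1, file_size - read_len))
        urls.append(webhdfs_open_url(host, path, offset, read_len))
    return urls
```

In the actual reproduction each URL would be fetched with a kept-open HTTP session, which is what surfaces the connection build-up described here.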
The script also shells into the NameNode once per second and produces a summary of its socket states. On my 5-node test cluster, it took ~30 seconds for the NameNode to accumulate ~25,000 active connections and fail.
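The per-second summary amounts to bucketing the NameNode's TCP connections by state. A minimal Python version, parsing sample `ss -tan`-style output here rather than a live host (the column layout assumed is `ss`'s default, with the state in the first column after a header line):

```python
from collections import Counter

def summarize_socket_states(ss_output):
    """Count connections per TCP state from `ss -tan`-style output."""
    states = Counter()
    for line in ss_output.strip().splitlines()[1:]:  # skip the header line
        state = line.split()[0]
        states[state] += 1
    return states

# Sample output standing in for `ss -tan` run on the NameNode.
sample = """\
State      Recv-Q Send-Q Local Address:Port  Peer Address:Port
ESTAB      0      0      10.0.0.1:50070      10.0.0.2:41000
ESTAB      0      0      10.0.0.1:50070      10.0.0.3:41001
TIME-WAIT  0      0      10.0.0.1:50070      10.0.0.4:41002
"""
```

A summary like this is how the ~25,000-connection spike shows up: the ESTAB count climbs steadily while the test runs.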
It appears that each request through the WebHDFS client opens a new connection to the NameNode and keeps it open after the request completes. If the process continues to run, all of the open connections are eventually (~30-60 seconds) closed and the NameNode recovers.
This smells like SoftReference reaping: are we using SoftReferences in the WebHDFS client to cache NameNode connections but never reusing them?
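The linked HADOOP-13436 points at one way a cache can behave like this: if the key type does not define equality, every lookup misses, a fresh connection is opened on each request, and the stale entries linger until their references are reclaimed. A minimal Python illustration of that failure mode (the `ConnectionKey` and `get_connection` names are invented for the sketch; the mechanism mirrors a Java key class missing `hashCode`/`equals`):

```python
class ConnectionKey:
    """Cache key without __eq__/__hash__: logically equal keys
    hash by object identity, so the cache never gets a hit."""
    def __init__(self, host, user):
        self.host = host
        self.user = user

cache = {}

def get_connection(host, user):
    key = ConnectionKey(host, user)
    if key not in cache:       # always misses: identity-based hashing
        cache[key] = object()  # stand-in for opening a real connection
    return cache[key]

# Ten "identical" requests leave ten live entries in the cache.
for _ in range(10):
    get_connection("namenode", "alice")
```

With `__eq__` and `__hash__` defined over `(host, user)`, the same ten calls would reuse one entry; without them, entries only disappear when the cache's references are cleared, which would match the 30-60 second recovery observed above.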
Attachments
Issue Links
- is blocked by: HADOOP-12424 Add a function to build unique cache key for Token. (Resolved)
- is duplicated by: HDFS-9370 TestDataNodeUGIProvider fails intermittently due to non-deterministic cache expiry. (Resolved)
- is related to: HADOOP-13436 RPC connections are leaking due to not overriding hashCode and equals (Open)