[YARN-5438] TimelineClientImpl leaks FileSystem instances, causing long-running services such as the HiveServer2 daemon to go OOM

Details

Type: Bug
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: 2.8.0, 2.7.3
Fix Version/s: 2.8.0, 3.0.0-alpha1
Component/s: timelineserver
Labels: None
Hadoop Flags: Reviewed

Description

In org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl, FileSystem.newInstance is invoked and the resulting instance is never closed. Over time, FileSystem instances accumulate in long-running clients (such as HiveServer2) and eventually cause them to run out of memory.

Attachments

YARN-5438.0.patch
27/Jul/16 16:59
0.9 kB
Rohith Sharma K S

Activity

Rohith Sharma K S added a comment - 27/Jul/16 16:59

Attached a patch that closes the FileSystem when the TimelineClient is stopped.
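
As an illustration of the pattern this comment describes (this is not the attached patch), here is a minimal, dependency-free Java sketch. `FakeFileSystem` and `SketchTimelineWriter` are hypothetical stand-ins invented for this example. They only model the idea that a client which opens its own private handle must close it when it stops, otherwise every client created and stopped leaves one handle open.

```java
import java.io.Closeable;
import java.util.concurrent.atomic.AtomicInteger;

// Demo driver: compares clients that forget to close with clients that close on stop.
class OwnedHandleDemo {
    public static void main(String[] args) {
        // Leaky usage: each client opens a handle and never closes it.
        for (int i = 0; i < 3; i++) {
            new SketchTimelineWriter(false).stop();
        }
        System.out.println("open after leaky clients: " + FakeFileSystem.openCount());

        // Fixed usage: each client closes its own handle when stopped.
        for (int i = 0; i < 3; i++) {
            new SketchTimelineWriter(true).stop();
        }
        System.out.println("open after fixed clients: " + FakeFileSystem.openCount());
    }
}

// Hypothetical stand-in for a privately owned file system handle.
class FakeFileSystem implements Closeable {
    private static final AtomicInteger OPEN = new AtomicInteger();
    private boolean closed;

    FakeFileSystem() {
        OPEN.incrementAndGet();
    }

    // Number of handles that were opened and not yet closed.
    static int openCount() {
        return OPEN.get();
    }

    @Override
    public synchronized void close() {
        if (!closed) {
            closed = true;
            OPEN.decrementAndGet();
        }
    }
}

// Hypothetical stand-in for a client that owns one handle for its lifetime.
class SketchTimelineWriter {
    private final FakeFileSystem fs = new FakeFileSystem();
    private final boolean closeOnStop;

    SketchTimelineWriter(boolean closeOnStop) {
        this.closeOnStop = closeOnStop;
    }

    // With closeOnStop == false this models the leak; with true, the fix.
    void stop() {
        if (closeOnStop) {
            fs.close();
        }
    }
}
```

In this model the leaky loop leaves its handles open, while the second loop adds none, because each handle is released in `stop()`.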

Hadoop QA added a comment - 27/Jul/16 17:24

-1 overall

Vote	Subsystem	Runtime	Comment
0	reexec	0m 15s	Docker mode activated.
+1	@author	0m 0s	The patch does not contain any @author tags.
-1	test4tests	0m 0s	The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.
+1	mvninstall	6m 39s	trunk passed
+1	compile	0m 26s	trunk passed
+1	checkstyle	0m 18s	trunk passed
+1	mvnsite	0m 30s	trunk passed
+1	mvneclipse	0m 13s	trunk passed
+1	findbugs	0m 55s	trunk passed
+1	javadoc	0m 28s	trunk passed
+1	mvninstall	0m 28s	the patch passed
+1	compile	0m 23s	the patch passed
+1	javac	0m 23s	the patch passed
+1	checkstyle	0m 15s	the patch passed
+1	mvnsite	0m 26s	the patch passed
+1	mvneclipse	0m 10s	the patch passed
+1	whitespace	0m 0s	The patch has no whitespace issues.
+1	findbugs	1m 0s	the patch passed
+1	javadoc	0m 26s	the patch passed
+1	unit	2m 16s	hadoop-yarn-common in the patch passed.
+1	asflicense	0m 15s	The patch does not generate ASF License warnings.
		15m 59s

Subsystem	Report/Notes
Docker	Image:yetus/hadoop:9560f25
JIRA Patch URL	https://issues.apache.org/jira/secure/attachment/12820503/YARN-5438.0.patch
JIRA Issue	YARN-5438
Optional Tests	asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle
uname	Linux 872a7339fda9 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
Build tool	maven
Personality	/testptch/hadoop/patchprocess/precommit/personality/provided.sh
git revision	trunk / 54fe17a
Default Java	1.8.0_101
findbugs	v3.0.0
Test Results	https://builds.apache.org/job/PreCommit-YARN-Build/12523/testReport/
modules	C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common
Console output	https://builds.apache.org/job/PreCommit-YARN-Build/12523/console
Powered by	Apache Yetus 0.3.0 http://yetus.apache.org

This message was automatically generated.

Jason Darrell Lowe added a comment - 27/Jul/16 17:45

Thanks for the patch, Rohith! This probably works for the HiveServer2 case iff the server never tries to use the filesystem after the timeline client is closed. However the timeline client is not just used by HS2, and I think this patch will be problematic for any code that could still use the filesystem after the timeline client is closed. Since the filesystem cache will implicitly link what looks like two separate creations of a filesystem to a single instance, closing one will break any subsequent use of the other.

This makes me think HS2 is missing a FileSystem.closeAllForUGI call somewhere, so that when it is done with a certain user it cleans up all the filesystems associated with that user. It also makes me wonder why we haven't implemented a reference-counting cache for the filesystem by now.

Rohith Sharma K S added a comment - 27/Jul/16 18:52

Quoting Jason: "Since the filesystem cache will implicitly link what looks like two separate creations of a filesystem to a single instance, closing one will break any subsequent use of the other."

If a file system object is created with FileSystem#newInstance, a new FS object is always returned, even within the same JVM. For every newInstance call, the object is created from the combination of the URI, the Conf, and a unique key. If the FS object is created with FileSystem#get, the API looks it up in the cache, using the combination of URI and Conf only. So what matters is how the FS object was created.
Closing an instance created with FileSystem#newInstance should therefore not affect another FS object created with FileSystem#get. Note, however, that if two FS objects are obtained through FileSystem#get, closing one will definitely affect the other.
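
The distinction can be illustrated with a toy model in plain Java. This is a simplified sketch and not Hadoop's actual implementation: `ToyFs` and `ToyFsCache` are hypothetical names, the key is reduced to a URI string plus a counter, and other fields of the real key (such as the user) are omitted. It assumes, as described above, that `get` reuses one entry per URI while `newInstance` always adds an entry under a fresh unique key.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicLong;

// Demo driver: contrasts shared (get) and private (newInstance) instances.
class FsCacheDemo {
    public static void main(String[] args) {
        ToyFs a = ToyFsCache.get("hdfs://nn");
        ToyFs b = ToyFsCache.get("hdfs://nn");
        ToyFs c = ToyFsCache.newInstance("hdfs://nn");

        System.out.println("get/get share an instance: " + (a == b));
        System.out.println("newInstance is distinct:   " + (a != c));

        c.close();
        System.out.println("a usable after closing c:  " + !a.isClosed());

        a.close();
        System.out.println("b usable after closing a:  " + !b.isClosed());
    }
}

// Hypothetical file system handle that removes itself from the cache on close.
final class ToyFs {
    private final String cacheKey;
    private boolean closed;

    ToyFs(String cacheKey) {
        this.cacheKey = cacheKey;
    }

    boolean isClosed() {
        return closed;
    }

    void close() {
        closed = true;
        ToyFsCache.remove(cacheKey, this);
    }
}

// Hypothetical cache: shared entries use slot 0, unique entries use a counter.
final class ToyFsCache {
    private static final Map<String, ToyFs> CACHE = new HashMap<>();
    private static final AtomicLong UNIQUE = new AtomicLong();

    // Shared lookup: every caller with the same URI gets the same object.
    static synchronized ToyFs get(String uri) {
        return CACHE.computeIfAbsent(uri + "#0", ToyFs::new);
    }

    // Private instance: the unique suffix guarantees a new cache entry,
    // which stays in the map until the caller closes it.
    static synchronized ToyFs newInstance(String uri) {
        String key = uri + "#" + UNIQUE.incrementAndGet();
        ToyFs fs = new ToyFs(key);
        CACHE.put(key, fs);
        return fs;
    }

    static synchronized void remove(String key, ToyFs fs) {
        CACHE.remove(key, fs);
    }

    static synchronized int size() {
        return CACHE.size();
    }
}
```

In this model, an instance obtained through `newInstance` that is never closed keeps its cache entry indefinitely, which is the accumulation the issue describes for long-running clients.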

Jason Darrell Lowe added a comment - 27/Jul/16 19:17

Ah, thanks Rohith. My bad, I missed that it was creating the filesystem in a way that essentially avoids the cache.

+1 lgtm. Will commit this tomorrow if there are no objections.

Jason Darrell Lowe added a comment - 28/Jul/16 21:53

Thanks, Rohith! I committed this to trunk, branch-2, and branch-2.8.

Hudson added a comment - 28/Jul/16 22:57

SUCCESS: Integrated in Hadoop-trunk-Commit #10172 (See https://builds.apache.org/job/Hadoop-trunk-Commit/10172/)
YARN-5438. TimelineClientImpl leaking FileSystem Instances causing Long (jlowe: rev a1890c32c52fed69ec09efad0fccf49ed8c2e21e)

hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/client/api/impl/FileSystemTimelineWriter.java

People

Assignee: Rohith Sharma K S

Reporter: Karam Singh

Votes: 0

Watchers: 9

Dates

Created: 27/Jul/16 16:28

Updated: 30/Aug/16 00:59

Resolved: 28/Jul/16 21:53
