Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 2.7.1
    • Fix Version/s: 2.9.0, 3.0.0-alpha4, 2.8.2
    • Component/s: timelineserver
    • Labels:
      None
    • Environment:

      HDP2.4
      CentOS 6.7
      jdk1.8.0_72

Description

      Memory usage on the timeline server machine increases gradually.

      https://gyazo.com/952dad96c77ae053bae2e4d8c8ab0572

      Please look at the trend since April.

      According to my investigation, the timeline server was using about 25 GB.

      top command result

      90577 yarn      20   0 28.4g  25g  12m S  0.0 40.1   5162:53 /usr/java/jdk1.8.0_72/bin/java -Dproc_timelineserver -Xmx1024m -Dhdp.version=2.4.0.0-169 -Dhadoop.log.dir=/var/log/hadoop-yarn/yarn -Dyarn.log.dir=/var/log/hadoop-yarn/yarn ...
      

      ps command result

      $ ps ww 90577
       90577 ?        Sl   5162:53 /usr/java/jdk1.8.0_72/bin/java -Dproc_timelineserver -Xmx1024m -Dhdp.version=2.4.0.0-169 -Dhadoop.log.dir=/var/log/hadoop-yarn/yarn -Dyarn.log.dir=/var/log/hadoop-yarn/yarn -Dhadoop.log.file=yarn-yarn-timelineserver-myhost.log -Dyarn.log.file=yarn-yarn-timelineserver-myhost.log -Dyarn.home.dir= -Dyarn.id.str=yarn -Dhadoop.root.logger=INFO,EWMA,RFA -Dyarn.root.logger=INFO,EWMA,RFA -Djava.library.path=:/usr/hdp/2.4.0.0-169/hadoop/lib/native/Linux-amd64-64:/usr/hdp/2.4.0.0-169/hadoop/lib/native:/var/lib/ambari-agent/tmp/hadoop_java_io_tmpdir:/usr/hdp/2.4.0.0-169/hadoop/lib/native/Linux-amd64-64:/usr/hdp/2.4.0.0-169/hadoop/lib/native:/var/lib/ambari-agent/tmp/hadoop_java_io_tmpdir -Dyarn.policy.file=hadoop-policy.xml -Djava.io.tmpdir=/var/lib/ambari-agent/tmp/hadoop_java_io_tmpdir -Dhadoop.log.dir=/var/log/hadoop-yarn/yarn -Dyarn.log.dir=/var/log/hadoop-yarn/yarn -Dhadoop.log.file=yarn-yarn-timelineserver-myhost.log -Dyarn.log.file=yarn-yarn-timelineserver-myhost.log -Dyarn.home.dir=/usr/hdp/current/hadoop-yarn-timelineserver -Dhadoop.home.dir=/usr/hdp/2.4.0.0-169/hadoop -Dhadoop.root.logger=INFO,EWMA,RFA -Dyarn.root.logger=INFO,EWMA,RFA -Djava.library.path=:/usr/hdp/2.4.0.0-169/hadoop/lib/native/Linux-amd64-64:/usr/hdp/2.4.0.0-169/hadoop/lib/native:/var/lib/ambari-agent/tmp/hadoop_java_io_tmpdir:/usr/hdp/2.4.0.0-169/hadoop/lib/native/Linux-amd64-64:/usr/hdp/2.4.0.0-169/hadoop/lib/native:/var/lib/ambari-agent/tmp/hadoop_java_io_tmpdir -classpath /usr/hdp/2.4.0.0-169/hadoop/conf:/usr/hdp/2.4.0.0-169/hadoop/conf:/usr/hdp/2.4.0.0-169/hadoop/conf:/usr/hdp/2.4.0.0-169/hadoop/lib/*:/usr/hdp/2.4.0.0-169/hadoop/.//*:/usr/hdp/2.4.0.0-169/hadoop-hdfs/./:/usr/hdp/2.4.0.0-169/hadoop-hdfs/lib/*:/usr/hdp/2.4.0.0-169/hadoop-hdfs/.//*:/usr/hdp/2.4.0.0-169/hadoop-yarn/lib/*:/usr/hdp/2.4.0.0-169/hadoop-yarn/.//*:/usr/hdp/2.4.0.0-169/hadoop-mapreduce/lib/*:/usr/hdp/2.4.0.0-169/hadoop-mapreduce/.//*::/usr/hdp/2.4.0.0-169/tez/*:/usr/hdp/2.4.0.0-169/tez/lib/*:/usr/hdp/2.4.0.0-169/tez/conf:/usr/hdp/2.4.0.0-169/tez/*:/usr/hdp/2.4.0.0-169/tez/lib/*:/usr/hdp/2.4.0.0-169/tez/conf:/usr/hdp/current/hadoop-yarn-timelineserver/.//*:/usr/hdp/current/hadoop-yarn-timelineserver/lib/*:/usr/hdp/2.4.0.0-169/hadoop/conf/timelineserver-config/log4j.properties org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryServer
      

      Although I set -Xmx1024m, the actual memory usage is 25 GB.

      After I restarted the timeline server, memory usage on the machine decreased.

      https://gyazo.com/130600c17a7d41df8606727a859ae7e3

      Now the timeline server uses less than 1 GB of memory.

      top command result

       6163 yarn      20   0 3959m 783m  46m S  0.3  1.2   3:37.60 /usr/java/jdk1.8.0_72/bin/java -Dproc_timelineserver -Xmx1024m -Dhdp.version=2.4.0.0-169 ...
      

      I suspect a memory leak in the timeline server.

Attachments

      1. YARN-5368.1.patch
        11 kB
        Jonathan Eagles
      2. YARN-5368.2.patch
        16 kB
        Jonathan Eagles

        Activity

        Naganarasimha Naganarasimha G R added a comment -

        Hi Wataru Yukawa,
        The 5th column of the top output you are pointing to is the Virtual Memory Size, which includes all code, data and shared libraries plus pages that have been swapped out and pages that have been mapped but not used. So it is not actually using 25 GB of RAM.
        Are you facing any particular impact from this? It would also be helpful if you could share how many open files the timeline service process has.
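
        For reference, a quick way to compare the two numbers for a given PID is ps with explicit columns (VSZ and RSS are reported in KiB; the PID is the one from the report above):

            $ ps -o pid,vsz,rss,cmd -p 90577
            # VSZ = virtual size (address space reserved), RSS = resident set (physical memory in use)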

        wyukawa Wataru Yukawa added a comment -

        Thank you for the comment.

        I was pointing at the 6th column (RSS), not the 5th column, VSZ (28.4g).

        RSS is 25 GB.

        > Are you facing any particular impact from this?
        Nothing at the moment, but I am worried about running short of memory.

        The output of lsof -p `pgrep -f timeline` is here:

        https://gist.github.com/wyukawa/36ccfde48f118bb1e8e4f2564d4ecc2a

        Thanks.

        brahmareddy Brahma Reddy Battula added a comment -

        Wataru Yukawa, I believe this issue happens with LevelDB on CentOS 6.7 only. Did you find any workaround for this?

        Recently I noticed the same issue with the NodeManager when recovery is enabled: NM RES keeps growing, which makes ResourceLocalization slow.

        With high NM RES memory, ResourceLocalization took ~3 minutes:

        2016-10-21 13:48:14,481 | INFO  | LocalizerRunner for container_e08_1476679121221_34954_01_000005 | Writing credentials to the nmPrivate file /srv/BigData/data12/yarn/localdir/nmPrivate/container_e08_1476679121221_34954_01_000005.tokens. Credentials list:  | ResourceLocalizationService.java:1238
         2016-10-21 13:48:14,487  | INFO  | LocalizerRunner for container_e08_1476679121221_34954_01_000006 | Writing credentials to the nmPrivate file /srv/BigData/data5/yarn/localdir/nmPrivate/container_e08_1476679121221_34954_01_000006.tokens. Credentials list:  | ResourceLocalizationService.java:1238
         2016-10-21 13:51:40,382  | INFO  | IPC Server handler 3 on 26007 | Resource hdfs://hacluster/tmp/hadoop-yarn/staging/IOCLG/.staging/job_1476679121221_34954/libjars/hbase-server-1.0.2.jar(->/srv/BigData/data22/yarn/localdir/usercache/IOCLG/filecache/557841/hbase-server-1.0.2.jar) transitioned from DOWNLOADING to LOCALIZED | LocalizedResource.java:203
        

        Normal ResourceLocalization time:

        2016-10-21 14:19:05,600 | INFO  | LocalizerRunner for container_e10_1477030404479_0013_01_000006 | Writing credentials to the nmPrivate file /srv/BigData/data6/yarn/localdir/nmPrivate/container_e10_1477030404479_0013_01_000006.tokens. Credentials list:  | ResourceLocalizationService.java:1238
         2016-10-21 14:19:05,600  | INFO  | LocalizerRunner for container_e10_1477030404479_0013_01_000005 | Writing credentials to the nmPrivate file /srv/BigData/data15/yarn/localdir/nmPrivate/container_e10_1477030404479_0013_01_000005.tokens. Credentials list:  | ResourceLocalizationService.java:1238
         2016-10-21 14:19:07,860  | INFO  | IPC Server handler 2 on 26007 | Resource hdfs://hacluster/tmp/hadoop-yarn/staging/IOCLG/.staging/job_1477030404479_0013/libjars/hbase-server-1.0.2.jar(->/srv/BigData/data15/yarn/localdir/usercache/IOCLG/filecache/558308/hbase-server-1.0.2.jar) transitioned from DOWNLOADING to LOCALIZED | LocalizedResource.java:203
        2016-10-21 14:19:07,898 | INFO  | IPC Server handler 3 on 26007 | Resource hdfs://hacluster/tmp/hadoop-yarn/staging/IOCLG/.staging/job_1477030404479_0013/libjars/hbase-client-1.0.2.jar(->/srv/BigData/data19/yarn/localdir/usercache/IOCLG/filecache/558312/hbase-client-1.0.2.jar) transitioned from DOWNLOADING to LOCALIZED | LocalizedResource.java:203
        

        I looked at the LevelDB community and did not find any memory-leak issue fixed after the 1.8 release.

        Jason Lowe, any thoughts on this? Thanks.

        jlowe Jason Lowe added a comment -

        > Recently I noticed the same issue with the NodeManager when recovery is enabled: NM RES keeps growing, which makes ResourceLocalization slow.

        We have not seen that on our clusters. Three minutes is a really long time. Do you have gc logging enabled for the nodemanager JVM? It would be interesting to know if it was trying to run one or more GC cycles during that time. If it wasn't GC cycles then I'm not sure how increased off-heap memory would directly contribute to slower resource localization unless the machine was near or at the point where it started swapping.

        As for the timeline server memory usage, it looks like the rolling level db instances are starting to pile up, accumulating a lot of off-heap memory. Pinging Jonathan Eagles since I vaguely remember something like this occurring in the past, and there may be a known fix for that issue.
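
        As an aside on the GC-logging question above, a minimal sketch of how GC logging can be enabled for the NodeManager on JDK 8 (standard HotSpot flags; yarn-env.sh is the usual place for it, and the log path is illustrative):

            export YARN_NODEMANAGER_OPTS="$YARN_NODEMANAGER_OPTS -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintGCDateStamps -Xloggc:/var/log/hadoop-yarn/yarn/nm-gc.log"

        Comparing the GC log timestamps against the localization timestamps would show whether the slow window coincides with collections.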

        jeagles Jonathan Eagles added a comment -

        The rolling LevelDB store rolls by default every hour (configurable). Memory in LevelDB is retained until the file is closed, and the amount retained by default is at most (read-buffer-size + write-buffer-size) * 2 per hour. Upon startup, historical LevelDB instances are lazily loaded, so they don't incur the memory hit until the data in that hour is referenced. What I am wondering is how long the retention period is set to for the data. The default time to live (yarn.timeline-service.ttl-ms) is 7 days. If you want to reduce memory usage drastically:

        • reduce the number of LevelDB files retained in memory per day: perhaps set yarn.timeline-service.rolling-period to daily (there is only limited support for changing periods)
        • reduce the retention time to what your requirements need: perhaps set yarn.timeline-service.ttl-ms to a lower value
        • reduce the read cache size: yarn.timeline-service.leveldb-timeline-store.read-cache-size, perhaps to 4 MB (4194304)
        • reduce the write buffer size: yarn.timeline-service.leveldb-timeline-store.write-buffer-size, perhaps to 4 MB (4194304)
        • implement a YARN patch for a rolling LevelDB LRU cache so that only a fixed number of instances are in memory at a time and the others are purged until needed

        There is also a yarn.timeline-service.leveldb-timeline-store.max-open-files configuration, but I am not sure whether that is a global setting or a per-rolling-instance setting.
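
        To illustrate, a minimal yarn-site.xml sketch with the settings mentioned above (the values are examples for discussion, not recommendations; tune them against your own retention needs):

            <property>
              <name>yarn.timeline-service.ttl-ms</name>
              <value>259200000</value> <!-- 3 days instead of the 7-day default -->
            </property>
            <property>
              <name>yarn.timeline-service.leveldb-timeline-store.read-cache-size</name>
              <value>4194304</value> <!-- 4 MB -->
            </property>
            <property>
              <name>yarn.timeline-service.leveldb-timeline-store.write-buffer-size</name>
              <value>4194304</value> <!-- 4 MB -->
            </property>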

        brahmareddy Brahma Reddy Battula added a comment -

        Jason Lowe and Jonathan Eagles thanks for your inputs.

        > Do you have gc logging enabled for the nodemanager JVM? It would be interesting to know if it was trying to run one or more GC cycles during that time. If it wasn't GC cycles then I'm not sure how increased off-heap memory would directly contribute to slower resource localization unless the machine was near or at the point where it started swapping.

        GC looks normal, and after applying YARN-3491, ResourceLocalization is also normal. For the RES memory increase, I need to try Jonathan Eagles' options.

        But I have doubts about db.compactRange(null, null); the file count does not decrease after it. If we close and reopen the db, the files do get deleted (though not from memory).
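
        For context, a minimal sketch of the compaction-versus-reopen behaviour being described, written against the org.iq80.leveldb / leveldbjni API (the path is illustrative and this is not the NM recovery store code itself):

            import java.io.File;
            import java.io.IOException;
            import org.iq80.leveldb.DB;
            import org.iq80.leveldb.Options;
            import static org.fusesource.leveldbjni.JniDBFactory.factory;

            public final class CompactRangeSketch {
              public static void main(String[] args) throws IOException {
                File path = new File("/tmp/leveldb-compact-sketch");   // illustrative path
                Options options = new Options().createIfMissing(true);

                DB db = factory.open(path, options);
                db.compactRange(null, null);   // full-range compaction; the observation above is that
                                               // the on-disk file count did not drop after this call
                db.close();

                // Reopening is when obsolete files were observed to be cleaned up.
                DB reopened = factory.open(path, options);
                reopened.close();
              }
            }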

        brahmareddy Brahma Reddy Battula added a comment -

        Forgot to update here. Upon investigation, an internal change of ours caused the LevelDB leak.

        wyukawa Wataru Yukawa added a comment -

        > Did you find any workaround for this?
        I restart the timeline server.

        Naganarasimha Naganarasimha G R added a comment - edited

        We cross-verified whether a similar leak existed elsewhere in the existing ATS code (a LevelDB iterator not being closed after use) but could not find one anywhere else, so I hope the issue did not recur for you!

        hadoopqa Hadoop QA added a comment -
        -1 overall



        Vote Subsystem Runtime Comment
        0 reexec 0m 18s Docker mode activated.
        +1 @author 0m 0s The patch does not contain any @author tags.
        -1 test4tests 0m 0s The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.
        +1 mvninstall 12m 50s trunk passed
        +1 compile 0m 19s trunk passed
        +1 checkstyle 0m 14s trunk passed
        +1 mvnsite 0m 21s trunk passed
        +1 mvneclipse 0m 13s trunk passed
        +1 findbugs 0m 31s trunk passed
        +1 javadoc 0m 15s trunk passed
        +1 mvninstall 0m 17s the patch passed
        +1 compile 0m 17s the patch passed
        +1 javac 0m 17s the patch passed
        -0 checkstyle 0m 11s hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice: The patch generated 1 new + 5 unchanged - 1 fixed = 6 total (was 6)
        +1 mvnsite 0m 18s the patch passed
        +1 mvneclipse 0m 11s the patch passed
        +1 whitespace 0m 0s The patch has no whitespace issues.
        +1 findbugs 0m 36s the patch passed
        +1 javadoc 0m 12s the patch passed
        -1 unit 2m 46s hadoop-yarn-server-applicationhistoryservice in the patch failed.
        +1 asflicense 0m 16s The patch does not generate ASF License warnings.
        21m 28s



        Reason Tests
        Failed junit tests hadoop.yarn.server.timeline.webapp.TestTimelineWebServices



        Subsystem Report/Notes
        Docker Image:yetus/hadoop:a9ad5d6
        JIRA Issue YARN-5368
        JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12859755/YARN-5368.1.patch
        Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle
        uname Linux e0e79f287c6a 3.13.0-106-generic #153-Ubuntu SMP Tue Dec 6 15:44:32 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
        Build tool maven
        Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
        git revision trunk / 2841666
        Default Java 1.8.0_121
        findbugs v3.0.0
        checkstyle https://builds.apache.org/job/PreCommit-YARN-Build/15343/artifact/patchprocess/diff-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-applicationhistoryservice.txt
        unit https://builds.apache.org/job/PreCommit-YARN-Build/15343/artifact/patchprocess/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-applicationhistoryservice.txt
        Test Results https://builds.apache.org/job/PreCommit-YARN-Build/15343/testReport/
        modules C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice
        Console output https://builds.apache.org/job/PreCommit-YARN-Build/15343/console
        Powered by Apache Yetus 0.5.0-SNAPSHOT http://yetus.apache.org

        This message was automatically generated.

        jeagles Jonathan Eagles added a comment -

        Unit test failure in TestTimelineWebServices is covered by YARN-5934. Checkstyle method length was pre-existing.

        jeagles Jonathan Eagles added a comment -

        Naganarasimha G R, I have posted a patch that addresses the missing iterator.close() leak. Let me know if you are OK with the try-with-resources approach to take care of this case.
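
        To illustrate the pattern under discussion, a minimal sketch against the org.iq80.leveldb API (the class and method names here are invented for the example and are not the RollingLevelDBTimelineStore code itself); DBIterator implements Closeable, so try-with-resources closes each per-period iterator on every loop pass instead of only the last one:

            import java.io.IOException;
            import java.util.List;
            import org.iq80.leveldb.DB;
            import org.iq80.leveldb.DBIterator;

            public final class IteratorScanSketch {
              // Scan every rolling LevelDB instance, closing each iterator as soon as
              // its scan finishes rather than leaving earlier iterators open.
              public static void scanAll(List<DB> rollingDbs) throws IOException {
                for (DB db : rollingDbs) {
                  try (DBIterator it = db.iterator()) {
                    for (it.seekToFirst(); it.hasNext(); it.next()) {
                      byte[] key = it.peekNext().getKey();     // consume key/value as needed
                      byte[] value = it.peekNext().getValue();
                    }
                  }
                }
              }
            }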

        varun_saxena Varun Saxena added a comment -

        Jonathan Eagles, nice catch.
        Closing only the last iterator in the loop must be the reason for the leak.
        The leak we got in the NM, albeit due to our private code, was also due to a DBIterator not being closed.

        Using the try-with-resources approach for DBIterator should be fine.
        How about using try-with-resources for DBIterator elsewhere in the RollingLevelDBTimelineStore class too, i.e. where it is not used in a loop, just to keep the code consistent?

        Naganarasimha Naganarasimha G R added a comment -

        Thanks Jonathan Eagles, sorry I missed your comment earlier. Thanks for identifying the cause.
        Similarly, I was able to find a DB iterator initialized within the loop in RollingLevelDBTimelineStore.getEntityByTime (lines 718-819), with only the last one of the loop being closed.

        hadoopqa Hadoop QA added a comment -
        -1 overall



        Vote Subsystem Runtime Comment
        0 reexec 0m 18s Docker mode activated.
        +1 @author 0m 0s The patch does not contain any @author tags.
        -1 test4tests 0m 0s The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.
        +1 mvninstall 15m 27s trunk passed
        +1 compile 0m 21s trunk passed
        +1 checkstyle 0m 17s trunk passed
        +1 mvnsite 0m 23s trunk passed
        +1 mvneclipse 0m 15s trunk passed
        +1 findbugs 0m 37s trunk passed
        +1 javadoc 0m 19s trunk passed
        +1 mvninstall 0m 23s the patch passed
        +1 compile 0m 22s the patch passed
        +1 javac 0m 22s the patch passed
        -0 checkstyle 0m 13s hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice: The patch generated 1 new + 5 unchanged - 1 fixed = 6 total (was 6)
        +1 mvnsite 0m 22s the patch passed
        +1 mvneclipse 0m 12s the patch passed
        +1 whitespace 0m 0s The patch has no whitespace issues.
        +1 findbugs 0m 43s the patch passed
        +1 javadoc 0m 14s the patch passed
        +1 unit 3m 8s hadoop-yarn-server-applicationhistoryservice in the patch passed.
        +1 asflicense 0m 19s The patch does not generate ASF License warnings.
        25m 21s



        Subsystem Report/Notes
        Docker Image:yetus/hadoop:a9ad5d6
        JIRA Issue YARN-5368
        JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12860727/YARN-5368.2.patch
        Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle
        uname Linux 3d33552d8645 3.13.0-105-generic #152-Ubuntu SMP Fri Dec 2 15:37:11 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
        Build tool maven
        Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
        git revision trunk / cd014d5
        Default Java 1.8.0_121
        findbugs v3.0.0
        checkstyle https://builds.apache.org/job/PreCommit-YARN-Build/15399/artifact/patchprocess/diff-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-applicationhistoryservice.txt
        Test Results https://builds.apache.org/job/PreCommit-YARN-Build/15399/testReport/
        modules C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice
        Console output https://builds.apache.org/job/PreCommit-YARN-Build/15399/console
        Powered by Apache Yetus 0.5.0-SNAPSHOT http://yetus.apache.org

        This message was automatically generated.

        jeagles Jonathan Eagles added a comment -

        The checkstyle warning is pre-existing. Varun Saxena, I tried to make the entire file consistent. There is one case left over that I was not able to find a way to convert easily without a method redesign. Let me know what is left for this patch.

        varun_saxena Varun Saxena added a comment - edited

        Thanks Jonathan Eagles for the updated patch fixing the comments.

        > There is one case left over that I was not able to find a way to convert easily without a method redesign.

        That should be fine.

        +1. The changes LGTM.
        I will wait for a day and then commit unless there are opposite opinions.

        Naganarasimha Naganarasimha G R added a comment -

        +1. The changes LGTM.

        jeagles Jonathan Eagles added a comment -

        This patch should apply cleanly to branch-2 and branch-2.8. Let me know if there are difficulties. Since this is an important bug fix, I think it is worth going into 2.8 so it can reach 2.8.1.

        varun_saxena Varun Saxena added a comment -

        Yeah, I will get this into branch-2.8 too.

        varun_saxena Varun Saxena added a comment -

        Committed to trunk, branch-2 and branch-2.8.
        Thanks Jonathan Eagles for finding the root cause and fixing it.
        Thanks Wataru Yukawa for reporting the issue and Naga for the additional review.

        hudson Hudson added a comment -

        SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #11483 (See https://builds.apache.org/job/Hadoop-trunk-Commit/11483/)
        YARN-5368. Memory leak in timeline server (Jonathan Eagles via Varun (varunsaxena: rev 01aca54a22c8586d232a8f79fe9977aeb8d09b83)

        • (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/timeline/RollingLevelDBTimelineStore.java
        vinodkv Vinod Kumar Vavilapalli added a comment -

        2.8.1 became a security release. Moving fix-version to 2.8.2 after the fact.


          People

          • Assignee:
            jeagles Jonathan Eagles
          • Reporter:
            wyukawa Wataru Yukawa
          • Votes:
            0
          • Watchers:
            17

            Dates

            • Created:
              Updated:
              Resolved:

              Development