  Hadoop YARN / YARN-4697

NM aggregation thread pool is not bound by limits

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.9.0, 3.0.0-alpha1
    • Component/s: nodemanager
    • Labels: None

      Description

      In LogAggregationService.java we create a thread pool to upload logs from the NodeManager to HDFS if log aggregation is turned on. This is a cached thread pool, which according to the javadoc is an unbounded pool of threads.
      If we have had a problem with log aggregation, this could cause a problem on restart: the number of threads created at that point could be huge, putting a large load on the NameNode and, in the worst case, even bringing it down due to file descriptor issues.
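
      A minimal sketch of the change under discussion, for context: the "before" pool is the unbounded cached pool described above, and the "after" pool is bounded by a configurable size. The property key and default below are illustrative assumptions, not the exact values committed in the patch.

{code:java}
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

import org.apache.hadoop.conf.Configuration;

public class AggregatorPoolSketch {
  // Illustrative key and default; the real ones live in YarnConfiguration
  // and yarn-default.xml, as discussed in the comments below.
  static final String POOL_SIZE_KEY =
      "yarn.nodemanager.log-aggregation.threadpool-size";
  static final int DEFAULT_POOL_SIZE = 100;

  // Before: a cached pool creates as many worker threads as there are queued
  // aggregation tasks, which is effectively unbounded after an NM restart.
  static ExecutorService unboundedPool() {
    return Executors.newCachedThreadPool();
  }

  // After: a fixed-size pool caps the number of concurrent uploads to HDFS;
  // extra tasks simply wait in the executor's internal queue.
  static ExecutorService boundedPool(Configuration conf) {
    int size = conf.getInt(POOL_SIZE_KEY, DEFAULT_POOL_SIZE);
    return Executors.newFixedThreadPool(size);
  }
}
{code}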

      Attachments

      1. yarn4697.001.patch
        7 kB
        Haibo Chen
      2. yarn4697.002.patch
        8 kB
        Haibo Chen
      3. yarn4697.003.patch
        9 kB
        Haibo Chen
      4. yarn4697.004.patch
        12 kB
        Haibo Chen

          Activity

          haibochen Haibo Chen added a comment -

          Default value is 100, as defined in YarnConfiguration.

          vvasudev Varun Vasudev added a comment -

          Submitting patch to Jenkins for pre-commit build.

          hadoopqa Hadoop QA added a comment -
          -1 overall



          Vote Subsystem Runtime Comment
          0 reexec 0m 15s Docker mode activated.
          +1 @author 0m 0s The patch does not contain any @author tags.
          +1 test4tests 0m 0s The patch appears to include 1 new or modified test files.
          0 mvndep 0m 25s Maven dependency ordering for branch
          +1 mvninstall 6m 42s trunk passed
          +1 compile 1m 47s trunk passed with JDK v1.8.0_72
          +1 compile 2m 10s trunk passed with JDK v1.7.0_95
          +1 checkstyle 0m 37s trunk passed
          +1 mvnsite 0m 58s trunk passed
          +1 mvneclipse 0m 26s trunk passed
          +1 findbugs 2m 2s trunk passed
          +1 javadoc 0m 57s trunk passed with JDK v1.8.0_72
          +1 javadoc 3m 20s trunk passed with JDK v1.7.0_95
          0 mvndep 0m 11s Maven dependency ordering for patch
          +1 mvninstall 0m 50s the patch passed
          +1 compile 1m 43s the patch passed with JDK v1.8.0_72
          +1 javac 1m 43s the patch passed
          +1 compile 2m 9s the patch passed with JDK v1.7.0_95
          +1 javac 2m 9s the patch passed
          +1 checkstyle 0m 34s the patch passed
          +1 mvnsite 0m 54s the patch passed
          +1 mvneclipse 0m 23s the patch passed
          +1 whitespace 0m 0s Patch has no whitespace issues.
          +1 findbugs 2m 23s the patch passed
          +1 javadoc 0m 52s the patch passed with JDK v1.8.0_72
          +1 javadoc 3m 11s the patch passed with JDK v1.7.0_95
          -1 unit 0m 21s hadoop-yarn-api in the patch failed with JDK v1.8.0_72.
          +1 unit 8m 48s hadoop-yarn-server-nodemanager in the patch passed with JDK v1.8.0_72.
          -1 unit 0m 23s hadoop-yarn-api in the patch failed with JDK v1.7.0_95.
          +1 unit 9m 19s hadoop-yarn-server-nodemanager in the patch passed with JDK v1.7.0_95.
          +1 asflicense 0m 20s Patch does not generate ASF License warnings.
          53m 18s



          Reason Tests
          JDK v1.8.0_72 Failed junit tests hadoop.yarn.conf.TestYarnConfigurationFields
          JDK v1.7.0_95 Failed junit tests hadoop.yarn.conf.TestYarnConfigurationFields



          Subsystem Report/Notes
          Docker Image:yetus/hadoop:0ca8df7
          JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12788143/yarn4697.001.patch
          JIRA Issue YARN-4697
          Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle
          uname Linux c161cbcb278b 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
          Build tool maven
          Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
          git revision trunk / 453e7e0
          Default Java 1.7.0_95
          Multi-JDK versions /usr/lib/jvm/java-8-oracle:1.8.0_72 /usr/lib/jvm/java-7-openjdk-amd64:1.7.0_95
          findbugs v3.0.0
          unit https://builds.apache.org/job/PreCommit-YARN-Build/10574/artifact/patchprocess/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-api-jdk1.8.0_72.txt
          unit https://builds.apache.org/job/PreCommit-YARN-Build/10574/artifact/patchprocess/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-api-jdk1.7.0_95.txt
          unit test logs https://builds.apache.org/job/PreCommit-YARN-Build/10574/artifact/patchprocess/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-api-jdk1.8.0_72.txt https://builds.apache.org/job/PreCommit-YARN-Build/10574/artifact/patchprocess/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-api-jdk1.7.0_95.txt
          JDK v1.7.0_95 Test Results https://builds.apache.org/job/PreCommit-YARN-Build/10574/testReport/
          modules C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager U: hadoop-yarn-project/hadoop-yarn
          Console output https://builds.apache.org/job/PreCommit-YARN-Build/10574/console
          Powered by Apache Yetus 0.2.0-SNAPSHOT http://yetus.apache.org

          This message was automatically generated.

          Naganarasimha Naganarasimha G R added a comment -

          Hi Haibo Chen,
          Thanks for working on this patch, and yes, it would be better to limit the number of threads in the executor service. A few nits/queries on the patch:

          • If YarnConfiguration is modified, then yarn-default.xml should be modified as well. But in the first place, do we need to keep it configurable? IMO just having a fixed value like 50 should be safe.
          • threadPool can have default (package-private) access instead of public, so that it is only accessible to test cases.
          • I didn't understand the need for the Semaphore, since in the Runnable we release it in the finally block immediately after semaphore.acquire(). And even if not, I thought we could submit multiple runnables (say 5 or 10) with a short sleep and check that the number of live threads whose name contains LogAggregationService is only 1, right?
          haibochen Haibo Chen added a comment -

          Hi Naganarasimha G R,

          Thanks very much for your comments. I have addressed the threadPool accessibility issue and also modified yarn-default.xml to match YarnConfiguration. To answer your other comments:

          1. Yes, 50 should be safe (the default I set is 100). But sometimes even 50 threads dedicated solely to log aggregation may be too much, and some users may want more than 50 if they have powerful machines and many YARN applications. If this is configurable, users can decide for themselves.

          2. The purpose of the semaphore is to block the threads in the thread pool, because the main thread always acquires the semaphore first. Since I set the thread pool size to 1, once that single thread tries to acquire the semaphore while executing either of the two runnables, it blocks, and the other runnable will not be executed if the thread pool can indeed create only 1 thread. (If another thread were available in the thread pool, there would be another thread blocking on the semaphore, failing the test.) The immediate release after acquire in the runnable is just to safely release the resource. I'll try to add comments in the test code.
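
          A minimal, self-contained sketch of the blocking technique described above; the class and variable names are illustrative, not the actual test code in the patch:

{code:java}
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;

public class SemaphoreBlockingSketch {
  public static void main(String[] args) throws Exception {
    Semaphore gate = new Semaphore(1);
    gate.acquire();                    // the main (test) thread holds the only permit

    ExecutorService pool = Executors.newFixedThreadPool(1);
    Runnable task = () -> {
      try {
        gate.acquire();                // blocks while the main thread holds the permit
        gate.release();                // released immediately, just to free the resource
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
    };
    pool.submit(task);
    pool.submit(task);                 // with a 1-thread pool, this task cannot start
                                       // while the first one is blocked on the gate

    Thread.sleep(500);                 // give the pool time to spin up its worker(s)
    // A test would now count live threads named after the aggregation service;
    // an unbounded pool would show a second worker also blocked on the semaphore.
    gate.release();
    pool.shutdown();
  }
}
{code}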

          hadoopqa Hadoop QA added a comment -
          +1 overall



          Vote Subsystem Runtime Comment
          0 reexec 0m 11s Docker mode activated.
          +1 @author 0m 0s The patch does not contain any @author tags.
          +1 test4tests 0m 0s The patch appears to include 1 new or modified test files.
          0 mvndep 0m 11s Maven dependency ordering for branch
          +1 mvninstall 6m 42s trunk passed
          +1 compile 1m 48s trunk passed with JDK v1.8.0_72
          +1 compile 2m 6s trunk passed with JDK v1.7.0_95
          +1 checkstyle 0m 35s trunk passed
          +1 mvnsite 1m 26s trunk passed
          +1 mvneclipse 0m 38s trunk passed
          +1 findbugs 3m 4s trunk passed
          +1 javadoc 1m 22s trunk passed with JDK v1.8.0_72
          +1 javadoc 3m 44s trunk passed with JDK v1.7.0_95
          0 mvndep 0m 10s Maven dependency ordering for patch
          +1 mvninstall 1m 12s the patch passed
          +1 compile 1m 43s the patch passed with JDK v1.8.0_72
          +1 javac 1m 43s the patch passed
          +1 compile 2m 4s the patch passed with JDK v1.7.0_95
          +1 javac 2m 4s the patch passed
          +1 checkstyle 0m 33s the patch passed
          +1 mvnsite 1m 22s the patch passed
          +1 mvneclipse 0m 33s the patch passed
          +1 whitespace 0m 0s Patch has no whitespace issues.
          +1 xml 0m 0s The patch has no ill-formed XML file.
          +1 findbugs 3m 41s the patch passed
          +1 javadoc 1m 17s the patch passed with JDK v1.8.0_72
          +1 javadoc 3m 44s the patch passed with JDK v1.7.0_95
          +1 unit 0m 20s hadoop-yarn-api in the patch passed with JDK v1.8.0_72.
          +1 unit 1m 50s hadoop-yarn-common in the patch passed with JDK v1.8.0_72.
          +1 unit 8m 50s hadoop-yarn-server-nodemanager in the patch passed with JDK v1.8.0_72.
          +1 unit 0m 24s hadoop-yarn-api in the patch passed with JDK v1.7.0_95.
          +1 unit 2m 7s hadoop-yarn-common in the patch passed with JDK v1.7.0_95.
          +1 unit 9m 12s hadoop-yarn-server-nodemanager in the patch passed with JDK v1.7.0_95.
          +1 asflicense 0m 20s Patch does not generate ASF License warnings.
          62m 33s



          Subsystem Report/Notes
          Docker Image:yetus/hadoop:0ca8df7
          JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12788275/yarn4697.002.patch
          JIRA Issue YARN-4697
          Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle xml
          uname Linux 263f7c144042 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
          Build tool maven
          Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
          git revision trunk / 5d1889a
          Default Java 1.7.0_95
          Multi-JDK versions /usr/lib/jvm/java-8-oracle:1.8.0_72 /usr/lib/jvm/java-7-openjdk-amd64:1.7.0_95
          findbugs v3.0.0
          JDK v1.7.0_95 Test Results https://builds.apache.org/job/PreCommit-YARN-Build/10580/testReport/
          modules C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager U: hadoop-yarn-project/hadoop-yarn
          Console output https://builds.apache.org/job/PreCommit-YARN-Build/10580/console
          Powered by Apache Yetus 0.2.0-SNAPSHOT http://yetus.apache.org

          This message was automatically generated.

          wilfreds Wilfred Spiegelenburg added a comment -

          Haibo Chen, should we do some range checking on the setting for the pool size? Out-of-range values can cause runtime exceptions, and that could leave us with an NM that cannot start.

          • If the thread count config is set to 0, creation of the pool will fail.
          • For anything that is not a number or is too large, conf.getInt() will throw.

          Invalid values should cause us to fall back to the default value.
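
          As a hedged illustration of that suggestion, here is the kind of defensive read one could do when pulling the pool size from the configuration; the property key below is assumed for the example, not the exact one defined in YarnConfiguration:

{code:java}
import org.apache.hadoop.conf.Configuration;

public class PoolSizeValidationSketch {
  // Illustrative key and default; the real ones live in YarnConfiguration.
  static final String POOL_SIZE_KEY =
      "yarn.nodemanager.log-aggregation.threadpool-size";
  static final int DEFAULT_POOL_SIZE = 100;

  static int poolSize(Configuration conf) {
    int size;
    try {
      size = conf.getInt(POOL_SIZE_KEY, DEFAULT_POOL_SIZE);
    } catch (NumberFormatException e) {
      // Non-numeric or out-of-int-range values make getInt() throw.
      return DEFAULT_POOL_SIZE;
    }
    // A zero or negative size would make thread pool creation fail at NM startup.
    return size > 0 ? size : DEFAULT_POOL_SIZE;
  }
}
{code}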

          rkanter Robert Kanter added a comment -

          Besides Wilfred Spiegelenburg's comments, I have some feedback on the unit test:

          • We should use more than 1 thread in the thread pool because 1 of something can sometimes hide problems. Something like 3 would be better.
          • In case something goes wrong, it would be good to:
            • add a timeout to the test: @Test(timeout=30000)
            • make the threads not block indefinitely. That can be done by using tryAcquire(long timeout, TimeUnit unit) instead of just acquire(). If you make the timeout for the threads longer than the timeout for the test itself, you won't have to worry about any timing problems with a thread exiting early, while still preventing the threads from possibly hanging forever.
          • The way you're searching for threads is okay, but it would be better if we could get them directly from the thread pool. I see that LogAggregationService only exposes an ExecutorService for the thread pool, but looking at how it's made, I believe it's actually a ThreadPoolExecutor underneath. Can you try casting to ThreadPoolExecutor and see if that works? ThreadPoolExecutor has methods to check how many threads are running etc. If that doesn't work, then I'm okay with the current approach.
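
          For illustration, the kind of check that cast would make possible; the assertBounded helper and its checks are hypothetical, not part of the patch:

{code:java}
import java.util.concurrent.ExecutorService;
import java.util.concurrent.ThreadPoolExecutor;

public class PoolIntrospectionSketch {
  // Hypothetical helper: LogAggregationService exposes the pool only as an
  // ExecutorService, so a test could try downcasting it like this.
  static void assertBounded(ExecutorService threadPool, int expectedMax) {
    if (threadPool instanceof ThreadPoolExecutor) {
      ThreadPoolExecutor tpe = (ThreadPoolExecutor) threadPool;
      // getLargestPoolSize() is the high-water mark of worker threads and
      // getMaximumPoolSize() is the configured cap.
      if (tpe.getLargestPoolSize() > expectedMax
          || tpe.getMaximumPoolSize() != expectedMax) {
        throw new AssertionError("thread pool is not bounded by " + expectedMax);
      }
    }
  }
}
{code}
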
          haibochen Haibo Chen added a comment -

          Thanks very much for your comments, everyone. I have updated the patch accordingly.

          rkanter Robert Kanter added a comment -

          Looks good overall. A few minor things:

          • Add a unit test for non-numeric and negative values of NM_LOG_AGGREGATION_THREAD_POOL_SIZE to verify it falls back to the default value.
          • I thought about the unit test some more: because the Semaphore has a count of 1, even if the test behaves properly long before the timeout occurs, each thread still has to sit through the others acquiring the semaphore (or hitting the timeout). It would be faster to use a ReadWriteLock instead, with the test acquiring the write lock and the threads acquiring read locks; that way the threads won't block each other (see the sketch below).
          • There should be a try-finally around the acquire/release in the test.
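
          A minimal sketch of the ReadWriteLock variant of the test's blocking trick, with illustrative names and timeouts rather than the actual test code:

{code:java}
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class ReadWriteLockBlockingSketch {
  public static void main(String[] args) throws Exception {
    ReentrantReadWriteLock rwLock = new ReentrantReadWriteLock();
    rwLock.writeLock().lock();           // the test thread blocks all readers at once

    ExecutorService pool = Executors.newFixedThreadPool(3);
    Runnable task = () -> {
      try {
        // Readers all wait on the single write lock, not on each other, and the
        // timeout keeps them from hanging forever if something goes wrong.
        if (rwLock.readLock().tryLock(60, TimeUnit.SECONDS)) {
          rwLock.readLock().unlock();
        }
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
    };
    for (int i = 0; i < 10; i++) {
      pool.submit(task);                 // only 3 of these can be running at once
    }

    Thread.sleep(500);                   // let the pool spin up its workers
    // A test would assert here that no more than 3 worker threads exist,
    // then release everyone at once by dropping the write lock.
    rwLock.writeLock().unlock();
    pool.shutdown();
  }
}
{code}
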
          hadoopqa Hadoop QA added a comment -
          -1 overall



          Vote Subsystem Runtime Comment
          0 reexec 13m 13s Docker mode activated.
          +1 @author 0m 0s The patch does not contain any @author tags.
          +1 test4tests 0m 0s The patch appears to include 1 new or modified test files.
          0 mvndep 0m 21s Maven dependency ordering for branch
          +1 mvninstall 6m 48s trunk passed
          +1 compile 1m 46s trunk passed with JDK v1.8.0_72
          +1 compile 2m 3s trunk passed with JDK v1.7.0_95
          +1 checkstyle 0m 37s trunk passed
          +1 mvnsite 1m 25s trunk passed
          +1 mvneclipse 0m 38s trunk passed
          +1 findbugs 3m 6s trunk passed
          +1 javadoc 1m 21s trunk passed with JDK v1.8.0_72
          +1 javadoc 3m 47s trunk passed with JDK v1.7.0_95
          0 mvndep 0m 14s Maven dependency ordering for patch
          +1 mvninstall 1m 12s the patch passed
          +1 compile 1m 42s the patch passed with JDK v1.8.0_72
          +1 javac 1m 42s the patch passed
          +1 compile 2m 2s the patch passed with JDK v1.7.0_95
          +1 javac 2m 2s the patch passed
          -1 checkstyle 0m 34s hadoop-yarn-project/hadoop-yarn: patch generated 6 new + 233 unchanged - 0 fixed = 239 total (was 233)
          +1 mvnsite 1m 21s the patch passed
          +1 mvneclipse 0m 31s the patch passed
          +1 whitespace 0m 0s Patch has no whitespace issues.
          +1 xml 0m 0s The patch has no ill-formed XML file.
          +1 findbugs 3m 43s the patch passed
          +1 javadoc 1m 18s the patch passed with JDK v1.8.0_72
          +1 javadoc 3m 43s the patch passed with JDK v1.7.0_95
          +1 unit 0m 20s hadoop-yarn-api in the patch passed with JDK v1.8.0_72.
          +1 unit 1m 52s hadoop-yarn-common in the patch passed with JDK v1.8.0_72.
          +1 unit 8m 36s hadoop-yarn-server-nodemanager in the patch passed with JDK v1.8.0_72.
          +1 unit 0m 23s hadoop-yarn-api in the patch passed with JDK v1.7.0_95.
          +1 unit 2m 11s hadoop-yarn-common in the patch passed with JDK v1.7.0_95.
          +1 unit 9m 7s hadoop-yarn-server-nodemanager in the patch passed with JDK v1.7.0_95.
          +1 asflicense 0m 20s Patch does not generate ASF License warnings.
          75m 42s



          Subsystem Report/Notes
          Docker Image:yetus/hadoop:0ca8df7
          JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12788798/yarn4697.003.patch
          JIRA Issue YARN-4697
          Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle xml
          uname Linux 9fbe7ebd1793 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
          Build tool maven
          Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
          git revision trunk / 66289a3
          Default Java 1.7.0_95
          Multi-JDK versions /usr/lib/jvm/java-8-oracle:1.8.0_72 /usr/lib/jvm/java-7-openjdk-amd64:1.7.0_95
          findbugs v3.0.0
          checkstyle https://builds.apache.org/job/PreCommit-YARN-Build/10606/artifact/patchprocess/diff-checkstyle-hadoop-yarn-project_hadoop-yarn.txt
          JDK v1.7.0_95 Test Results https://builds.apache.org/job/PreCommit-YARN-Build/10606/testReport/
          modules C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager U: hadoop-yarn-project/hadoop-yarn
          Console output https://builds.apache.org/job/PreCommit-YARN-Build/10606/console
          Powered by Apache Yetus 0.2.0-SNAPSHOT http://yetus.apache.org

          This message was automatically generated.

          haibochen Haibo Chen added a comment -

          New unit tests added for invalid values. Other comments have been addressed as well.

          hadoopqa Hadoop QA added a comment -
          +1 overall



          Vote Subsystem Runtime Comment
          0 reexec 0m 17s Docker mode activated.
          +1 @author 0m 0s The patch does not contain any @author tags.
          +1 test4tests 0m 0s The patch appears to include 1 new or modified test files.
          0 mvndep 0m 14s Maven dependency ordering for branch
          +1 mvninstall 7m 10s trunk passed
          +1 compile 1m 54s trunk passed with JDK v1.8.0_72
          +1 compile 2m 9s trunk passed with JDK v1.7.0_95
          +1 checkstyle 0m 40s trunk passed
          +1 mvnsite 1m 32s trunk passed
          +1 mvneclipse 0m 44s trunk passed
          +1 findbugs 3m 12s trunk passed
          +1 javadoc 1m 24s trunk passed with JDK v1.8.0_72
          +1 javadoc 3m 50s trunk passed with JDK v1.7.0_95
          0 mvndep 0m 11s Maven dependency ordering for patch
          +1 mvninstall 1m 12s the patch passed
          +1 compile 1m 41s the patch passed with JDK v1.8.0_72
          +1 javac 1m 41s the patch passed
          +1 compile 2m 2s the patch passed with JDK v1.7.0_95
          +1 javac 2m 2s the patch passed
          +1 checkstyle 0m 33s the patch passed
          +1 mvnsite 1m 21s the patch passed
          +1 mvneclipse 0m 33s the patch passed
          +1 whitespace 0m 0s Patch has no whitespace issues.
          +1 xml 0m 0s The patch has no ill-formed XML file.
          +1 findbugs 3m 39s the patch passed
          +1 javadoc 1m 15s the patch passed with JDK v1.8.0_72
          +1 javadoc 3m 43s the patch passed with JDK v1.7.0_95
          +1 unit 0m 20s hadoop-yarn-api in the patch passed with JDK v1.8.0_72.
          +1 unit 1m 51s hadoop-yarn-common in the patch passed with JDK v1.8.0_72.
          +1 unit 8m 40s hadoop-yarn-server-nodemanager in the patch passed with JDK v1.8.0_72.
          +1 unit 0m 22s hadoop-yarn-api in the patch passed with JDK v1.7.0_95.
          +1 unit 2m 10s hadoop-yarn-common in the patch passed with JDK v1.7.0_95.
          +1 unit 9m 26s hadoop-yarn-server-nodemanager in the patch passed with JDK v1.7.0_95.
          +1 asflicense 0m 21s Patch does not generate ASF License warnings.
          63m 51s



          Subsystem Report/Notes
          Docker Image:yetus/hadoop:0ca8df7
          JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12789242/yarn4697.004.patch
          JIRA Issue YARN-4697
          Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle xml
          uname Linux fddcdd93cde3 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
          Build tool maven
          Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
          git revision trunk / 9ed17f1
          Default Java 1.7.0_95
          Multi-JDK versions /usr/lib/jvm/java-8-oracle:1.8.0_72 /usr/lib/jvm/java-7-openjdk-amd64:1.7.0_95
          findbugs v3.0.0
          JDK v1.7.0_95 Test Results https://builds.apache.org/job/PreCommit-YARN-Build/10608/testReport/
          modules C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager U: hadoop-yarn-project/hadoop-yarn
          Console output https://builds.apache.org/job/PreCommit-YARN-Build/10608/console
          Powered by Apache Yetus 0.2.0-SNAPSHOT http://yetus.apache.org

          This message was automatically generated.

          rkanter Robert Kanter added a comment -

          LGTM. +1 pending Jenkins (which I just kicked off because it didn't seem to notice for some reason...)
          Will commit tomorrow unless anyone has any other comments.

          rkanter Robert Kanter added a comment -

          I guess I just missed that build and hadn't refreshed. I've canceled the extra one I just kicked off.

          hadoopqa Hadoop QA added a comment -
          +1 overall



          Vote Subsystem Runtime Comment
          0 reexec 0m 13s Docker mode activated.
          +1 @author 0m 0s The patch does not contain any @author tags.
          +1 test4tests 0m 0s The patch appears to include 1 new or modified test files.
          0 mvndep 0m 36s Maven dependency ordering for branch
          +1 mvninstall 6m 37s trunk passed
          +1 compile 1m 42s trunk passed with JDK v1.8.0_72
          +1 compile 2m 3s trunk passed with JDK v1.7.0_95
          +1 checkstyle 0m 36s trunk passed
          +1 mvnsite 1m 26s trunk passed
          +1 mvneclipse 0m 39s trunk passed
          +1 findbugs 3m 5s trunk passed
          +1 javadoc 1m 21s trunk passed with JDK v1.8.0_72
          +1 javadoc 3m 43s trunk passed with JDK v1.7.0_95
          0 mvndep 0m 11s Maven dependency ordering for patch
          +1 mvninstall 1m 13s the patch passed
          +1 compile 1m 41s the patch passed with JDK v1.8.0_72
          +1 javac 1m 41s the patch passed
          +1 compile 2m 10s the patch passed with JDK v1.7.0_95
          +1 javac 2m 10s the patch passed
          +1 checkstyle 0m 33s the patch passed
          +1 mvnsite 1m 21s the patch passed
          +1 mvneclipse 0m 33s the patch passed
          +1 whitespace 0m 0s Patch has no whitespace issues.
          +1 xml 0m 1s The patch has no ill-formed XML file.
          +1 findbugs 3m 44s the patch passed
          +1 javadoc 1m 23s the patch passed with JDK v1.8.0_72
          +1 javadoc 3m 47s the patch passed with JDK v1.7.0_95
          +1 unit 0m 22s hadoop-yarn-api in the patch passed with JDK v1.8.0_72.
          +1 unit 2m 0s hadoop-yarn-common in the patch passed with JDK v1.8.0_72.
          +1 unit 9m 4s hadoop-yarn-server-nodemanager in the patch passed with JDK v1.8.0_72.
          +1 unit 0m 24s hadoop-yarn-api in the patch passed with JDK v1.7.0_95.
          +1 unit 2m 16s hadoop-yarn-common in the patch passed with JDK v1.7.0_95.
          +1 unit 9m 29s hadoop-yarn-server-nodemanager in the patch passed with JDK v1.7.0_95.
          +1 asflicense 0m 21s Patch does not generate ASF License warnings.
          63m 54s



          Subsystem Report/Notes
          Docker Image:yetus/hadoop:0ca8df7
          JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12789242/yarn4697.004.patch
          JIRA Issue YARN-4697
          Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle xml
          uname Linux 608652843e9a 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
          Build tool maven
          Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
          git revision trunk / 9ed17f1
          Default Java 1.7.0_95
          Multi-JDK versions /usr/lib/jvm/java-8-oracle:1.8.0_72 /usr/lib/jvm/java-7-openjdk-amd64:1.7.0_95
          findbugs v3.0.0
          JDK v1.7.0_95 Test Results https://builds.apache.org/job/PreCommit-YARN-Build/10613/testReport/
          modules C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager U: hadoop-yarn-project/hadoop-yarn
          Console output https://builds.apache.org/job/PreCommit-YARN-Build/10613/console
          Powered by Apache Yetus 0.2.0-SNAPSHOT http://yetus.apache.org

          This message was automatically generated.

          wilfreds Wilfred Spiegelenburg added a comment -

          +1 LGTM

          rkanter Robert Kanter added a comment -

          Thanks Haibo and everyone for reviews. Committed to trunk and branch-2!

          hudson Hudson added a comment -

          FAILURE: Integrated in Hadoop-trunk-Commit #9364 (See https://builds.apache.org/job/Hadoop-trunk-Commit/9364/)
          YARN-4697. NM aggregation thread pool is not bound by limits (haibochen) (rkanter: rev 954dd57043d2de4f962876c1b89753bfc7e4ce55)

          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/logaggregation/TestLogAggregationService.java
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java
          • hadoop-yarn-project/CHANGES.txt
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/logaggregation/LogAggregationService.java
          leftnoteasy Wangda Tan added a comment -

          Thanks Haibo Chen for fixing this problem.

          I think this is a critical issue that could cause problems when the NM restarts with recovery enabled.

          I think we should backport this fix to 2.6+ releases.

          Thoughts?

          vinodkv Vinod Kumar Vavilapalli added a comment -

          In the case that we have had a problem with log aggregation this could cause a problem on restart. The number of threads created at that point could be huge and will put a large load on the NameNode and in worse case could even bring it down due to file descriptor issues.

          Haibo Chen / Robert Kanter / Varun Vasudev / Wangda Tan, while I agree with the general notion that thread pool limits are good, I actually fail to see how this problem is happening. The number of log-aggregation threads should be limited by the number of concurrent applications running and finishing in the cluster, which should be on the order of thousands. Is there something special happening at restart time?

          My concern is that if we don't fix the root cause, then even though we've protected ourselves from crashes, we'd just be queueing a lot of aggregation work and causing long waiting times.

          The other thing we should do with this patch is have each thread identify the current application being aggregated, so that we can debug issues better.
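
          A hypothetical sketch of that debugging aid: have each aggregation worker carry the application it is processing in its thread name, so a thread dump shows which app each aggregator is busy with. The helper and names below are illustrative, not the patch's actual code.

{code:java}
public class NamedAggregationTaskSketch {
  // Wraps the real upload work so that the pool worker is tagged with the
  // current application ID for the duration of the task.
  static Runnable aggregationTask(String appId, Runnable uploadWork) {
    return () -> {
      String oldName = Thread.currentThread().getName();
      Thread.currentThread().setName(oldName + " for " + appId);
      try {
        uploadWork.run();
      } finally {
        Thread.currentThread().setName(oldName);   // restore the pool's own name
      }
    };
  }
}
{code}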

          leftnoteasy Wangda Tan added a comment -

          IIUC, currently all recovered applications go to the INIT state, AppInitTransition sends an event to LogAggregationService, and a new thread is required.

          I'm not entirely sure what the best fix for this issue is; can someone advise?

          haibochen Haibo Chen added a comment -

          Vinod Kumar Vavilapalli Upon NM restart, the NM will try to recover all applications and submit a log aggregation task to the thread pool for each application recovered. Therefore, a large number of recovered applications plus concurrent applications can cause the thread pool to grow without bound.

          djp Junping Du added a comment -

          My concern is that if don't fix the root-cause, though we've protected ourselves from crashes, we'd just be queueing a lot of aggregation processes and causing long waiting times.

          Agreed. We do see the NM log aggregation service launch many active threads, which keep a large number of TCP connections to DataNodes and use up the system's file descriptor limit. We can enforce a shared, limited thread count here, but the TCP connection problem may not be solved by this patch.

          Upon NM restart, NM will try to recover all applications and submit a log aggregation task to the thread pool for each application recovered. Therefore, a large number of recovered applications plus concurrent applications can cause the thread pool to increase without a bound.

          Are all these applications active ones, or finished already? I suspect we are leaking finished applications in the NM state store during the recovery process. I noticed this issue when filing YARN-4325 but lost my progress as the previous long-running cluster is gone. Haibo Chen, could you check if your case is the same here?

          In general, I think the fix on this JIRA is OK. But I agree with Vinod that we should dig more into the root cause, or there could be other holes (like the TCP connection leaking mentioned above).

          djp Junping Du added a comment -

          After investigating on our cluster, which hit the same issue recently, I think there are two root causes here:
          1. Due to YARN-4325, too many stale applications don't get purged from the NM state store, so the NM recovers too many stale applications, each of which inits log aggregation.
          2. Because these applications are stale, some operations, like createAppDir(), will fail with token issues, but we swallow the exception there and continue to create an invalid aggregator. I just filed YARN-4984 to fix this issue.


            People

            • Assignee: haibochen Haibo Chen
            • Reporter: haibochen Haibo Chen
            • Votes: 0
            • Watchers: 14
