  1. Hadoop YARN
  2. YARN-4183

Clarify the behavior of timeline service config properties

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.7.1
    • Fix Version/s: 2.8.0, 2.7.3, 3.0.0-alpha1
    • Component/s: None
    • Labels:
      None
    • Target Version/s:
    • Hadoop Flags:
      Reviewed

      Description

      The configurations "yarn.timeline-service.enabled" and "yarn.timeline-service.client.best-effort" are not documented clearly. Currently, if the client does not want tokens to be generated for the timeline service, it can set "yarn.timeline-service.enabled" to false and/or "yarn.timeline-service.client.best-effort" to true, so that jobs can still be submitted even if the ATS is down. This behavior is not properly documented, so as part of this JIRA we document and clarify these configurations.
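
      The behavior described above can be sketched roughly as follows. This is a simplified model under stated assumptions, not the actual YARN client code: java.util.Properties stands in for Hadoop's Configuration, security is assumed to be enabled, and the class name, the Outcome enum, and the atsReachable flag are hypothetical names introduced only for illustration.

```java
import java.util.Properties;

// Simplified model of the client-side behavior described in this JIRA.
// NOT the real YARN client code: Properties stands in for Hadoop's Configuration,
// and a secure cluster is assumed (tokens only matter when security is on).
class TimelineClientSketch {
    static final String ENABLED = "yarn.timeline-service.enabled";
    static final String BEST_EFFORT = "yarn.timeline-service.client.best-effort";

    // Hypothetical outcome labels, introduced for illustration only.
    enum Outcome { SUBMITTED_WITHOUT_TOKEN, SUBMITTED_WITH_TOKEN, SUBMISSION_FAILED }

    // atsReachable is a hypothetical stand-in for "the token request succeeded".
    static Outcome submit(Properties conf, boolean atsReachable) {
        boolean enabled = Boolean.parseBoolean(conf.getProperty(ENABLED, "false"));
        boolean bestEffort = Boolean.parseBoolean(conf.getProperty(BEST_EFFORT, "false"));

        if (!enabled) {
            // The client opted out: no timeline token is requested at all.
            return Outcome.SUBMITTED_WITHOUT_TOKEN;
        }
        if (atsReachable) {
            return Outcome.SUBMITTED_WITH_TOKEN;
        }
        // ATS is down: best-effort decides whether the failure is fatal.
        return bestEffort ? Outcome.SUBMITTED_WITHOUT_TOKEN : Outcome.SUBMISSION_FAILED;
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        conf.setProperty(ENABLED, "true");
        conf.setProperty(BEST_EFFORT, "true");
        // ATS is down, but best-effort lets the job through.
        System.out.println(submit(conf, false)); // prints SUBMITTED_WITHOUT_TOKEN
    }
}
```

      In this model, either setting is enough to keep job submission working while the ATS is down; only the combination "enabled" with best-effort left at false turns an unreachable ATS into a submission failure.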

      1. YARN-4183.v1.002.patch
        5 kB
        Naganarasimha G R
      2. YARN-4183.v1.001.patch
        5 kB
        Naganarasimha G R
      3. YARN-4183.1.patch
        3 kB
        Mit Desai

        Issue Links

          Activity

          vinodkv Vinod Kumar Vavilapalli added a comment -

          Closing the JIRA as part of 2.7.3 release.

          Naganarasimha Naganarasimha G R added a comment -

          Thanks for the review and commit, Sangjin Lee, and for the reviews from Jonathan Eagles, Mit Desai, Xuan Gong, Vinod Kumar Vavilapalli, Li Lu & Varun Saxena.
          I will cross-check the 2.6 branch in a while.

          sjlee0 Sangjin Lee added a comment -

          Committed the patch. Thanks Naganarasimha G R for your contribution! Thanks Jonathan Eagles, Mit Desai, Xuan Gong, Vinod Kumar Vavilapalli, Li Lu, and Varun Saxena for your comments and review.

          Naganarasimha G R, I committed it to 2.9.0, 2.8.0, and 2.7.3. Let me know if this needs to go into 2.6.x.

          hudson Hudson added a comment -

          FAILURE: Integrated in Hadoop-trunk-Commit #9535 (See https://builds.apache.org/job/Hadoop-trunk-Commit/9535/)
          YARN-4183. Clarify the behavior of timeline service config properties (sjlee: rev 6d67420dbc5c6097216fa40fcec8ed626b2bae14)

          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/TimelineServer.md
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml
          Naganarasimha Naganarasimha G R added a comment -

          Sorry for the delay. I have updated it. Please check the description; if it is fine, we can go ahead and get it committed.

          Naganarasimha Naganarasimha G R added a comment -

          Sure... will do it shortly

          sjlee0 Sangjin Lee added a comment -

          OK, could you do the honors?

          Naganarasimha Naganarasimha G R added a comment -

          The title of the JIRA is fine, but I think we should update the description too, right?

          sjlee0 Sangjin Lee added a comment -

          I updated the title of the JIRA. Let me know if it is good to go. I'll wait for a couple of hours before committing the patch. Thanks!

          jeagles Jonathan Eagles added a comment -

          Please update the summary to indicate that this is a documentation only change and then it is ready to go in.

          Naganarasimha Naganarasimha G R added a comment -

          Hi Sangjin Lee & Jonathan Eagles,
          Shall we conclude on this? Otherwise we may eventually miss it.

          Naganarasimha Naganarasimha G R added a comment -

          Hi Sangjin Lee & Jonathan Eagles,
          Is any further review or discussion required for this JIRA, or can we go ahead and get it committed?

          Naganarasimha Naganarasimha G R added a comment -

          Hi Jonathan Eagles,
          Thanks for sharing your views. As I mentioned in the comment, replacing TIMELINE_SERVICE_ENABLED with APPLICATION_HISTORY_ENABLED will not solve the problem and will in fact introduce compatibility issues. The other option was to remove the check for TIMELINE_SERVICE_ENABLED, which Sangjin Lee said in his comment was also not ideal; after further discussion I came to agree. Hence we concluded that we should update the documentation, and made the necessary corrections. If you are OK with this, we can go ahead; otherwise, if you have other approaches, we can discuss their pros and cons.

          jeagles Jonathan Eagles added a comment -

          Took a look at the patch. The end result is that the issue described by this JIRA is not addressed, only better documented. I will try this suggestion out to see if it works in practice, since I don't see that anyone has tried it yet and proven it to work. This will certainly put a burden on users that care about this use case working correctly. If I am the only one that needs this feature at this time, this is probably acceptable.

          Naganarasimha Naganarasimha G R added a comment -

          Hi Jonathan Eagles & Mit Desai,
          Any thoughts on the conclusion ?

          jeagles Jonathan Eagles added a comment -

          I'll try to look at this patch this week.

          Naganarasimha Naganarasimha G R added a comment -

          Yes Sangjin Lee, it would be good to wait a couple of days for their feedback.

          sjlee0 Sangjin Lee added a comment -

          I am +1 with the latest patch, but I'd wait until Mit and/or Jon chime in.

          Mit Desai, Jonathan Eagles, what are your thoughts? Is the conclusion here an acceptable conclusion for you guys?

          hadoopqa Hadoop QA added a comment -
          -1 overall



          Vote Subsystem Runtime Comment
          0 reexec 0m 0s Docker mode activated.
          +1 @author 0m 0s The patch does not contain any @author tags.
          -1 test4tests 0m 0s The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.
          0 mvndep 0m 16s Maven dependency ordering for branch
          +1 mvninstall 7m 16s trunk passed
          +1 compile 2m 9s trunk passed with JDK v1.8.0_66
          +1 compile 2m 19s trunk passed with JDK v1.7.0_91
          +1 mvnsite 0m 52s trunk passed
          +1 mvneclipse 0m 24s trunk passed
          +1 javadoc 0m 41s trunk passed with JDK v1.8.0_66
          +1 javadoc 0m 45s trunk passed with JDK v1.7.0_91
          0 mvndep 0m 16s Maven dependency ordering for patch
          +1 mvninstall 0m 36s the patch passed
          +1 compile 2m 2s the patch passed with JDK v1.8.0_66
          +1 javac 2m 2s the patch passed
          +1 compile 2m 18s the patch passed with JDK v1.7.0_91
          +1 javac 2m 18s the patch passed
          +1 mvnsite 0m 47s the patch passed
          +1 mvneclipse 0m 20s the patch passed
          +1 whitespace 0m 0s Patch has no whitespace issues.
          +1 xml 0m 1s The patch has no ill-formed XML file.
          +1 javadoc 0m 36s the patch passed with JDK v1.8.0_66
          +1 javadoc 0m 41s the patch passed with JDK v1.7.0_91
          +1 unit 2m 1s hadoop-yarn-common in the patch passed with JDK v1.8.0_66.
          +1 unit 0m 7s hadoop-yarn-site in the patch passed with JDK v1.8.0_66.
          +1 unit 2m 17s hadoop-yarn-common in the patch passed with JDK v1.7.0_91.
          +1 unit 0m 7s hadoop-yarn-site in the patch passed with JDK v1.7.0_91.
          +1 asflicense 0m 19s Patch does not generate ASF License warnings.
          28m 12s



          Subsystem Report/Notes
          Docker Image:yetus/hadoop:0ca8df7
          JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12785942/YARN-4183.v1.002.patch
          JIRA Issue YARN-4183
          Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit xml
          uname Linux dcc67f09873d 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
          Build tool maven
          Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
          git revision trunk / d6b1acb
          Default Java 1.7.0_91
          Multi-JDK versions /usr/lib/jvm/java-8-oracle:1.8.0_66 /usr/lib/jvm/java-7-openjdk-amd64:1.7.0_91
          JDK v1.7.0_91 Test Results https://builds.apache.org/job/PreCommit-YARN-Build/10477/testReport/
          modules C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site U: hadoop-yarn-project/hadoop-yarn
          Max memory used 77MB
          Powered by Apache Yetus 0.2.0-SNAPSHOT http://yetus.apache.org
          Console output https://builds.apache.org/job/PreCommit-YARN-Build/10477/console

          This message was automatically generated.

          Naganarasimha Naganarasimha G R added a comment -

          Thanks for the comments Sangjin Lee,
          I have corrected the typos you mentioned and uploaded the updated patch. Please review.

          sjlee0 Sangjin Lee added a comment -

          My apologies Naganarasimha G R for taking a long time to reply.

          It looks good to me for the most part. There are only a few typos to correct:

          • l.1797: "client want" -> "client wants"
          • l.1798: "if its enabled" -> "if it's enabled"

          and their counterparts in TimelineServer.md.

          Mit Desai, Jonathan Eagles, are you guys comfortable with the conclusion on this JIRA?

          Naganarasimha Naganarasimha G R added a comment -

          Hi Sangjin Lee, If you have cycles can you take a look at the last patch ?

          Naganarasimha Naganarasimha G R added a comment -

          Hi Sangjin Lee,
          Any updates/reviews for this patch ?

          Naganarasimha Naganarasimha G R added a comment -

          Sangjin Lee, can you take a look at this jira ?

          hadoopqa Hadoop QA added a comment -
          -1 overall



          Vote Subsystem Runtime Comment
          0 reexec 0m 0s Docker mode activated.
          +1 @author 0m 0s The patch does not contain any @author tags.
          -1 test4tests 0m 0s The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.
          +1 mvninstall 7m 28s trunk passed
          +1 compile 1m 44s trunk passed with JDK v1.8.0_66
          +1 compile 2m 6s trunk passed with JDK v1.7.0_91
          +1 mvnsite 0m 49s trunk passed
          +1 mvneclipse 0m 23s trunk passed
          +1 javadoc 0m 36s trunk passed with JDK v1.8.0_66
          +1 javadoc 0m 44s trunk passed with JDK v1.7.0_91
          +1 mvninstall 0m 34s the patch passed
          +1 compile 1m 42s the patch passed with JDK v1.8.0_66
          +1 javac 1m 42s the patch passed
          +1 compile 2m 3s the patch passed with JDK v1.7.0_91
          +1 javac 2m 3s the patch passed
          +1 mvnsite 0m 42s the patch passed
          +1 mvneclipse 0m 19s the patch passed
          +1 whitespace 0m 0s Patch has no whitespace issues.
          +1 xml 0m 1s The patch has no ill-formed XML file.
          +1 javadoc 0m 32s the patch passed with JDK v1.8.0_66
          +1 javadoc 0m 39s the patch passed with JDK v1.7.0_91
          +1 unit 1m 51s hadoop-yarn-common in the patch passed with JDK v1.8.0_66.
          +1 unit 0m 6s hadoop-yarn-site in the patch passed with JDK v1.8.0_66.
          +1 unit 2m 7s hadoop-yarn-common in the patch passed with JDK v1.7.0_91.
          +1 unit 0m 8s hadoop-yarn-site in the patch passed with JDK v1.7.0_91.
          +1 asflicense 0m 19s Patch does not generate ASF License warnings.
          25m 57s



          Subsystem Report/Notes
          Docker Image:yetus/hadoop:0ca8df7
          JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12779596/YARN-4183.v1.001.patch
          JIRA Issue YARN-4183
          Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit xml
          uname Linux 56b10cd8b7e5 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
          Build tool maven
          Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
          git revision trunk / fb00794
          Default Java 1.7.0_91
          Multi-JDK versions /usr/lib/jvm/java-8-oracle:1.8.0_66 /usr/lib/jvm/java-7-openjdk-amd64:1.7.0_91
          JDK v1.7.0_91 Test Results https://builds.apache.org/job/PreCommit-YARN-Build/10100/testReport/
          modules C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site U: hadoop-yarn-project/hadoop-yarn
          Max memory used 75MB
          Powered by Apache Yetus 0.2.0-SNAPSHOT http://yetus.apache.org
          Console output https://builds.apache.org/job/PreCommit-YARN-Build/10100/console

          This message was automatically generated.

          Naganarasimha Naganarasimha G R added a comment -

          We can update the title if the fix is fine.

          Naganarasimha Naganarasimha G R added a comment -

          Sangjin Lee, attaching a patch as per previous description, please have a look

          Naganarasimha Naganarasimha G R added a comment -

          Sangjin Lee, any thoughts on my previous comment ?

          Naganarasimha Naganarasimha G R added a comment -

          Sangjin Lee, sorry, I missed this comment earlier.
          My thoughts are in line with yours; it is just that the documentation does not capture this. Existing:

          Indicate to clients whether timeline service is enabled or not. If enabled, clients will put entities and events to the timeline server

          I feel we need to capture it as:

          On the server side, it indicates whether the timeline service is enabled or not. On the client side, users can enable it to indicate whether they want to use the timeline service. If it is enabled on the client side and security is also enabled, then the YARN client tries to fetch the delegation tokens for the timeline server.

          Modifications are welcome.
          Also, yarn.timeline-service.client.best-effort is wrongly documented there as yarn.timeline-service.best-effort.
          So shall I get these things corrected as part of this JIRA?

          sjlee0 Sangjin Lee added a comment -

          IMO it is probably acceptable for clients to set yarn.timeline-service.enabled (and yarn.timeline-service.client.best-effort?) to false. If we add a new configuration, I think it should be more along the line of "participate in the timeline service functionality" rather than specifically about the delegation token. My understanding is that the YARN client just does not want to do anything with the timeline service in this case, right?

          Naganarasimha Naganarasimha G R added a comment -

          Hi Sangjin Lee, Junping Du & Xuan Gong, now that YARN-3623 is in, can we decide on this? Do we need to introduce another configuration that decides whether client delegation tokens must be fetched, in addition to the existing conditions (timeline service and security are enabled)? Or is it sufficient that clients can configure yarn.timeline-service.client.best-effort / yarn.timeline-service.enabled to false?

          Naganarasimha Naganarasimha G R added a comment -

          Thanks for the proposal Sangjin Lee,
          My only additional query: should the timeline server be running if yarn.timeline-service.enabled is enabled? My opinion is that it should, so that the configuration is stronger. Thoughts?

          there needs to be a strong client-side config for each framework (MR or tez) that controls whether it wants to use the timeline service;

          How about the config be yarn.timeline-service.client.require-delegation-token or yarn.timeline-service.client.delegation-token.enabled ?

          I hope the last point should address the original issue of this JIRA

          To clarify further, and as mentioned earlier, there is one more config, yarn.timeline-service.client.best-effort, which keeps clients from failing when the delegation token cannot be retrieved.

          vinodkv Vinod Kumar Vavilapalli added a comment -

          In the interest of focus, we should move all the version related discussion to YARN-3623. I'll make this JIRA depend on YARN-3623.

          sjlee0 Sangjin Lee added a comment -

          Thanks Vinod Kumar Vavilapalli for the suggestion. The following is my proposal.

          • yarn.timeline-service.enabled should be interpreted as the config that indicates the timeline service daemon is up both on the client side and the server side
          • specifically for the RM's system metrics publisher, it should continue to check both yarn.timeline-service.enabled and yarn.resourcemanager.system-metrics-publisher.enabled (which is the current code); if not, and the timeline service is not up (which is a totally valid situation), the system metrics publisher will have a continuous stream of write errors
          • there needs to be a strong client-side config for each framework (MR or tez) that controls whether it wants to use the timeline service; the client can use the timeline service only if both yarn.timeline-service.enabled is true and its own config is true

          I hope the last point addresses the original issue of this JIRA (if MR does not want to use the timeline service, it should be able to opt out).

          The decision to use the timeline service and get the delegation token should not hinge on which version is enabled IMO, as the version is another global property.

          Let me know if that sounds reasonable.
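
          The "check both configs" rule in the proposal above can be sketched roughly as follows. This is only an illustration, not code from any patch: a plain Map stands in for a Hadoop Configuration, the per-framework key example.framework.timeline.enabled is a made-up name, and both defaults are assumed to be false.

```java
import java.util.HashMap;
import java.util.Map;

/** Illustrative sketch only; a plain Map stands in for a Hadoop Configuration. */
class TimelineClientGate {
    // Cluster-wide switch discussed in this thread.
    static final String TIMELINE_ENABLED = "yarn.timeline-service.enabled";
    // Hypothetical per-framework switch, invented for this example.
    static final String FRAMEWORK_USES_TIMELINE = "example.framework.timeline.enabled";

    /** Reads a boolean setting, falling back to the given default when the key is absent. */
    static boolean getBoolean(Map<String, String> conf, String key, boolean defaultValue) {
        String value = conf.get(key);
        return value == null ? defaultValue : Boolean.parseBoolean(value.trim());
    }

    /** The client uses the timeline service only if both switches are true. */
    static boolean shouldUseTimelineService(Map<String, String> conf) {
        return getBoolean(conf, TIMELINE_ENABLED, false)
            && getBoolean(conf, FRAMEWORK_USES_TIMELINE, false);
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        conf.put(TIMELINE_ENABLED, "true");
        // The framework has not opted in, so no timeline client or token is needed.
        System.out.println(shouldUseTimelineService(conf));
        conf.put(FRAMEWORK_USES_TIMELINE, "true");
        System.out.println(shouldUseTimelineService(conf));
    }
}
```

          With a gate like this, a framework that does not opt in never creates a timeline client or asks for a delegation token, even when the cluster-wide config is copied to it with the timeline service enabled.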

          vinodkv Vinod Kumar Vavilapalli added a comment -

          It's a wall of text, but to facilitate progress, Sangjin Lee / Naganarasimha G R, can one of you summarize your (proposal + open-questions) similar to what I did in my comment above under The right thing to do ? Tx.

          sjlee0 Sangjin Lee added a comment -

          But along with this I think we need to start the timeline service daemon only if yarn.timeline-service.enabled is set to true. This will doubly ensure that the configurations are done properly. But it might have compatibility issues. Thoughts?

          I think that would be a good sanity check. I'm honestly not 100% clear on what interpretation is considered compatible and what not when it comes to configuration. Understanding yarn.timeline-service.enabled to imply that the server-side timeline service daemons should be up, and using it that way, seems within the limits of the current understanding. Those who have a better understanding of compatibility, could you comment? What scenarios could be worrisome?

          I was wondering what the default value should be so that it doesn't break compatibility with existing apps; I presume it should be true to get the same behavior as now?

          Yes, if changing the default is considered incompatible. Then again, I'm not 100% certain.

          Naganarasimha Naganarasimha G R added a comment -

          Mit Desai, if the approach is finalized, would you like to handle it or shall I?

          Naganarasimha Naganarasimha G R added a comment -

          Thanks for sharing your views, Sangjin Lee.

          I think we should assume a reasonable use case here where it is fair to expect the timeline service to be there if yarn.timeline-service.enabled is true. If that config is true but the timeline service is not up, then I think it is acceptable to see continuous timeline service failures (this is the existing behavior btw).

          OK, even though I am not convinced it's 100% necessary, I do see the benefits, as it will avoid a lot of logs and delays caused by the timeline client retrying multiple times for each event being logged.
          But along with this I think we need to start the timeline service daemon only if yarn.timeline-service.enabled is set to true. This will doubly ensure that the configurations are done properly. But it might have compatibility issues. Thoughts?

          I would advocate having a separate client-side config. Whether the server has enabled the timeline service and whether a particular client/app will use it are separate concerns, and separate configs should drive them.

          I was wondering what the default value should be so that it doesn't break compatibility with existing apps; I presume it should be true to get the same behavior as now?

          sjlee0 Sangjin Lee added a comment -

          Maybe we could try to fail the RM startup if the SMP is not able to connect to the timeline server, but IIUC Jonathan Eagles and Jason Lowe, in the other ATS 1.5 JIRA, said that the cluster should keep running even though the timeline server is not running; hence it's not desirable to fail the RM startup.

          I'm not suggesting that we fail the RM if the SMP cannot connect to the timeline server. I'm suggesting we can disable the SMP (as opposed to having continuous failures or failing the RM) if we know the timeline service is disabled. That's why the config helps, as it is the clearly indicated intent for running the cluster.

          As you said, it is possible that yarn.timeline-service.enabled is set to false even if the timeline service is up and vice versa. But I think we should assume a reasonable use case here where it is fair to expect the timeline service to be there if yarn.timeline-service.enabled is true. If that config is true but the timeline service is not up, then I think it is acceptable to see continuous timeline service failures (this is the existing behavior btw).

          If we still want to go ahead with yarn.timeline-service.enabled, then we might need to come up with a new configuration to indicate that the client wants to use the timeline server, and hence to create the timeline client and the timeline server delegation tokens.

          Yes, I agree. It's implied that we would need a different config/mechanism for disabling the fetching of the delegation token. I would advocate having a separate client-side config. Whether the server has enabled the timeline service and whether a particular client/app will use it are separate concerns, and separate configs should drive them. The current state of Tez and MR is a good example. MR should be able to say I won't use the timeline service even if it is available. Any MR code that uses the timeline service should basically check both configs; i.e. the timeline service should be enabled and its config to use the timeline service should be true.

          This might be an idealistic argument on my part, but I think doing something along that line would lead to cleaner separation of concerns and larger degrees of freedom. My 2 cents.

          Naganarasimha Naganarasimha G R added a comment -

          Hi Sangjin Lee,
          Thanks for sharing your thoughts.

          If the answer is "yarn.resourcemanager.system-metrics-publisher.enabled should not be set to true if the timeline service is disabled", then it only makes it clear that yarn.resourcemanager.system-metrics-publisher.enabled=true implies yarn.timeline-service.enabled=true. Then we should check it explicitly. Thoughts?

          yarn.timeline-service.enabled doesn't imply that the timeline service is running: we can enable it and, if the timeline server is still not running, we face the same problem. We can also configure yarn.timeline-service.enabled to be false and still start the AHS / timeline server; IIUC it is not used anywhere in the AHS / timeline server during startup. Hence I interpreted it as a client-side config that tells the YARN client that the timeline server is to be used. My initial thoughts about this configuration were similar to your approach, but that approach has a flaw: a configuration alone cannot guarantee that the timeline service is running in the cluster.
          Maybe we could try to fail the RM startup if the SMP is not able to connect to the timeline server, but IIUC Jonathan Eagles and Jason Lowe, in the other ATS 1.5 JIRA, said that the cluster should keep running even though the timeline server is not running; hence it's not desirable to fail the RM startup. The most we can do is update the documentation to say that "yarn.resourcemanager.system-metrics-publisher.enabled" requires the timeline service to be running.

          but the way I view it is that it should act as a "master switch" for the timeline service; i.e. the highest level switch that toggles the feature on and off on all sides

          This implies that when we start the timeline service daemon we need to check yarn.timeline-service.enabled and, if it is false, bring the daemon down? But none of the other daemons work that way, so will that be OK? Also, if the configuration used for the timeline server has it set but the configuration for the other daemons does not, then again we face an issue.

          Also, consider the fact that the system metrics publisher may not be the only server-side component that interacts with the timeline service. There may be others and there will be more with the timeline service v.2 (e.g. NM collector service, etc.).

          I agree that there will be a lot of additions in future versions, but we will also have further configurations such as the ATS version; on top of these, will one more configuration like yarn.timeline-service.enabled be of use?

          These are my views, but if we still want to go ahead with yarn.timeline-service.enabled, then we might need to come up with a new configuration to indicate that the client wants to use the timeline server, and hence to create the timeline client and the timeline server delegation tokens. If it is used as in the current approach, then we will hit the issues mentioned by Jonathan Eagles, i.e. server configurations are copied to all clients, and if the timeline server is enabled then delegation tokens are created. So each client would need to explicitly reset the yarn.timeline-service.enabled configuration to false if it doesn't want to use the timeline server.

          sjlee0 Sangjin Lee added a comment -

          I agree we probably shouldn't put too many points of discussion here that may not be core to this JIRA at hand. I'd like to focus on the SystemMetricsPublisher and yarn.resourcemanager.system-metrics-publisher.enabled and yarn.timeline-service.enabled.

          As far as 2.7.2 is concerned, I feel yarn.resourcemanager.system-metrics-publisher.enabled is sufficient to be configured.

          I'm not sure if that is desirable. Here is a key question. Suppose the timeline service is disabled, and no timeline daemons are running. And suppose yarn.resourcemanager.system-metrics-publisher.enabled is true, and we changed SystemMetricsPublisher to check only that flag. What would happen? AFAICT, the SystemMetricsPublisher will fire up the timeline client, and will try to send all the events actively to the timeline server. But since the timeline server is down, it will lead to continuous failures of writing to the timeline server, right? IMO, this type of very late failure is deeply unsatisfying and problematic.

          If the answer is "yarn.resourcemanager.system-metrics-publisher.enabled should not be set to true if the timeline service is disabled", then it only makes it clear that yarn.resourcemanager.system-metrics-publisher.enabled=true implies yarn.timeline-service.enabled=true. Then we should check it explicitly. Thoughts?

          As far as I view it, the "yarn.timeline-service.enabled" name is misleading; it should rather signify that the client requires the timeline service's delegation token, which would not be a server-side config. Thoughts?

          I'm not sure if that's how it's currently interpreted, but the way I view it is that it should act as a "master switch" for the timeline service; i.e. the highest level switch that toggles the feature on and off on all sides. There can be "sub-switches" that can control finer-grained parts of the feature (e.g. the system metrics publisher). But those subfeatures should always check the master switch before checking their own. This will lead to a clean and consistent pattern of using the feature everywhere.

          Also, consider the fact that the system metrics publisher may not be the only server-side component that interacts with the timeline service. There may be others and there will be more with the timeline service v.2 (e.g. NM collector service, etc.). If they all handle the failure case of the timeline server not being up in their own way, it would be quite confusing and error-prone. It would be consistent and easy to handle if everyone checks the master switch (and possibly their own subfeature switch), and wires off the feature as early as possible. So I would argue that yarn.timeline-service.enabled should be interpreted as such a "master switch", both for server-side and client-side.

          I'd like to hear your thoughts. Thanks!
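
          A rough sketch of the "master switch, then sub-switch" pattern described above, applied to the system metrics publisher. The class and interface names are made up for illustration, a plain Map stands in for a Hadoop Configuration, and both settings are assumed to default to false; this is not the actual SystemMetricsPublisher code.

```java
import java.util.HashMap;
import java.util.Map;

/** Illustrative sketch of "check the master switch, then the sub-switch". */
class MetricsPublisherFactory {
    static final String TIMELINE_ENABLED = "yarn.timeline-service.enabled";
    static final String SMP_ENABLED = "yarn.resourcemanager.system-metrics-publisher.enabled";

    /** Hypothetical publisher abstraction; returns true if the event was sent. */
    interface Publisher {
        boolean publish(String event);
    }

    /** Used when the feature is wired off: drops events, never contacts the server. */
    static final class NoOpPublisher implements Publisher {
        public boolean publish(String event) {
            return false;
        }
    }

    /** Stand-in for a publisher that would write to the timeline server. */
    static final class TimelinePublisher implements Publisher {
        public boolean publish(String event) {
            return true; // a real implementation would call the timeline client here
        }
    }

    static boolean isTrue(Map<String, String> conf, String key) {
        return Boolean.parseBoolean(conf.getOrDefault(key, "false"));
    }

    /** Decides once, at startup, so a disabled service never causes late write errors. */
    static Publisher create(Map<String, String> conf) {
        if (!isTrue(conf, TIMELINE_ENABLED)) {
            return new NoOpPublisher(); // master switch is off
        }
        if (!isTrue(conf, SMP_ENABLED)) {
            return new NoOpPublisher(); // sub-switch is off
        }
        return new TimelinePublisher();
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        conf.put(SMP_ENABLED, "true");
        // Master switch is absent, so the publisher is wired off early.
        System.out.println(create(conf).getClass().getSimpleName());
    }
}
```

          Deciding once at construction time is what avoids the continuous stream of write failures: with the master switch off, no timeline client is ever started.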

          varun_saxena Varun Saxena added a comment -

          I feel yarn.resourcemanager.system-metrics-publisher.enabled is sufficient to be configured.

          Agree. Enabling the system metrics publisher should be considered enough to publish events from the RM.

          As far as I view it, the "yarn.timeline-service.enabled" name is misleading; it should rather signify that the client requires the timeline service's delegation token.

          Maybe we can use the version config to decide whether we have to fetch a token (in addition to the timeline service enabled config?).

          Naganarasimha Naganarasimha G R added a comment -

          As far as I view it, the "yarn.timeline-service.enabled" name is misleading; it should rather signify that the client requires the timeline service's delegation token.

          Oops, in retrospect, it's required in YarnClientImpl to create the timeline client, so we may need to find some other meaningful name.

          Naganarasimha Naganarasimha G R added a comment -

          Yes, Sangjin Lee & Li Lu (inactive), there are too many points to discuss w.r.t. versions. But I felt the scope of this JIRA was more about whether "yarn.timeline-service.enabled" is required in the RM's SystemMetricsPublisher, which I feel we need to stick to, as viewers of this JIRA might get confused if the scope is broader. We could raise a new JIRA, as a subtask of 1.5 or 2, to discuss the pros and cons of supporting multiple ATS versions. If OK, I will raise one under 1.5.

          I believe it strongly implies it applies to the server-side too. After all, even if it is indicated to the client that the timeline service is enabled, it would be no use obviously (and actually worse than not indicating) if the timeline service was not running.

          Well, once we move to 1.5 and 2 we will anyway have some form of "yarn.timeline-service.version", which signifies that the server needs to start the timeline server, and the RM's SMP will know whether to publish to the timeline server if the version is set.
          But as far as 2.7.2 is concerned, I feel yarn.resourcemanager.system-metrics-publisher.enabled is sufficient to be configured. If not, the admin has to do unnecessary additional configuration.
          As far as I view it, the "yarn.timeline-service.enabled" name is misleading; it should rather signify that the client requires the timeline service's delegation token, which would not be a server-side config. Thoughts?

          gtCarrera9 Li Lu added a comment -

          Sure. We can distribute the work and check them out. I can help since I've got some free cycles.

          sjlee0 Sangjin Lee added a comment -

          This is still very much an open-ended discussion. I recognize that trying to support enabling multiple versions may be pretty tricky and far-reaching. It would be great if we can pull that off, but if it leads to other negative consequences, we as a group may decide not to pursue it. For one, I know this would mean a pretty significant refactoring on the work for YARN-2928 as there are many places where we assume this is a binary choice (mutually exclusive).

          gtCarrera9 Li Lu added a comment -

          Also, if 1.5 is meant to include 1, then we can limit supported values as something like (1, 2), (1.5, 2), etc. but not (1, 1.5).

          Yes, so maybe we can require the value of this config to be a set of version numbers that are incompatible with each other. Each version number represents the highest supported version number under the same major version number (v1, v2, v3, ...)
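
          A minimal sketch of how such a multi-valued version setting could be validated, assuming a comma-separated value and treating the part before the dot as the major version. The class and method names are made up for illustration, and no such multi-valued config has been agreed on in this discussion.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Illustrative sketch: at most one version per major version, e.g. "1.5, 2". */
class TimelineVersions {

    /**
     * Parses a comma-separated version list such as "1.5, 2".
     * Throws IllegalArgumentException if two entries share a major version,
     * e.g. "1, 1.5", since 1.5 is taken to include 1.
     */
    static List<String> parse(String value) {
        List<String> versions = new ArrayList<>();
        Set<Integer> majors = new HashSet<>();
        for (String part : value.split(",")) {
            String version = part.trim();
            if (version.isEmpty()) {
                continue; // tolerate stray commas and whitespace
            }
            // "1.5" -> major 1, "2" -> major 2; non-numeric input fails here.
            int major = Integer.parseInt(version.split("\\.")[0]);
            if (!majors.add(major)) {
                throw new IllegalArgumentException(
                    "More than one version listed for major version " + major + ": " + value);
            }
            versions.add(version);
        }
        return versions;
    }

    public static void main(String[] args) {
        System.out.println(parse("1.5, 2"));
        try {
            parse("1, 1.5");
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

          Under this reading, "1", "1.5", "1, 2" and "1.5, 2" are accepted, while "1, 1.5" is rejected.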

          sjlee0 Sangjin Lee added a comment -

          +1. I don't see a fundamental challenge to support "1", "1.5", and "1, 2" at the same time. One thing we need to be careful about is the meaning of the incremental relationship between v1 and v1.5. Having "1.5" in the config means the server supports the ATS v1 API up to v1.5, but having "2" in the config does not indicate any support for ATS v1?

          Yes I think that is fine. The key motivation again is to be able to test and compare 1.x and 2. Also, if 1.5 is meant to include 1, then we can limit supported values as something like (1, 2), (1.5, 2), etc. but not (1, 1.5).

          gtCarrera9 Li Lu added a comment -

          With v.2 and especially early on with v.2, it would be rather useful to be able to enable both v.1 (or v.1.5) and v.2. That would provide a useful verification and comparison environment with a single cluster. The way it's being discussed right now, it sounds like the version would be a single value (mutually exclusive). Wouldn't it be good to have a possibility to be able to enable more than one version? Thoughts?

          +1. I don't see a fundamental challenge to support "1", "1.5", and "1, 2" at the same time. One thing we need to be careful about is the meaning of the incremental relationships between v1 and v1.5. Having "1.5" in the config means the server supports ATS v1 API up to v1.5, but having "2" in the config does not indicate any support to ATS v1?

          I think it's a completely compatible interpretation, and it also strongly suggests that the server side can interpret this as an instruction to enable all things timeline service.

          I agree. If it does not explicitly have "client" in the config key, I think it is much safer not to make this assumption.

          For example, I don't think we can make a guarantee that v.2 will be able to support all queries for v.1 and v.1.5. What's more appropriate is an exact match.

          Yes. I think the confusion here comes from the relationships between v1, v1.5 and v2. Does my proposal work here?

          sjlee0 Sangjin Lee added a comment -

          This is also somewhat relevant to YARN-4356.

          (yarn.timeline-service.version)
          I'd like to point out an interesting possibility raised in another JIRA by Joep Rottinghuis. With v.2 and especially early on with v.2, it would be rather useful to be able to enable both v.1 (or v.1.5) and v.2. That would provide a useful verification and comparison environment with a single cluster. The way it's being discussed right now, it sounds like the version would be a single value (mutually exclusive). Wouldn't it be good to have a possibility to be able to enable more than one version? Thoughts?

          (yarn.timeline-service.enabled)
          Although the documentation appears to suggest the use is primarily for clients, I believe it strongly implies it applies to the server side too. After all, even if it is indicated to the client that the timeline service is enabled, it would obviously be of no use (and actually worse than not indicating it) if the timeline service was not running. I think it's a completely compatible interpretation, and it also strongly suggests that the server side can interpret this as an instruction to enable all things timeline service.

          if the timelineserver daemon is started it directly starts the timelinestore without checking for the configuration "yarn.timeline-service.enabled"

          The timelineserver daemon might be able to do it, but the issue is with any other server-side component (e.g. RM, etc.) that needs to see if the timeline server should be supported/used. IMO it should be checked by everything (server-side and client-side) to see if the timeline service is and should be enabled.

          (client-side)
          I also think ideally each framework (client) should define whether they will use the timeline service. Just because the timeline service is enabled doesn't mean they will use it (like MR today). Ideally the framework should have its own config to use it. Any code of theirs to use the timeline service should be predicated on both properties being true.

          Regarding versions, as long as clients can discover that the version they want to use is included in the enabled versions (see above) and if their config is also enabled, it should be able to write/read (provided the APIs exist of course).
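A minimal sketch of the client-side gating described above: a framework uses the timeline service only when both the cluster-wide property and its own property are true, and only when the version it wants is listed among the enabled versions (exact match). A plain Map stands in for the real configuration object, the framework key myframework.timeline-service.enabled is invented for illustration, and the comma-separated multi-value form of yarn.timeline-service.version is only what is being proposed in this thread.

```java
import java.util.Arrays;
import java.util.Map;

// Hypothetical helper for illustration only; not part of Hadoop.
final class FrameworkTimelineGate {
  static final String YARN_ENABLED = "yarn.timeline-service.enabled";
  static final String YARN_VERSIONS = "yarn.timeline-service.version";
  // Assumed, framework-specific switch; the name is made up for this sketch.
  static final String FRAMEWORK_ENABLED = "myframework.timeline-service.enabled";

  private FrameworkTimelineGate() {}

  static boolean shouldUseTimeline(Map<String, String> conf, String wantedVersion) {
    boolean clusterOn = Boolean.parseBoolean(conf.getOrDefault(YARN_ENABLED, "false"));
    boolean frameworkOn = Boolean.parseBoolean(conf.getOrDefault(FRAMEWORK_ENABLED, "false"));
    if (!clusterOn || !frameworkOn) {
      return false; // both properties must be true
    }
    // Exact match against the enabled versions; no "higher implies lower" assumption.
    String versions = conf.getOrDefault(YARN_VERSIONS, "");
    return Arrays.stream(versions.split(","))
        .map(String::trim)
        .anyMatch(wantedVersion::equals);
  }
}
```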

          Lastly,

          IIUC, the newly proposed yarn.timeline-service.version supports a sanity check mechanism: each API should check if the current running ATS's version is equal to or higher than its required version.

          I'm not entirely sure if this is feasible. "Higher" doesn't necessarily mean it will support all versions at and below its own. For example, I don't think we can make a guarantee that v.2 will be able to support all queries for v.1 and v.1.5. What's more appropriate is an exact match. IMO we should not make guarantees about lower version compatibility as it's going to be very challenging to pull off and constraining for the newer implementation.

          Naganarasimha Naganarasimha G R added a comment -

          Hi All,
          I recently noticed that there is already an existing configuration, "yarn.timeline-service.client.best-effort", which ensures that YarnClient doesn't throw an exception even if delegation token fetching fails in YarnClient.submitApplication. So I feel there is already a sufficient guard on the client side for when to create the TimelineClient and fetch the delegation token, and for what action to take if that fails.
          So as part of this JIRA, as mentioned by Vinod Kumar Vavilapalli, I think we just need to remove the check for yarn.timeline-service.enabled in SystemMetricsPublisher, as it's a client-side configuration (correct me if I am wrong). And as part of YARN-4234 we are anyway concentrating on how to support multiple versions in TimelineClient, right?
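To make the best-effort behaviour concrete, here is a small sketch of the pattern: try to fetch the timeline delegation token, and if that fails, either swallow the failure (best-effort on) or propagate it (best-effort off). The TokenSource interface and the method names are hypothetical stand-ins and do not reproduce the actual YarnClient code.

```java
import java.io.IOException;
import java.util.Optional;

// Hypothetical helper for illustration only; not the real YarnClient logic.
final class BestEffortTokenFetch {
  // Stand-in for whatever actually contacts the timeline server.
  interface TokenSource {
    String fetchDelegationToken() throws IOException;
  }

  private BestEffortTokenFetch() {}

  // bestEffort mirrors the value of yarn.timeline-service.client.best-effort.
  static Optional<String> fetch(TokenSource source, boolean bestEffort) throws IOException {
    try {
      return Optional.of(source.fetchDelegationToken());
    } catch (IOException e) {
      if (bestEffort) {
        // Timeline server unreachable: continue submission without a token.
        return Optional.empty();
      }
      throw e; // strict mode: the submission fails
    }
  }
}
```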

          hudson Hudson added a comment -

          FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #598 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/598/)
          YARN-4183. Reverting the patch to fix behaviour change. Revert (vinodkv: rev 6351d3fa638f1d901279cef9e56dc4e07ef3de11)

          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/metrics/TestSystemMetricsPublisher.java
          • hadoop-yarn-project/CHANGES.txt
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/metrics/SystemMetricsPublisher.java
          hudson Hudson added a comment -

          FAILURE: Integrated in Hadoop-Hdfs-trunk #2536 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk/2536/)
          YARN-4183. Reverting the patch to fix behaviour change. Revert (vinodkv: rev 6351d3fa638f1d901279cef9e56dc4e07ef3de11)

          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/metrics/TestSystemMetricsPublisher.java
          • hadoop-yarn-project/CHANGES.txt
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/metrics/SystemMetricsPublisher.java
          hudson Hudson added a comment -

          FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #658 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/658/)
          YARN-4183. Reverting the patch to fix behaviour change. Revert (vinodkv: rev 6351d3fa638f1d901279cef9e56dc4e07ef3de11)

          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/metrics/SystemMetricsPublisher.java
          • hadoop-yarn-project/CHANGES.txt
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/metrics/TestSystemMetricsPublisher.java
          hudson Hudson added a comment -

          FAILURE: Integrated in Hadoop-Yarn-trunk #1394 (See https://builds.apache.org/job/Hadoop-Yarn-trunk/1394/)
          YARN-4183. Reverting the patch to fix behaviour change. Revert (vinodkv: rev 6351d3fa638f1d901279cef9e56dc4e07ef3de11)

          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/metrics/TestSystemMetricsPublisher.java
          • hadoop-yarn-project/CHANGES.txt
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/metrics/SystemMetricsPublisher.java
          hudson Hudson added a comment -

          FAILURE: Integrated in Hadoop-Mapreduce-trunk #2598 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2598/)
          YARN-4183. Reverting the patch to fix behaviour change. Revert (vinodkv: rev 6351d3fa638f1d901279cef9e56dc4e07ef3de11)

          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/metrics/TestSystemMetricsPublisher.java
          • hadoop-yarn-project/CHANGES.txt
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/metrics/SystemMetricsPublisher.java
          hudson Hudson added a comment -

          FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #670 (See https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/670/)
          YARN-4183. Reverting the patch to fix behaviour change. Revert (vinodkv: rev 6351d3fa638f1d901279cef9e56dc4e07ef3de11)

          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/metrics/SystemMetricsPublisher.java
          • hadoop-yarn-project/CHANGES.txt
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/metrics/TestSystemMetricsPublisher.java
          hudson Hudson added a comment -

          FAILURE: Integrated in Hadoop-trunk-Commit #8793 (See https://builds.apache.org/job/Hadoop-trunk-Commit/8793/)
          YARN-4183. Reverting the patch to fix behaviour change. Revert (vinodkv: rev 6351d3fa638f1d901279cef9e56dc4e07ef3de11)

          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/metrics/TestSystemMetricsPublisher.java
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/metrics/SystemMetricsPublisher.java
          • hadoop-yarn-project/CHANGES.txt
          vinodkv Vinod Kumar Vavilapalli added a comment -

          All of this needs more work, so unless I hear strongly otherwise I am going to revert this patch in the interest of 2.7.2's progress.

          Seeing no No's, reverted this for the sake of 2.7.2.

          gtCarrera9 Li Lu added a comment -

          Discussed with Xuan Gong offline, and seems like YARN-4234 is blocked on the proposed yarn.timeline-service.version config. If this is the case maybe we can add the config in YARN-4234 and have it reviewed quickly?

          gtCarrera9 Li Lu added a comment -

          A subtask, YARN-3623, has already been raised for this. I hope I can work on it?

          Let's fix the config problem raised in YARN-3623 here, since it's no longer an ATS v2 problem. Please feel free to open a new JIRA for the API fix in YARN-2928. If you happen to have cycles, feel free to assign it to yourself. Sangjin Lee, any suggestions here? Thanks!

          Naganarasimha Naganarasimha G R added a comment -

          Thanks Vinod Kumar Vavilapalli and Li Lu (inactive) for clarifying the approach.
          I would like a few more clarifications:

          • How do we support backward compatibility for existing users who already use yarn.timeline-service.enabled to get the tokens?
          • Do we need to support the use cases mentioned by Jonathan?
            1. Support soft and hard settings for the clients: don't throw an exception if the client is unable to get the timeline delegation token, so that the job may still be able to progress.
            2. If creating the TimelineClient based on the new yarn.timeline-service.version fails, do we need to stop the RM, or should it just keep retrying and start pushing the system metrics events once it is able to connect?

          Li Lu (inactive),

          I can open a new subtask in YARN-2928 to fix this for V2.

          A subtask, YARN-3623, has already been raised for this. I hope I can work on it?

          gtCarrera9 Li Lu added a comment -

          Thanks Vinod Kumar Vavilapalli! I agree we should find some better ways to organize *.enabled in ATS (we've got two different versions in our code base and will add two more). For end users, we need to provide mechanisms to distinguish at least three versions of ATS API calls in the future: v1, v1.5, and v2.

          There should be an explicit yarn.timeline-service.version which tells YarnClient to get tokens or not - yes for non-present version (default), v1, v2 but no for v1.5.

          The version field has semantics on both client and server side at the same time - it's picking a solution end-to-end.

          IIUC, the newly proposed yarn.timeline-service.version supports a sanity check mechanism: each API should check if the current running ATS's version is equal to or higher than its required version. For example, when an ATS v1.5 API is called but yarn.timeline-service.version is set to v1, it should simply throw an exception. We can also decide whether we need to get tokens in the YARN client by checking this version number.

          We can distinguish API versions through their names. We need to keep the V1 APIs unchanged, but add V15 and V2 after the new APIs to clarify their API version. Inside each V15 and V2 API we can perform the sanity check.
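The sanity check proposed in this comment could look roughly like the following. This is only a sketch of the "equal to or higher" rule as stated here: it assumes the configured value is a single plain number such as "1", "1.5" or "2", and the class and method names are hypothetical.

```java
// Hypothetical helper for illustration only; not part of Hadoop.
final class TimelineApiVersionCheck {
  private TimelineApiVersionCheck() {}

  // Throws if the configured yarn.timeline-service.version is lower than
  // the version a given API call requires.
  static void requireAtLeast(String configuredVersion, String requiredVersion) {
    float configured = Float.parseFloat(configuredVersion.trim());
    float required = Float.parseFloat(requiredVersion.trim());
    if (configured < required) {
      throw new IllegalStateException(
          "API requires timeline service version " + requiredVersion
              + " but yarn.timeline-service.version is " + configuredVersion);
    }
  }
}
```

A V15 API would call requireAtLeast(configured, "1.5") on entry, so a cluster configured with "1" fails fast instead of issuing a call the server cannot serve.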

          Let's make the yarn.timeline-service.version change here. We can modify V1.5 APIs in YARN-4233/YARN-4234 and V2 APIs as a subtask of YARN-2928. I can open a new subtask in YARN-2928 to fix this for V2.

          vinodkv Vinod Kumar Vavilapalli added a comment -

          Getting the obvious out of the way: It's a mess.

          How things worked before this JIRA

          • RM uses generic-application-history.enabled to activate RMApplicationHistoryWriter (RM sending app events to the now-dead-but-kept-for-compat APPLICATION_HISTORY_STORE)
          • RM uses yarn.timeline-service.enabled + yarn.resourcemanager.system-metrics-publisher.enabled to write app/app-attempt/container events to Timeline Service
          • YarnClient uses generic-application-history.enabled to talk to the history server irrespective of where the historic data gets stored
          • TimelineClient (embedded inside YarnClient) uses yarn.timeline-service.enabled to get tokens and populate during app-submission.
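The two timeline-related checks in the list above can be summarised in a short sketch. A plain Map stands in for the real configuration object, a missing key is assumed to mean false, and the class and method names are hypothetical; this only restates the gating described in the bullets and is not the actual RM or client code.

```java
import java.util.Map;

// Hypothetical summary of the pre-JIRA gating; not actual Hadoop code.
final class PreJiraGating {
  static final String TIMELINE_ENABLED = "yarn.timeline-service.enabled";
  static final String PUBLISHER_ENABLED =
      "yarn.resourcemanager.system-metrics-publisher.enabled";

  private PreJiraGating() {}

  // Assumption: an unset property is treated as false.
  private static boolean flag(Map<String, String> conf, String key) {
    return Boolean.parseBoolean(conf.getOrDefault(key, "false"));
  }

  // RM writes app/app-attempt/container events only when both flags are on.
  static boolean rmPublishesSystemMetrics(Map<String, String> conf) {
    return flag(conf, TIMELINE_ENABLED) && flag(conf, PUBLISHER_ENABLED);
  }

  // TimelineClient inside YarnClient fetches tokens based on one flag only.
  static boolean clientFetchesTimelineToken(Map<String, String> conf) {
    return flag(conf, TIMELINE_ENABLED);
  }
}
```

Written this way, it is easy to see that one property drives both a server-side and a client-side decision.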

          Quick general context

          • Nobody is expected to use RMApplicationHistoryWriter
          • yarn.timeline-service.generic-application-history.enabled is also supposed to be dead for all purposes. But it is today used beyond the RM using it to activate RMApplicationHistoryWriter
          • SystemMetricsPublisher only writes events to TimelineService (v1, v1.5)

          Given the above, I can't but conclude that the existing configuration is not modeled correctly.

          The right thing to do

          • Make SystemMetricsPublisher only respect yarn.resourcemanager.system-metrics-publisher.enabled
          • Leave yarn.timeline-service.generic-application-history.enabled as a dead property only to activate RMApplicationHistoryWriter.
          • We can leave yarn.timeline-service.generic-application-history.enabled to also activate client -> RM for historical data or make RM always proxy these calls for the client
          • There should be an explicit yarn.timeline-service.version which tells YarnClient to get tokens or not - yes for non-present version (default), v1, v2 but no for v1.5.
          • We should also use the same property in the new API calls proposed for V1.5 YARN-4233 / V2 YARN-2928, lest the users think they can call any API independent of what is supported on server side. The version field has semantics on both client and server side at the same time - it's picking a solution end-to-end.
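The token rule in the proposal above (fetch tokens when the version is absent, v1 or v2, but not for v1.5) can be sketched as follows. The value format "1", "1.5", "2" is an assumption, as are the class and method names; this is not how YarnClient is implemented.

```java
// Hypothetical helper for illustration only; not part of Hadoop.
final class TimelineTokenPolicy {
  private TimelineTokenPolicy() {}

  // version is the value of yarn.timeline-service.version, or null if unset.
  static boolean shouldFetchToken(String version) {
    if (version == null || version.trim().isEmpty()) {
      return true; // property not present: default behaviour, fetch tokens
    }
    switch (version.trim()) {
      case "1":
      case "2":
        return true;
      case "1.5":
        return false; // per the proposal, v1.5 clients do not fetch tokens
      default:
        throw new IllegalArgumentException(
            "Unsupported timeline service version: " + version);
    }
  }
}
```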

          Immediate step

          All of this needs more work, so unless I hear strongly otherwise I am going to revert this patch in the interest of 2.7.2's progress.

          /cc Hitesh Shah Li Lu Xuan Gong Sangjin Lee

          Naganarasimha Naganarasimha G R added a comment -

          Hi Xuan Gong,

          The value for yarn.timeline-service.enabled only means whether we have ATS daemon or not. We should not use this configuration to decide whether the job needs to get the ATS DT.

          I just went through all the references to the yarn.timeline-service.enabled configuration. As far as I can tell, it is not used to indicate that the ATS daemon is started; it looks more like it indicates whether the client wants to use the ATS daemon or not. That matches the description in the documentation: "Indicate to clients whether timeline service is enabled or not. If enabled, clients will put entities and events to the timeline server."
          Also, when the timeline server daemon is started, it starts the timeline store directly without checking the "yarn.timeline-service.enabled" configuration.
          So in effect, this configuration should be enabled if the client wants to put timeline entities, and disabled otherwise.

          xgong Xuan Gong added a comment -

          Sorry for the late response.

          Here is my understanding:

          • The current problem is that all jobs are forced to get an ATS DT even if they do not want to connect in the future.
          • The value for yarn.timeline-service.enabled only means whether we have ATS daemon or not. We should not use this configuration to decide whether the job needs to get the ATS DT.
          • I think part of the reason we marked the configuration "yarn.timeline-service.generic-application-history.enabled" as private instead of deleting it is compatibility.

          Jonathan Eagles I agree with all of your comments. But I think the concerns from Naganarasimha G R, especially the compatibility part, make sense.

          If the main issue is the creation of delegation tokens, I would rather have some option in the clients to determine whether to create ATS delegation tokens or not. Thoughts?

          It might be better if we could have options for the applications to choose whether they need ATS DT or not.

          Naganarasimha Naganarasimha G R added a comment -

          Oops, Missed this comment Sangjin Lee,

          The documentation gives a fair idea, and it matches my understanding:
          yarn.timeline-service.enabled : Indicate to clients whether timeline service is enabled or not. If enabled, clients will put entities and events to the timeline server.

          yarn.timeline-service.generic-application-history.enabled : Indicate to clients whether to query generic application data from timeline history-service or not. If not enabled then application data is queried only from Resource Manager. Defaults to false. (which is currently not there in documentation but present in TimelineServer.MD)

          yarn.resourcemanager.system-metrics-publisher.enabled : The setting that controls whether yarn system metrics is published on the timeline server or not by RM. (This requires yarn.timeline-service.enabled to be enabled which requires a doc update).

          AHS on the timeline store is started if "YarnConfiguration.APPLICATION_HISTORY_STORE" is not configured.
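
To make the relationship between these three settings easier to follow, here is a minimal sketch in plain Java. It is not Hadoop code: the configuration is modeled as a simple map, the class and method names (`TimelineFlagsSketch`, `clientPutsEntities`, `clientQueriesHistoryServer`, `rmPublishesSystemMetrics`) are hypothetical, and treating a missing key as false is an assumption. The only dependency it encodes is the one noted above, namely that RM publishing also requires yarn.timeline-service.enabled.

```java
import java.util.HashMap;
import java.util.Map;

public class TimelineFlagsSketch {
    static final String TIMELINE_ENABLED = "yarn.timeline-service.enabled";
    static final String GENERIC_HISTORY_ENABLED =
        "yarn.timeline-service.generic-application-history.enabled";
    static final String METRICS_PUBLISHER_ENABLED =
        "yarn.resourcemanager.system-metrics-publisher.enabled";

    // Reads a boolean flag; a missing key is treated as false (assumed default).
    static boolean flag(Map<String, String> conf, String key) {
        return Boolean.parseBoolean(conf.getOrDefault(key, "false"));
    }

    // Clients put entities and events to the timeline server.
    static boolean clientPutsEntities(Map<String, String> conf) {
        return flag(conf, TIMELINE_ENABLED);
    }

    // Clients query generic application data from the history service
    // instead of asking only the Resource Manager.
    static boolean clientQueriesHistoryServer(Map<String, String> conf) {
        return flag(conf, GENERIC_HISTORY_ENABLED);
    }

    // The RM publishes system metrics only when both flags are on.
    static boolean rmPublishesSystemMetrics(Map<String, String> conf) {
        return flag(conf, TIMELINE_ENABLED) && flag(conf, METRICS_PUBLISHER_ENABLED);
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        conf.put(METRICS_PUBLISHER_ENABLED, "true");
        System.out.println(rmPublishesSystemMetrics(conf)); // prints false
        conf.put(TIMELINE_ENABLED, "true");
        System.out.println(rmPublishesSystemMetrics(conf)); // prints true
    }
}
```

In this model, (1) and (2) are read independently of each other, while (3) only takes effect together with (1).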

          vinodkv Vinod Kumar Vavilapalli added a comment -

          I was asked to stall 2.7.2 for this JIRA. Reopening it while we discuss this more.

          Naganarasimha Naganarasimha G R added a comment -

          Hi Jonathan Eagles,
          We are not using ATS at the scale you are, so you will be the better judge, but here are my 2 cents as an observer:

          The issue regarding this jira is that putting yarn.timeline-service.enabled in the client xml (breaks #2 above) forces every job (both MR (not using timeline service) and Tez (using timeline service)) to have a runtime dependency on the timeline service.

          Could this problem be solved by having different configurations for Tez and MR clients? Alternatively, I would rather introduce one more config here that takes care of bypassing a failure to get delegation tokens. As you mentioned, applications with a soft dependency could make use of the same parameter.

          3.YARN services that interact with the timeline server (Generic History Server), may have runtime dependency of the timeline service that does not disrupt job submission

          Maybe we can handle this in a different JIRA as well: if required, we can start even if the timeline client is not up and, once it is up, the System Metrics Publisher can start accepting the timeline events. Thoughts?
          The purpose of "yarn.timeline-service.generic-application-history.enabled" is different as per the documentation. So instead, we can either remove the check for "TIMELINE_SERVICE_ENABLED" here rather than check for "yarn.timeline-service.generic-application-history.enabled", or create a new configuration in the client to avoid creating tokens when they are not required. Thoughts?

          jeagles Jonathan Eagles added a comment -

          Here are the requirements that users at scale need, and unfortunately the config design does not allow for this properly. Let me draw up what the requirements in my mind should be, based on my current knowledge. This is by no means an edict, but just a conversation starting point, so you know where I'm coming from.

          1. Jobs that make use of the timeline service may have a hard or soft runtime dependency on the timeline service
            • Jobs that interact directly with the timeline service (TimelineClient) should obtain a delegation token to use the service and optionally allow for a non-fatal runtime dependency (the job is allowed to run, but no history is written)
            • Jobs that don't interact with the timeline service (EntityFileTimelineClient) should obtain HDFS delegation tokens, but should not obtain timeline service delegation tokens.
          2. Jobs that don't make use of the timeline service should have no runtime dependency on the timeline service and should be allowed to submit and run freely regardless of the timeline service status.
          3. YARN services that interact with the timeline server (Generic History Server), may have runtime dependency of the timeline service that does not disrupt job submission.

          The issue regarding this jira is that putting yarn.timeline-service.enabled in the client xml (breaks #2 above) forces every job (both MR (not using timeline service) and Tez (using timeline service)) to have a runtime dependency on the timeline service. This places an artificial runtime dependency on the timeline service which is not highly available or highly scalable until v2.0.

          The issue regarding putting yarn.timeline-service.enabled in the resource manager (breaks #3 above) is that every YarnClientImpl (used in job status, used in job submission) now reaches out to get a delegation token. This makes the timeline service (neither highly scalable nor highly available until v2.0) a runtime dependency for job submission, and fetches many unnecessary delegation tokens for YarnClients that never intend to use them.
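
The requirements above can be read as a small decision table. The following is a minimal sketch in plain Java, not Hadoop code: the `TimelineUse` enum, the method names, and the `softDependency` flag are all hypothetical names introduced only to illustrate requirements 1 and 2.

```java
public class TokenRequirementsSketch {
    /** How a job relates to the timeline service (hypothetical classification). */
    enum TimelineUse {
        DIRECT,      // talks to the timeline service directly (TimelineClient)
        ENTITY_FILE, // writes entity files instead (EntityFileTimelineClient)
        NONE         // does not use the timeline service at all
    }

    // Only jobs that talk to the timeline service directly need its token.
    static boolean needsTimelineToken(TimelineUse use) {
        return use == TimelineUse.DIRECT;
    }

    // Entity-file jobs need HDFS tokens for their timeline data instead.
    static boolean needsHdfsTokenForTimelineData(TimelineUse use) {
        return use == TimelineUse.ENTITY_FILE;
    }

    // A job may be submitted unless it has a hard dependency on a
    // timeline service that is currently down.
    static boolean submissionMayProceed(TimelineUse use, boolean timelineUp,
                                        boolean softDependency) {
        if (use != TimelineUse.DIRECT) {
            return true; // no runtime dependency on the timeline service
        }
        return timelineUp || softDependency;
    }

    public static void main(String[] args) {
        // An MR-style job that ignores the timeline service still runs
        // while the service is down.
        System.out.println(submissionMayProceed(TimelineUse.NONE, false, false)); // prints true
        // A direct user with a hard dependency does not.
        System.out.println(submissionMayProceed(TimelineUse.DIRECT, false, false)); // prints false
    }
}
```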

          sjlee0 Sangjin Lee added a comment -

          Sorry I missed this one as well.

          Maybe this is a FAQ somewhere, but what are the relationships among the following 3 settings?

          1. yarn.timeline-service.enabled
          2. yarn.timeline-service.generic-application-history.enabled
          3. yarn.resourcemanager.system-metrics-publisher.enabled

          Can (1) and (2) be set independently, or does setting one have an implication on the other? How about (3)?

          From the v.2 perspective, there is no separate "generic application history service" anyway, and we will have to handle this problem in a different manner.

          Naganarasimha Naganarasimha G R added a comment -

          Hi Jonathan Eagles, Mit Desai & Xuan Gong,
          Sorry to pitch in very late on this, but IMHO I would like to differ from the approach taken in the patch, for these reasons:

          • "yarn.timeline-service.generic-application-history.enabled" was kept only to determine whether clients need to pick up the information from the history server or only from the RM. This is as per the documentation (which was updated as per the comments from Zhijie Shen). So the patch seems to deviate from the last known purpose; would it break compatibility?
          • What's the point of publishing the ATS events if timeline is not enabled? It would unnecessarily populate the RM logs if ATS is not enabled.
          • If the main issue is the creation of delegation tokens, I would rather have some option in the clients to determine whether to create ATS delegation tokens or not. Thoughts?

          From the ATSV2 team, Sangjin Lee, Junping Du, Li Lu (inactive): any thoughts?

          hudson Hudson added a comment -

          SUCCESS: Integrated in Hadoop-Hdfs-trunk-Java8 #553 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/553/)
          YARN-4183. Enabling generic application history forces every job to get (jeagles: rev c293c58954cdab25c8c69418b0e839883b563fa4)

          • hadoop-yarn-project/CHANGES.txt
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/metrics/SystemMetricsPublisher.java
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/metrics/TestSystemMetricsPublisher.java
          hudson Hudson added a comment -

          FAILURE: Integrated in Hadoop-Hdfs-trunk #2490 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk/2490/)
          YARN-4183. Enabling generic application history forces every job to get (jeagles: rev c293c58954cdab25c8c69418b0e839883b563fa4)

          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/metrics/TestSystemMetricsPublisher.java
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/metrics/SystemMetricsPublisher.java
          • hadoop-yarn-project/CHANGES.txt
          hudson Hudson added a comment -

          FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #615 (See https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/615/)
          YARN-4183. Enabling generic application history forces every job to get (jeagles: rev c293c58954cdab25c8c69418b0e839883b563fa4)

          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/metrics/SystemMetricsPublisher.java
          • hadoop-yarn-project/CHANGES.txt
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/metrics/TestSystemMetricsPublisher.java
          hudson Hudson added a comment -

          FAILURE: Integrated in Hadoop-Yarn-trunk #1338 (See https://builds.apache.org/job/Hadoop-Yarn-trunk/1338/)
          YARN-4183. Enabling generic application history forces every job to get (jeagles: rev c293c58954cdab25c8c69418b0e839883b563fa4)

          • hadoop-yarn-project/CHANGES.txt
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/metrics/TestSystemMetricsPublisher.java
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/metrics/SystemMetricsPublisher.java
          hudson Hudson added a comment -

          FAILURE: Integrated in Hadoop-Mapreduce-trunk #2545 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2545/)
          YARN-4183. Enabling generic application history forces every job to get (jeagles: rev c293c58954cdab25c8c69418b0e839883b563fa4)

          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/metrics/SystemMetricsPublisher.java
          • hadoop-yarn-project/CHANGES.txt
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/metrics/TestSystemMetricsPublisher.java
          hudson Hudson added a comment -

          FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #603 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/603/)
          YARN-4183. Enabling generic application history forces every job to get (jeagles: rev c293c58954cdab25c8c69418b0e839883b563fa4)

          • hadoop-yarn-project/CHANGES.txt
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/metrics/TestSystemMetricsPublisher.java
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/metrics/SystemMetricsPublisher.java
          hudson Hudson added a comment -

          FAILURE: Integrated in Hadoop-trunk-Commit #8726 (See https://builds.apache.org/job/Hadoop-trunk-Commit/8726/)
          YARN-4183. Enabling generic application history forces every job to get (jeagles: rev c293c58954cdab25c8c69418b0e839883b563fa4)

          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/metrics/TestSystemMetricsPublisher.java
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/metrics/SystemMetricsPublisher.java
          • hadoop-yarn-project/CHANGES.txt
          jeagles Jonathan Eagles added a comment -

          Xuan Gong, I haven't heard back from you regarding this patch. Unless you have a strong opinion regarding this patch, I think it's ready to go in. Please chime in if you have thoughts or alternatives.

          jeagles Jonathan Eagles added a comment -

          +1. Xuan Gong, unless you have any strong feeling against this patch, I'll commit this tomorrow.

          jeagles Jonathan Eagles added a comment -

          Xuan Gong, any thoughts on the current patch based on my above comments?

          jeagles Jonathan Eagles added a comment -

          Xuan Gong, this issue is two-fold. 1) The web services publishing should trigger posting history based on generic history enablement and not timeline server enablement. 2) There is still no separation between timeline clients that require delegation tokens and those that don't. See YARN-3942. As a result, if the timeline service is enabled at the global level, then each yarn client will get a timeline delegation token, which makes the timeline service a live dependency. That means if the timeline service is down, then the grid is down.

          The patch above is a clean way to avoid enabling the timeline service for all YarnClients in the cluster.
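
The issue description mentions a second way to soften this live dependency: setting "yarn.timeline-service.client.best-effort" to true so that jobs can still be submitted while the ATS is down. Below is a minimal sketch of that idea in plain Java. It is an illustration under stated assumptions, not the YarnClient implementation: `TokenSource` and `obtainToken` are hypothetical names, and the token is represented as a plain string.

```java
import java.io.IOException;
import java.util.Optional;

public class BestEffortTokenSketch {
    /** Hypothetical stand-in for the call that asks the timeline service for a token. */
    interface TokenSource {
        String fetch() throws IOException;
    }

    // Returns a token if one is wanted and available.
    // timelineEnabled models yarn.timeline-service.enabled,
    // bestEffort models yarn.timeline-service.client.best-effort.
    static Optional<String> obtainToken(boolean timelineEnabled, boolean bestEffort,
                                        TokenSource source) throws IOException {
        if (!timelineEnabled) {
            return Optional.empty(); // never contact the timeline service
        }
        try {
            return Optional.of(source.fetch());
        } catch (IOException e) {
            if (bestEffort) {
                return Optional.empty(); // submit without a token
            }
            throw e; // hard dependency: the failure blocks submission
        }
    }

    public static void main(String[] args) throws IOException {
        TokenSource down = () -> {
            throw new IOException("timeline service unavailable");
        };
        System.out.println(obtainToken(true, true, down).isPresent());   // prints false
        System.out.println(obtainToken(false, false, down).isPresent()); // prints false
    }
}
```

With both flags at their strict settings (enabled, not best-effort), the same unavailable service would instead surface as an exception, which is the "grid is down" case described above.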

          mitdesai Mit Desai added a comment -

          Here is the scenario. We want the Yarn application to not use the timeline server during execution but use the application history server for the logs. This will not be possible with the current implementation. It is either both or none.

          If we check whether application history is enabled, it indirectly tells us that the timeline service is enabled, because the history server will not be enabled without enabling the timeline server. This way, the system metrics publisher can publish events to the history server even if the applications do not use the timeline server for execution.

          xgong Xuan Gong added a comment -

          Mit Desai I do not understand why we need to make this change.

          "To make it work, if the timeline service flag is turned on, it will force every yarn application to get a delegation token."

          It already exists, doesn't it?

                if (UserGroupInformation.isSecurityEnabled()
                    && conf.getBoolean(YarnConfiguration.TIMELINE_SERVICE_ENABLED, false)) {
                  Token<TimelineDelegationTokenIdentifier> token =
                      client.getDelegationToken(
                          UserGroupInformation.getCurrentUser().getUserName());
                  UserGroupInformation.getCurrentUser().addToken(token);
                }
          

          "Instead of checking if timeline service is enabled, we should be checking if application history server is enabled."

          Why?

          hadoopqa Hadoop QA added a comment -



          +1 overall



          Vote | Subsystem | Runtime | Comment
          0 | pre-patch | 18m 58s | Pre-patch trunk compilation is healthy.
          +1 | @author | 0m 0s | The patch does not contain any @author tags.
          +1 | tests included | 0m 0s | The patch appears to include 1 new or modified test files.
          +1 | javac | 9m 47s | There were no new javac warning messages.
          +1 | javadoc | 11m 23s | There were no new javadoc warning messages.
          +1 | release audit | 0m 24s | The applied patch does not increase the total number of release audit warnings.
          +1 | checkstyle | 0m 55s | There were no new checkstyle issues.
          +1 | whitespace | 0m 0s | The patch has no lines that end in whitespace.
          +1 | install | 1m 38s | mvn install still works.
          +1 | eclipse:eclipse | 0m 37s | The patch built with eclipse:eclipse.
          +1 | findbugs | 1m 37s | The patch does not introduce any new Findbugs (version 3.0.0) warnings.
          +1 | yarn tests | 63m 12s | Tests passed in hadoop-yarn-server-resourcemanager.
             |  | 108m 36s |



          Subsystem | Report/Notes
          Patch URL | http://issues.apache.org/jira/secure/attachment/12761137/YARN-4183.1.patch
          Optional Tests | javadoc javac unit findbugs checkstyle
          git revision | trunk / 3f42753
          hadoop-yarn-server-resourcemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/9211/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt
          Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/9211/testReport/
          Java | 1.7.0_55
          uname | Linux asf905.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
          Console output | https://builds.apache.org/job/PreCommit-YARN-Build/9211/console

          This message was automatically generated.

          mitdesai Mit Desai added a comment -

          Attaching the patch


            People

            • Assignee:
              Naganarasimha Naganarasimha G R
              Reporter:
              mitdesai Mit Desai
            • Votes:
              0
              Watchers:
              15
