Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Won't Fix
    • Affects Version/s: 2.6.0, 2.7.1
    • Fix Version/s: None
    • Labels: None
    • Environment: RM HA with ATS

    Description

      1. Start RM with HA and ATS configured, and run some YARN applications
      2. Once the applications have finished successfully, start the timeline server
      3. Now fail over HA from active to standby, or restart the node

      ATS events for applications already existing in ATS are resent, which is not required.

      Attachments

        1. YARN-3127.20150213-1.patch
          8 kB
          Naganarasimha G R
        2. YARN-3127.20150329-1.patch
          11 kB
          Naganarasimha G R
        3. AppTransition.png
          182 kB
          Naganarasimha G R
        4. YARN-3127.20150624-1.patch
          15 kB
          Naganarasimha G R
        5. YARN-3127.20151123-1.patch
          15 kB
          Naganarasimha G R

        Issue Links

        Activity

          Naganarasimha Naganarasimha G R added a comment -

          In org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.AttemptRecoveredTransition:

          if (rmApp.getCurrentAppAttempt() == appAttempt
              && !RMAppImpl.isAppInFinalState(rmApp)) {
            // Add the previous finished attempt to scheduler synchronously so
            // that scheduler knows the previous attempt.
            appAttempt.scheduler.handle(new AppAttemptAddedSchedulerEvent(
                appAttempt.getAppAttemptId(), false, true));
            (new BaseFinalTransition(appAttempt.recoveredFinalState)).transition(
                appAttempt, event);
          }

          RMAppImpl.isAppInFinalState returns true, hence the transition which publishes the attempt to ATS during recovery is not played.
          So one option is to move BaseFinalTransition.transition outside this if block.
          But the other question I have is: during recovery, is it required to publish events to ATS at all?
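          [Editor's note] To make the option above concrete, here is a minimal, self-contained Java model of moving the final transition outside the guard; the class and method names are stand-ins for illustration, not the real RMAppAttemptImpl code.

          ```java
          public class RecoveredTransitionSketch {
              static final StringBuilder log = new StringBuilder();

              static void onAttemptRecovered(boolean isCurrentAttempt, boolean appInFinalState) {
                  if (isCurrentAttempt && !appInFinalState) {
                      // Only the scheduler notification stays behind the guard.
                      log.append("scheduler-notified;");
                  }
                  // Moved outside the guard so attempts of already-finished apps
                  // still run their final transition on recovery.
                  log.append("final-transition;");
              }

              public static void main(String[] args) {
                  onAttemptRecovered(true, true);   // finished app being recovered
                  System.out.println(log);          // prints: final-transition;
              }
          }
          ```

          Whether the final transition should still publish to ATS in this case is exactly the open question in the comment above.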

          vinodkv Vinod Kumar Vavilapalli added a comment -

          This case is the same as the Timeline service starting long (days) after the application has finished. I think it is better not to store these events during recovery and instead simply report an error on the UI saying that the Timeline service doesn't know about this application.
          bibinchundatt Bibin Chundatt added a comment -

          Thanks a lot Vinod and Naga for looking into the issue.
          Vinod, do you suggest that the part below should also be gracefully handled (attempt details not available) and not publish events to ATS during recovery?

          In org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryManagerOnTimelineStore:

          public ApplicationAttemptReport getApplicationAttempt(
              ApplicationAttemptId appAttemptId) throws YarnException, IOException {
            ApplicationReportExt app = getApplication(
                appAttemptId.getApplicationId(), ApplicationReportField.USER_AND_ACLS);
            checkAccess(app);
            TimelineEntity entity = timelineDataManager.getEntity(
                AppAttemptMetricsConstants.ENTITY_TYPE,
                appAttemptId.toString(), EnumSet.allOf(Field.class),
                UserGroupInformation.getLoginUser());
            if (entity == null) {
              throw new ApplicationAttemptNotFoundException(
                  "The entity for application attempt " + appAttemptId
                      + " doesn't exist in the timeline store");
            } else {
              return convertToApplicationAttemptReport(entity);
            }
          }

          Please correct me if I am wrong.

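          [Editor's note] As an illustration of the "graceful handling" being asked about, here is a hedged, self-contained sketch in which a missing timeline entity yields an empty Optional for the UI to render as "not available", instead of throwing. The names are hypothetical, not the real ApplicationHistoryManagerOnTimelineStore API.

          ```java
          import java.util.HashMap;
          import java.util.Map;
          import java.util.Optional;

          public class AttemptLookupSketch {
              // Stand-in for the timeline store: attemptId -> attempt report.
              final Map<String, String> timelineStore = new HashMap<>();

              Optional<String> getApplicationAttempt(String attemptId) {
                  // Missing entity -> empty Optional, not an exception.
                  return Optional.ofNullable(timelineStore.get(attemptId));
              }

              public static void main(String[] args) {
                  AttemptLookupSketch s = new AttemptLookupSketch();
                  s.timelineStore.put("attempt_1", "report-1");
                  System.out.println(s.getApplicationAttempt("attempt_1").orElse("not available"));
                  System.out.println(s.getApplicationAttempt("attempt_2").orElse("not available"));
              }
          }
          ```

          The trade-off is that callers lose the distinction between "never existed" and "not yet published", which matters for the recovery scenario discussed in this thread.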

          Naganarasimha Naganarasimha G R added a comment -

          Attaching an initial patch to avoid events being sent to the System metrics publisher during RM application recovery from the state store.
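          [Editor's note] The idea of the patch, as described, can be sketched as follows. This is a simplified stand-alone model; the class, method, and flag names are illustrative rather than the actual YARN code.

          ```java
          import java.util.ArrayList;
          import java.util.List;

          public class RecoveryGatedPublisher {
              final List<String> sentEvents = new ArrayList<>();

              void onAppStarted(String appId, boolean isRecovering) {
                  // Only publish for genuinely new applications, not replays of
                  // already-finished apps during RM failover/restart.
                  if (!isRecovering) {
                      sentEvents.add("APP_CREATED:" + appId);
                  }
              }

              public static void main(String[] args) {
                  RecoveryGatedPublisher p = new RecoveryGatedPublisher();
                  p.onAppStarted("app_001", false);  // new submission: published
                  p.onAppStarted("app_002", true);   // recovered app: suppressed
                  System.out.println(p.sentEvents);  // [APP_CREATED:app_001]
              }
          }
          ```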
          hadoopqa Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12698538/YARN-3127.20150213-1.patch
          against trunk revision 6f5290b.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 2 new or modified test files.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. There were no new javadoc warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          -1 findbugs. The patch appears to introduce 5 new Findbugs (version 2.0.3) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

          Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6624//testReport/
          Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/6624//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html
          Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6624//console

          This message is automatically generated.


          Naganarasimha Naganarasimha G R added a comment -

          None of the reported Findbugs issues are related to the code changes present in the patch.

          Naganarasimha Naganarasimha G R added a comment -

          Hi Zhijie Shen, Vinod Kumar Vavilapalli & Xuan Gong,
          Can any one of you review this JIRA, please?
          ozawa Tsuyoshi Ozawa added a comment -

          Naganarasimha G R Thank you for taking this issue! The policy of fix looks good to me. Could you add a test case to TestRMRestart to cover the case?

          Also, can we preserve following test cases?

          -    verify(writer).applicationStarted(any(RMApp.class));
          -    verify(publisher).appCreated(any(RMApp.class), anyLong());
          

          Naganarasimha Naganarasimha G R added a comment -

          Thanks Tsuyoshi Ozawa for the review, and sorry for the delay in responding, as I was held up with other issues...

          Could you add a test case to TestRMRestart to cover the case?

          Taken care of in this updated patch.

          Can we preserve the following test cases?

          As there are changes in the transitions, keeping these methods caused TestRMAppTransitions to fail for multiple test cases. The approach adopted to fix this: earlier, SystemMetricPublisher.appCreated() was invoked during creation of RMAppImpl itself, and SystemMetricPublisher.ACLsUpdated was invoked in RMAppManager.createAndPopulateNewRMApp, which is common to both the recovery and new-application execution flows. I have removed them from those places and placed them in RMAppManager.publishSystemMetrics, thus ensuring that these updates are sent to SystemMetricPublisher only during the new-application execution flow.
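          [Editor's note] The refactor described above can be sketched roughly as follows. All names here are hypothetical stand-ins for RMAppManager and the metrics publisher, under the assumption that only the new-submission path should publish.

          ```java
          import java.util.ArrayList;
          import java.util.List;

          public class AppManagerSketch {
              final List<String> published = new ArrayList<>();

              private String createAndPopulateApp(String appId) {
                  // Shared by both submitApplication() and recoverApplication();
                  // deliberately publishes nothing.
                  return appId;
              }

              void submitApplication(String appId) {
                  String app = createAndPopulateApp(appId);
                  publishSystemMetrics(app);       // only new submissions publish
              }

              void recoverApplication(String appId) {
                  createAndPopulateApp(appId);     // recovery path: no publishing
              }

              private void publishSystemMetrics(String app) {
                  published.add("appCreated:" + app);
                  published.add("appACLsUpdated:" + app);
              }

              public static void main(String[] args) {
                  AppManagerSketch m = new AppManagerSketch();
                  m.submitApplication("app_1");
                  m.recoverApplication("app_2");
                  System.out.println(m.published);
              }
          }
          ```

          Centralizing the publish calls in one method is what lets the recovery flow skip them without duplicating logic across transitions.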
          hadoopqa Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12708037/YARN-3127.20150329-1.patch
          against trunk revision 3d9132d.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 3 new or modified test files.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. There were no new javadoc warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 2.0.3) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

          org.apache.hadoop.yarn.server.resourcemanager.TestRMHA
          org.apache.hadoop.yarn.server.resourcemanager.TestRMAdminService
          org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebappAuthentication
          org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairScheduler
          org.apache.hadoop.yarn.server.resourcemanager.TestMoveApplication
          org.apache.hadoop.yarn.server.resourcemanager.recovery.TestZKRMStateStore

          Test results: https://builds.apache.org/job/PreCommit-YARN-Build/7141//testReport/
          Console output: https://builds.apache.org/job/PreCommit-YARN-Build/7141//console

          This message is automatically generated.


          Naganarasimha Naganarasimha G R added a comment -

          Hi Tsuyoshi Ozawa,
          It seems the test cases are mostly failing due to the bug introduced in HADOOP-10670, which is taken care of in HADOOP-11754. Some cases are failing due to bind exceptions; I am not sure they are related to the changes in this patch, and am guessing they might also be impacts of HADOOP-10670 (earlier, almost the same patch passed all test cases, and the new test case does not start a new RM as such, so it is less likely related to these changes). In general you can review the patch, and I will trigger Jenkins once HADOOP-11754 is in.

          Naganarasimha Naganarasimha G R added a comment -

          HADOOP-11754 has been fixed, hence uploading the same patch to check for test case failures.
          hadoopqa Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12708571/YARN-3127.20150329-1.patch
          against trunk revision 2daa478.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 3 new or modified test files.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. There were no new javadoc warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 2.0.3) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

          org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairScheduler

          Test results: https://builds.apache.org/job/PreCommit-YARN-Build/7183//testReport/
          Console output: https://builds.apache.org/job/PreCommit-YARN-Build/7183//console

          This message is automatically generated.


          Naganarasimha Naganarasimha G R added a comment -

          The TestFairScheduler failure is not related to this issue and is getting fixed as part of YARN-2666. Tsuyoshi Ozawa, can you please take a look at this now?

          Naganarasimha Naganarasimha G R added a comment -

          Hi Xuan Gong, if you have the bandwidth, can you take a look at this patch too?
          xgong Xuan Gong added a comment -

          Naganarasimha G R Thanks for working on this. I will take a look shortly.

          xgong Xuan Gong added a comment -

          Naganarasimha G R Sorry for the late reply.
          So, the solution here is to avoid events sent to System metrics publisher during RM application recovery from state store. It looks fine to solve the current issue.

          But here is the case I am thinking right now might not work:

          • we start RM, ATS correctly
          • the RM failover/restart happens between the transition from FINAL_SAVING to FINISHED
          • based on the original code, when we do the recovery for the applications, we will send out appFinished event to System metrics publisher to update the app status in ATS
          • but based on the patch, we will not do it. In this case, the ATS will never get the app status update (changing the app status from started to finished). This looks like behavior that the patch breaks.

          Did I miss anything ?


          Naganarasimha Naganarasimha G R added a comment -

          Thanks for the review Xuan Gong, good catch... Well, I am not sure why the state is saved to the state store in FINAL_SAVING. Can we move it to FinalTransition, i.e. do it after we publish the event to the publisher and then store the state in RMStateStore, or vice versa?
          Also, IMO, depending on where the RM fails over (killed/stopped), there can be chances that entities are published to ATS after failover, so would it be good to handle this on the ATS side such that the URL doesn't crash?
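          [Editor's note] A minimal sketch of the publish-then-persist ordering being proposed (illustrative names, not the real RMStateStore API): publishing first means a failover between the two steps can at worst duplicate an event on recovery, but never lose one; persisting first risks an app whose final state is never published at all.

          ```java
          import java.util.ArrayList;
          import java.util.List;

          public class FinalSavingSketch {
              final List<String> atsEvents = new ArrayList<>();
              final List<String> stateStore = new ArrayList<>();

              void finishApp(String appId) {
                  atsEvents.add("APP_FINISHED:" + appId);  // step 1: publish to ATS
                  stateStore.add("FINAL_STATE:" + appId);  // step 2: persist final state
                  // A failover between step 1 and step 2 leaves the app unsaved,
                  // so recovery may publish APP_FINISHED again (duplicate, not loss).
              }

              public static void main(String[] args) {
                  FinalSavingSketch s = new FinalSavingSketch();
                  s.finishApp("app_9");
                  System.out.println(s.atsEvents);
                  System.out.println(s.stateStore);
              }
          }
          ```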
          gtcarrera9 Li Lu added a comment -

          Hi Naganarasimha G R, any updates on this JIRA? As pointed out by Xuan Gong, the current solution seems to have some problems. Therefore, I'm canceling this patch for now. Thanks!


          Naganarasimha Naganarasimha G R added a comment -

          Thanks for reviewing, Li Lu.
          The main cause of the issue mentioned here is already addressed in another JIRA by Xuan Gong, but when we test this way we still see null in the web UI. More importantly, this JIRA needs to be addressed because events are published for every app (started and finished) on RM failover; if 10000 apps are maintained, that many additional unrequired events get triggered, which we need to address. As for the issue pointed out by Xuan Gong, I had asked for a suggestion on the approach to take and have been waiting for it. AFAIK we need to ensure that ATS events are sent first, and then store the final application state to the RM state store in the FINAL_SAVING transition (and also handle other possible cases where an app is created and killed before an attempt is created, in which case FINAL_SAVING is not called). If this approach is fine, I will update the patch and the description.
          xgong Xuan Gong added a comment - Link: https://issues.apache.org/jira/browse/YARN-3701

          Naganarasimha Naganarasimha G R added a comment -

          Thanks Xuan Gong for pointing out this JIRA.
          I was actually planning to change the description and scope of this JIRA, as what I am trying to solve here is stopping the unwanted timeline events which get triggered on RM failover. If we observe, even for finished apps we currently get most of the timeline events generated, which is not needed, as the RM by default stores around 10000 apps. Also, when I recently tested the scenario mentioned above, it seems it has already been corrected (before YARN-3701), but some nulls are displayed in the web UI. A little held up; will try to modify the JIRA and rework the patch based on your comments (handling sending of events on finish) asap.

          Naganarasimha Naganarasimha G R added a comment -

          Hi Xuan Gong,
          I have modified the patch to work for the scenario you mentioned, but on a best-effort basis: it tries to avoid duplicate publishes by publishing events before saving to the state store (a failover that happens after publishing and before saving to the state store might still result in multiple events being published).
          Based on the state transition diagram, all the events go through the FINAL_SAVING state except for:
          NEW -> FINISHED (on RECOVER event)
          NEW -> FAILED (on RECOVER event)
          NEW -> KILLED (on KILL, RECOVER events)
          KILLING -> FINISHED (on ATTEMPT_FINISHED event)
          RUNNING -> FINISHED (on ATTEMPT_FINISHED event)

          For the first two, there is no need to handle anything, as the state would have been published to ATS before recovery.
          For the third, when an application is killed from the NEW state, we need to publish explicitly.
          The last two state transitions, which do not go through FINAL_SAVING, also need to be handled.
          Please review...
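          [Editor's note] The case analysis above can be encoded as a small table; a sketch, with the enum names invented for illustration (they mirror the transitions listed in the comment, not real YARN identifiers).

          ```java
          import java.util.EnumSet;

          public class TransitionSketch {
              enum Transition {
                  NEW_TO_FINISHED, NEW_TO_FAILED, NEW_TO_KILLED,
                  KILLING_TO_FINISHED, RUNNING_TO_FINISHED
              }

              // The first two are recovery replays already known to ATS; the other
              // three bypass FINAL_SAVING and must publish explicitly.
              static boolean needsExplicitPublish(Transition t) {
                  return EnumSet.of(Transition.NEW_TO_KILLED,
                          Transition.KILLING_TO_FINISHED,
                          Transition.RUNNING_TO_FINISHED).contains(t);
              }

              public static void main(String[] args) {
                  for (Transition t : Transition.values()) {
                      System.out.println(t + " -> explicit publish: " + needsExplicitPublish(t));
                  }
              }
          }
          ```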
          hadoopqa Hadoop QA added a comment -



          -1 overall



          Vote Subsystem Runtime Comment
          0 pre-patch 16m 11s Pre-patch trunk compilation is healthy.
          +1 @author 0m 0s The patch does not contain any @author tags.
          +1 tests included 0m 0s The patch appears to include 3 new or modified test files.
          +1 javac 7m 47s There were no new javac warning messages.
          +1 javadoc 9m 39s There were no new javadoc warning messages.
          +1 release audit 0m 22s The applied patch does not increase the total number of release audit warnings.
          -1 checkstyle 0m 49s The applied patch generated 1 new checkstyle issues (total was 150, now 151).
          -1 whitespace 0m 1s The patch has 1 line(s) that end in whitespace. Use git apply --whitespace=fix.
          +1 install 1m 20s mvn install still works.
          +1 eclipse:eclipse 0m 32s The patch built with eclipse:eclipse.
          +1 findbugs 1m 26s The patch does not introduce any new Findbugs (version 3.0.0) warnings.
          +1 yarn tests 51m 55s Tests passed in hadoop-yarn-server-resourcemanager.
              90m 6s  



          This message was automatically generated.

          Subsystem Report/Notes
          Patch URL http://issues.apache.org/jira/secure/attachment/12747076/YARN-3127.20150624-1.patch
          Optional Tests javadoc javac unit findbugs checkstyle
          git revision trunk / f8f6091
          checkstyle https://builds.apache.org/job/PreCommit-YARN-Build/8653/artifact/patchprocess/diffcheckstylehadoop-yarn-server-resourcemanager.txt
          whitespace https://builds.apache.org/job/PreCommit-YARN-Build/8653/artifact/patchprocess/whitespace.txt
          hadoop-yarn-server-resourcemanager test log https://builds.apache.org/job/PreCommit-YARN-Build/8653/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt
          Test Results https://builds.apache.org/job/PreCommit-YARN-Build/8653/testReport/
          Java 1.7.0_55
          uname Linux asf901.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
          Console output https://builds.apache.org/job/PreCommit-YARN-Build/8653/console

          Naganarasimha Naganarasimha G R added a comment -

          Hi Xuan Gong, Li Lu (inactive) & Tsuyoshi Ozawa,
          Can any one of you have a look at this JIRA?

          Naganarasimha G R added a comment -
          Hi Xuan Gong / Tsuyoshi Ozawa,
          I feel the issue is valid and needs to be fixed. If one of you can take a look at the approach and the patch I mentioned earlier, it would help get this JIRA moving.

          Naganarasimha G R added a comment -
          Hi Sangjin Lee, Rohith Sharma K S & Xuan Gong, I have rebased the patch; can you please take a look at it? Based on this we can get YARN-4350 corrected.
          hadoopqa Hadoop QA added a comment -
          -1 overall



          Vote Subsystem Runtime Comment
          0 reexec 0m 0s Docker mode activated.
          +1 @author 0m 0s The patch does not contain any @author tags.
          +1 test4tests 0m 0s The patch appears to include 3 new or modified test files.
          +1 mvninstall 7m 48s trunk passed
          +1 compile 0m 29s trunk passed with JDK v1.8.0_66
          +1 compile 0m 32s trunk passed with JDK v1.7.0_85
          +1 checkstyle 0m 14s trunk passed
          +1 mvnsite 0m 38s trunk passed
          +1 mvneclipse 0m 15s trunk passed
          +1 findbugs 1m 15s trunk passed
          +1 javadoc 0m 22s trunk passed with JDK v1.8.0_66
          +1 javadoc 0m 29s trunk passed with JDK v1.7.0_85
          +1 mvninstall 0m 35s the patch passed
          +1 compile 0m 29s the patch passed with JDK v1.8.0_66
          +1 javac 0m 29s the patch passed
          +1 compile 0m 32s the patch passed with JDK v1.7.0_85
          -1 javac 3m 43s hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager-jdk1.7.0_85 with JDK v1.7.0_85 generated 1 new issues (was 2, now 2).
          +1 javac 0m 32s the patch passed
          -1 checkstyle 0m 14s Patch generated 1 new checkstyle issues in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager (total was 147, now 148).
          +1 mvnsite 0m 38s the patch passed
          +1 mvneclipse 0m 16s the patch passed
          +1 whitespace 0m 0s Patch has no whitespace issues.
          +1 findbugs 1m 23s the patch passed
          +1 javadoc 0m 23s the patch passed with JDK v1.8.0_66
          +1 javadoc 0m 28s the patch passed with JDK v1.7.0_85
          -1 unit 59m 15s hadoop-yarn-server-resourcemanager in the patch failed with JDK v1.8.0_66.
          -1 unit 60m 26s hadoop-yarn-server-resourcemanager in the patch failed with JDK v1.7.0_85.
          +1 asflicense 0m 23s Patch does not generate ASF License warnings.
          138m 10s



          Reason Tests
          JDK v1.8.0_66 Failed junit tests hadoop.yarn.server.resourcemanager.TestAMAuthorization
            hadoop.yarn.server.resourcemanager.TestClientRMTokens
          JDK v1.7.0_85 Failed junit tests hadoop.yarn.server.resourcemanager.TestAMAuthorization
            hadoop.yarn.server.resourcemanager.TestClientRMTokens



          Subsystem Report/Notes
          Docker Image:yetus/hadoop:0ca8df7
          JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12773836/YARN-3127.20151123-1.patch
          JIRA Issue YARN-3127
          Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle
          uname Linux fb3380a093d6 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
          Build tool maven
          Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
          git revision trunk / 201f14e
          findbugs v3.0.0
          javac hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager-jdk1.7.0_85: https://builds.apache.org/job/PreCommit-YARN-Build/9766/artifact/patchprocess/diff-compile-javac-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager-jdk1.7.0_85.txt
          checkstyle https://builds.apache.org/job/PreCommit-YARN-Build/9766/artifact/patchprocess/diff-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt
          unit https://builds.apache.org/job/PreCommit-YARN-Build/9766/artifact/patchprocess/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager-jdk1.8.0_66.txt
          unit https://builds.apache.org/job/PreCommit-YARN-Build/9766/artifact/patchprocess/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager-jdk1.7.0_85.txt
          unit test logs https://builds.apache.org/job/PreCommit-YARN-Build/9766/artifact/patchprocess/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager-jdk1.8.0_66.txt https://builds.apache.org/job/PreCommit-YARN-Build/9766/artifact/patchprocess/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager-jdk1.7.0_85.txt
          JDK v1.7.0_85 Test Results https://builds.apache.org/job/PreCommit-YARN-Build/9766/testReport/
          modules C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
          Max memory used 75MB
          Powered by Apache Yetus http://yetus.apache.org
          Console output https://builds.apache.org/job/PreCommit-YARN-Build/9766/console

          This message was automatically generated.


          Naganarasimha G R added a comment -
          The test failures seem unrelated to the fix, and the checkstyle issue is not valid.

          Naganarasimha G R added a comment -
          YARN-4306 and YARN-4318 have already been raised for the test failures.

          Naganarasimha G R added a comment -
          Closing this JIRA based on the discussion in YARN-4392. The conclusion is that resending the events during recovery is OK, as there is a probability that ATS events were not yet dispatched before the RM failed over. We just need to ensure that data such as the event time is not altered when resending the events.
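          The resolution above — a resend during recovery is harmless as long as recovered data such as the original event timestamp is carried through unchanged — can be sketched with a minimal standalone model. The `Event` class and `publish` method below are hypothetical stand-ins, not the actual ATS API (the real event class is `org.apache.hadoop.yarn.api.records.timeline.TimelineEvent`); the point is only that re-publishing with the recovered timestamp is idempotent, whereas stamping "now" on resend would alter history.

```java
import java.util.HashMap;
import java.util.Map;

public class ResendSketch {

    // Hypothetical minimal model of a timeline event.
    static class Event {
        final String type;
        final long timestamp; // must survive a resend unchanged
        Event(String type, long timestamp) {
            this.type = type;
            this.timestamp = timestamp;
        }
    }

    // Store keyed by (appId, eventType): re-publishing the same event
    // overwrites the entry with identical data, so a resend during RM
    // recovery is a no-op rather than a duplicate or a mutation.
    static final Map<String, Event> store = new HashMap<>();

    // Correct approach: carry the recovered timestamp through on resend.
    // (The wrong approach would be new Event(type, System.currentTimeMillis()).)
    static void publish(String appId, Event e) {
        store.put(appId + "/" + e.type, e);
    }

    public static void main(String[] args) {
        Event finished = new Event("APP_FINISHED", 1423804800000L); // recovered finish time
        publish("app_1", finished);                                 // first publication
        publish("app_1", new Event("APP_FINISHED",                  // resend after RM failover,
                finished.timestamp));                               // reusing the recovered time
        if (store.get("app_1/APP_FINISHED").timestamp != 1423804800000L) {
            throw new AssertionError("event time was altered on resend");
        }
        System.out.println("resend preserved event time: ok");
    }
}
```

          Under this reading, YARN-4392-style fixes belong in the publisher (preserve timestamps), not in suppressing the recovery-time resend itself.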

          People

            Naganarasimha G R
            Bibin Chundatt