Details

    • Type: Sub-task Sub-task
    • Status: Resolved
    • Priority: Major Major
    • Resolution: Duplicate
    • Affects Version/s: None
    • Fix Version/s: 3.0.0
    • Component/s: None
    • Labels:

      Description

      Review the yarn common services (CompositeService, AbstractLivelinessMonitor and make their service startup and especially shutdown more robust against out-of-lifecycle invocation and partially complete initialization.

      Write tests for these where possible.

      1. MAPREDUCE-4014.patch
        22 kB
        Steve Loughran

        Issue Links

          Activity

          Hide
          Steve Loughran added a comment -

          Superceded by YARN-530

          Show
          Steve Loughran added a comment - Superceded by YARN-530
          Hide
          Siddharth Seth added a comment -

          Steve, apologies for the delayed feedback. I was wondering which parts of MAPREDUCE-3939 are planned for this patch.
          "Static methods to choreograph of lifecycle operations" seems to be covered.
          Will the others, specifically "AbstractService doesn't prevent duplicate state change requests", "AbstractService state change doesn't defend against race conditions", "state model prevents stopped state being entered if you could not successfully start the service" be separate patches? Definitely looks like they should go in.

          state model prevents stopped state being entered if you could not successfully start the service.

          I don't believe resources which require an explicit release are meant to be obtained in the init() stage - but that may not always be the case. I'd agree with allowing stop() from any state, as well as a CompositeService attempting to stop all child services when told to stop() (instead of just the ones which have started).
          Currently, a failed start() on a composite service will stop() services which had started and move them to STOPPED state, attempt to stop() the failed service but leave it in INITED state, and leave remaining services in INITED state - which doesn't seem correct.

          Feedback on this patch

          • interruptAndJoinThread(Thread target) - could have a joinTimeout parameter as well.
          • stopIPCServer, stopWebApp, interrupt* - should these be in AbstractService ? or a separate helper class.
          • toString() in AbstractService - the text is missing a closing quote.
          Show
          Siddharth Seth added a comment - Steve, apologies for the delayed feedback. I was wondering which parts of MAPREDUCE-3939 are planned for this patch. "Static methods to choreograph of lifecycle operations" seems to be covered. Will the others, specifically "AbstractService doesn't prevent duplicate state change requests", "AbstractService state change doesn't defend against race conditions", "state model prevents stopped state being entered if you could not successfully start the service" be separate patches? Definitely looks like they should go in. state model prevents stopped state being entered if you could not successfully start the service. I don't believe resources which require an explicit release are meant to be obtained in the init() stage - but that may not always be the case. I'd agree with allowing stop() from any state, as well as a CompositeService attempting to stop all child services when told to stop() (instead of just the ones which have started). Currently, a failed start() on a composite service will stop() services which had started and move them to STOPPED state, attempt to stop() the failed service but leave it in INITED state, and leave remaining services in INITED state - which doesn't seem correct. Feedback on this patch interruptAndJoinThread(Thread target) - could have a joinTimeout parameter as well. stopIPCServer, stopWebApp, interrupt* - should these be in AbstractService ? or a separate helper class. toString() in AbstractService - the text is missing a closing quote.
          Hide
          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12518548/MAPREDUCE-4014.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 8 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed these unit tests:
          org.apache.hadoop.mapred.TestWritableJobConf

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/2061//testReport/
          Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/2061//console

          This message is automatically generated.

          Show
          Hadoop QA added a comment - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12518548/MAPREDUCE-4014.patch against trunk revision . +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 8 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 eclipse:eclipse. The patch built with eclipse:eclipse. +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed these unit tests: org.apache.hadoop.mapred.TestWritableJobConf +1 contrib tests. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/2061//testReport/ Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/2061//console This message is automatically generated.
          Hide
          Steve Loughran added a comment -

          One of the tests in MAPREDUCE-4014 also demonstrates the current sharing of configurations -setting a property on the root service sets it in all the children. While this could be a feature, it could leak configuration between children.

              conf.set("t2", "should not be shared");
              assertServiceConfigurationContains(svc, "t2");
              assertServiceConfigurationContains(firstChild, "t2");
              assertServiceConfigurationContains(lastChild, "t2");
          
          Show
          Steve Loughran added a comment - One of the tests in MAPREDUCE-4014 also demonstrates the current sharing of configurations -setting a property on the root service sets it in all the children. While this could be a feature, it could leak configuration between children. conf.set( "t2" , "should not be shared" ); assertServiceConfigurationContains(svc, "t2" ); assertServiceConfigurationContains(firstChild, "t2" ); assertServiceConfigurationContains(lastChild, "t2" );
          Hide
          Steve Loughran added a comment -

          I've looked at the code and written the tests, the CompositeService seems good apart from MAPREDUCE-4016, where I need to verify that sharing of configurations is not a feature people actually wanted. If MAPREDUCE-4015 was included -a YarnException subclass specifically for illegal state changes, CompositeService.init() could be extended to relay this exception up, rather than wrap it.

          The patch I'm submitting has merged in the AbstractService changes to aid making service stop operations robust regardless of service state; these were taken from MAPREDUCE-3502. When committed, and MAPREDUCE-4015/MAPREDUCE-4016 resolved, this should complete the yarn-common changes and then specific services can be reviewed.

          Show
          Steve Loughran added a comment - I've looked at the code and written the tests, the CompositeService seems good apart from MAPREDUCE-4016 , where I need to verify that sharing of configurations is not a feature people actually wanted. If MAPREDUCE-4015 was included -a YarnException subclass specifically for illegal state changes, CompositeService.init() could be extended to relay this exception up, rather than wrap it. The patch I'm submitting has merged in the AbstractService changes to aid making service stop operations robust regardless of service state; these were taken from MAPREDUCE-3502 . When committed, and MAPREDUCE-4015 / MAPREDUCE-4016 resolved, this should complete the yarn-common changes and then specific services can be reviewed.
          Hide
          Steve Loughran added a comment -

          I'm going to submit an initial patch soon, but want to flag up something first. Should CompositeService attempt to stop() all its child services when told to stop() itself - regardless of what state the it is in? This would be consistent with a philosophy of "support the stop operation regardless of your state".

          Without that you can't reliably clean up during failed start operations -which the CompositeService itself tries to do by stopping all child services if any one of them fails to start - including the service that failed.

          Show
          Steve Loughran added a comment - I'm going to submit an initial patch soon, but want to flag up something first. Should CompositeService attempt to stop() all its child services when told to stop() itself - regardless of what state the it is in? This would be consistent with a philosophy of "support the stop operation regardless of your state". Without that you can't reliably clean up during failed start operations -which the CompositeService itself tries to do by stopping all child services if any one of them fails to start - including the service that failed.

            People

            • Assignee:
              Steve Loughran
              Reporter:
              Steve Loughran
            • Votes:
              0 Vote for this issue
              Watchers:
              6 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development