Apache Tez / TEZ-4561

Improve reported exception when DAGAppMaster is shutting down

Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.10.5
    • Component/s: None
    • Labels: None

    Description

      https://github.com/apache/tez/blob/66a6ca64b5edde0d30bea0962cb132f3c4982469/tez-dag/src/main/java/org/apache/tez/dag/app/DAGAppMaster.java#L1683

      During shutdown, the AM can throw the following exception from the code linked above:

      TezUncheckedException: Cannot get ApplicationACLs before all services have started
         at org.apache.tez.dag.app.DAGAppMaster$RunningAppContext.getApplicationACLs(DAGAppMaster.java:1733)
         at org.apache.tez.dag.app.rm.container.AMContainerImpl$LaunchRequestTransition.transition(AMContainerImpl.java:513)
         at org.apache.tez.dag.app.rm.container.AMContainerImpl$LaunchRequestTransition.transition(AMContainerImpl.java:470)
         at org.apache.hadoop.yarn.state.StateMachineFactory$SingleInternalArc.doTransition(StateMachineFactory.java:362)
         at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
         at org.apache.hadoop.yarn.state.StateMachineFactory.access$500(StateMachineFactory.java:46)
         at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:493)
         at org.apache.tez.state.StateMachineTez.doTransition(StateMachineTez.java:64)
         at org.apache.tez.dag.app.rm.container.AMContainerImpl.handle(AMContainerImpl.java:441)
         at org.apache.tez.dag.app.rm.container.AMContainerImpl.handle(AMContainerImpl.java:78)
         at org.apache.tez.dag.app.rm.container.AMContainerMap.handle(AMContainerMap.java:68)
         at org.apache.tez.dag.app.rm.container.AMContainerMap.handle(AMContainerMap.java:40)
         at org.apache.tez.common.AsyncDispatcher.dispatch(AsyncDispatcher.java:200)
         at org.apache.tez.common.AsyncDispatcher$1.run(AsyncDispatcher.java:118)
         at java.base/java.lang.Thread.run(Thread.java:829)


      This message is confusing: it suggests an initialization problem, which is especially misleading for an AM that has already been running for a long time. It does not tell the log reader that getServiceState() != STATE.STARTED holds here because the service is in STATE.STOPPED, not because startup is still in progress.

      We should check for the STOPPED state and report it explicitly, possibly including the timestamp at which the shutdown hook started.
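
      A minimal sketch of the proposed check, not the actual patch. It assumes a simplified stand-in for the AM context: the class ShutdownAwareAclCheck, its ServiceState enum (mirroring the state names of Hadoop's Service.STATE), the shutdownStartedAtMillis field, the message wording, and the use of IllegalStateException in place of TezUncheckedException are all hypothetical and chosen only to keep the example self-contained.

```java
import java.time.Instant;
import java.util.Collections;
import java.util.Map;

// Hypothetical stand-in for the AM context; not the real DAGAppMaster code.
public class ShutdownAwareAclCheck {

  // Mirrors the state names of Hadoop's Service.STATE, redefined here
  // so that the sketch compiles without Hadoop on the classpath.
  public enum ServiceState { NOTINITED, INITED, STARTED, STOPPED }

  private volatile ServiceState state = ServiceState.NOTINITED;

  // Hypothetical field: when the shutdown began, in epoch millis (-1 if never).
  private volatile long shutdownStartedAtMillis = -1L;

  private final Map<String, String> applicationACLs = Collections.emptyMap();

  public void start() {
    state = ServiceState.STARTED;
  }

  // Record the timestamp before flipping the state, so a reader that
  // observes STOPPED also observes a valid timestamp.
  public void stop(long nowMillis) {
    shutdownStartedAtMillis = nowMillis;
    state = ServiceState.STOPPED;
  }

  public Map<String, String> getApplicationACLs() {
    ServiceState current = state;
    if (current == ServiceState.STARTED) {
      return applicationACLs;
    }
    if (current == ServiceState.STOPPED) {
      // Shutdown case: say so explicitly instead of implying a startup problem.
      throw new IllegalStateException(
          "Cannot get ApplicationACLs: AM is shutting down (shutdown started at "
              + Instant.ofEpochMilli(shutdownStartedAtMillis) + ")");
    }
    // Genuine initialization case: keep the original wording, plus the state.
    throw new IllegalStateException(
        "Cannot get ApplicationACLs before all services have started (current state: "
            + current + ")");
  }
}
```

      With this split, a long-running AM that is stopping reports the shutdown and its start time, while the original "before all services have started" wording is kept for the real initialization case.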

            People

              Assignee: Ayush Saxena (ayushtkn)
              Reporter: László Bodor (abstractdog)

                Time Tracking

                  Original Estimate: Not Specified
                  Remaining Estimate: 0h
                  Time Spent: 2h 20m