Apache Tez / TEZ-4488

TaskSchedulerManager might not be initialized when the first DAG comes


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.10.3
    • Component/s: None
    • Labels: None

    Description

      query-coordinator <11>1 2023-04-03T12:54:12.056Z query-coordinator-0-0 query-coordinator 1 10ea11e4-d4dc-4231-878e-0c8c07eda53b [mdc@18060 class="impl.DAGImpl" level="ERROR" thread="IPC Server handler 1 on 22222"] Uncaught Exception when handling event DAG_INIT on Dag dag_1680526446742_0000_1 at currentState=NEW
      java.lang.NullPointerException
          at org.apache.tez.dag.app.rm.TaskSchedulerManager.getTaskSchedulerClassName(TaskSchedulerManager.java:1082)
          at org.apache.tez.dag.app.DAGAppMaster$RunningAppContext.getTaskSchedulerClassName(DAGAppMaster.java:1702)
          at org.apache.tez.dag.app.dag.impl.VertexImpl.<init>(VertexImpl.java:1061)
          at org.apache.tez.dag.app.dag.impl.DAGImpl.createVertex(DAGImpl.java:1741)
          at org.apache.tez.dag.app.dag.impl.DAGImpl.initializeDAG(DAGImpl.java:1596)
          at org.apache.tez.dag.app.dag.impl.DAGImpl$InitTransition.transition(DAGImpl.java:1869)
          at org.apache.tez.dag.app.dag.impl.DAGImpl$InitTransition.transition(DAGImpl.java:1846)
          at org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385)
          at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
          at org.apache.hadoop.yarn.state.StateMachineFactory.access$500(StateMachineFactory.java:46)
          at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:487)
          at org.apache.tez.state.StateMachineTez.doTransition(StateMachineTez.java:59)
          at org.apache.tez.dag.app.dag.impl.DAGImpl.handle(DAGImpl.java:1219)
          at org.apache.tez.dag.app.dag.impl.DAGImpl.handle(DAGImpl.java:158)
          at org.apache.tez.dag.app.DAGAppMaster$DagEventDispatcher.handle(DAGAppMaster.java:2231)
          at org.apache.tez.dag.app.DAGAppMaster.startDAGExecution(DAGAppMaster.java:2608)
          at org.apache.tez.dag.app.DAGAppMaster.startDAG(DAGAppMaster.java:2573)
          at org.apache.tez.dag.app.DAGAppMaster.submitDAGToAppMaster(DAGAppMaster.java:1379)
          at org.apache.tez.dag.api.client.DAGClientHandler.submitDAG(DAGClientHandler.java:145)
          at org.apache.tez.dag.api.client.rpc.DAGClientAMProtocolBlockingPBServerImpl.submitDAG(DAGClientAMProtocolBlockingPBServerImpl.java:187)
          at org.apache.tez.dag.api.client.rpc.DAGClientAMProtocolRPC$DAGClientAMProtocol$2.callBlockingMethod(DAGClientAMProtocolRPC.java:8519)
          at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:533)
          at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1070)
          at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:989)
          at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:917)
          at java.base/java.security.AccessController.doPrivileged(Native Method)
          at java.base/javax.security.auth.Subject.doAs(Subject.java:423)
          at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1899)
          at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2894)
      

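      Currently, TaskSchedulerManager depends on clientRpcServer, so TaskSchedulerManager waits for clientRpcServer to start. However, once clientRpcServer is initialized, it can already handle requests from clients such as HiveServer2, so HiveServer2 is able to submit a DAG even before TaskSchedulerManager is initialized. This is the failure shown in the trace above: VertexImpl asks for the task scheduler class name during DAG_INIT and hits a NullPointerException in TaskSchedulerManager.getTaskSchedulerClassName.
      We cannot reverse the order of the service dependency, because TaskSchedulerManager needs the app host/port, which is only known after clientRpcServer has started. Instead, we can simply block the very first DAG submission until TaskSchedulerManager is ready.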
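
The "block the very first DAG" idea can be sketched with a one-shot gate. This is a minimal, self-contained illustration, not Tez code: `ReadinessGate`, `markReady()`, `isReady()` and `awaitReady(...)` are hypothetical names, and a `java.util.concurrent.CountDownLatch` is assumed as the blocking mechanism. The RPC handler thread calls `awaitReady` before touching the scheduler, so a DAG submitted too early waits (up to a timeout) instead of dereferencing an uninitialized scheduler.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical one-shot readiness gate (not a Tez class).
final class ReadinessGate {
    private final CountDownLatch ready = new CountDownLatch(1);

    // Called after the scheduler has finished initializing; further calls are no-ops.
    void markReady() {
        ready.countDown();
    }

    boolean isReady() {
        return ready.getCount() == 0;
    }

    // Called on the DAG submission path: blocks until ready, fails after the timeout.
    void awaitReady(long timeout, TimeUnit unit) throws InterruptedException {
        if (!ready.await(timeout, unit)) {
            throw new IllegalStateException("AM not ready after " + timeout + " " + unit);
        }
    }

    public static void main(String[] args) throws Exception {
        ReadinessGate gate = new ReadinessGate();
        // Simulate an early DAG submission arriving on an RPC handler thread.
        Thread rpcHandler = new Thread(() -> {
            try {
                gate.awaitReady(5, TimeUnit.SECONDS);
                System.out.println("DAG submitted after scheduler became ready");
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        rpcHandler.start();
        Thread.sleep(100);   // scheduler still initializing
        gate.markReady();    // scheduler initialized: release waiting submissions
        rpcHandler.join();
    }
}
```

Blocking with a timeout, rather than rejecting outright, keeps early clients working in the common case where the scheduler becomes ready shortly after the RPC server starts.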

      To resolve this ordering problem, I propose introducing a service that depends on all services that must be started and initialized before the first DAG arrives, and whose state can be checked before DAG submission. In effect, this introduces the following directed dependency graph (A -> B means A depends on B):

      appMasterReadinessService -> taskSchedulerManager -> clientRpcServer
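
The dependency graph above can be modeled in a few lines to show why a readiness service closes the gap. This is a sketch under stated assumptions, not Tez's or Hadoop's service API: `Svc`, `ServiceGraph`, `start(...)` and `canSubmitDag(...)` are hypothetical, and services are simply started depth-first so that dependencies come up before dependents. The window from the bug is simulated by starting only clientRpcServer first; during that window the readiness check fails even though the RPC server could already accept a submission.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical minimal service node (not org.apache.hadoop.service.Service).
final class Svc {
    final String name;
    final List<Svc> deps = new ArrayList<>();
    volatile boolean started;

    Svc(String name, Svc... deps) {
        this.name = name;
        this.deps.addAll(List.of(deps));
    }
}

final class ServiceGraph {
    // Start dependencies before dependents (depth-first, post-order); skip already-started nodes.
    static void start(Svc s, Set<Svc> visited, List<String> order) {
        if (!visited.add(s)) return;
        for (Svc d : s.deps) start(d, visited, order);
        s.started = true;
        order.add(s.name);
    }

    // Submission path only checks the readiness service, which is started last.
    static boolean canSubmitDag(Svc readiness) {
        return readiness.started;
    }

    public static void main(String[] args) {
        Svc clientRpcServer = new Svc("clientRpcServer");
        Svc taskSchedulerManager = new Svc("taskSchedulerManager", clientRpcServer);
        Svc readiness = new Svc("appMasterReadinessService", taskSchedulerManager);

        Set<Svc> visited = new LinkedHashSet<>();
        List<String> order = new ArrayList<>();
        start(clientRpcServer, visited, order);   // RPC is up: submissions can arrive now
        System.out.println("ready=" + canSubmitDag(readiness));   // false: scheduler not started yet
        start(readiness, visited, order);         // starts taskSchedulerManager, then readiness
        System.out.println(order + " ready=" + canSubmitDag(readiness));
    }
}
```

Here `canSubmitDag` only reports readiness; a real submission path would typically wait on it rather than reject the DAG.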
      

People

    • Assignee: László Bodor (abstractdog)
    • Reporter: László Bodor (abstractdog)
    • Votes: 0
    • Watchers: 1

Time Tracking

    • Original Estimate: Not Specified
    • Remaining Estimate: 0h
    • Time Spent: 2h 20m