Details

    • Sub-task
    • Status: Closed
    • Major
    • Resolution: Fixed
    • None
    • 0.2.0
    • None
    • None

    Attachments

      1. TEZ-408.wip.patch
        32 kB
        Hitesh Shah
      2. TEZ-408.1.patch
        50 kB
        Hitesh Shah
      3. TEZ-408.2.patch
        55 kB
        Hitesh Shah

      Issue Links

        Activity

          hitesh Hitesh Shah added a comment -

          bikassaha Review please?

          hitesh Hitesh Shah added a comment - bikassaha Review please?
          hitesh Hitesh Shah added a comment - - edited

          Work still pending:

          • handle DAG failure scenarios - when the AM should die/not die depending on DAG failures.
          • unit tests
          • fix AM diagnostics in session mode
          hitesh Hitesh Shah added a comment - - edited Work still pending: handle DAG failure scenarios - when the AM should die/not die depending on DAG failures. unit tests fix AM diagnostics in session mode
          bikassaha Bikas Saha added a comment -

          DAGAppMasterState.IDLE in serviceStart() and go to RUNNING when DAG starts
          When DAG finishes then set DAG state == App state for non-session.
          When session exits without internal error then App master state should be SUCCEEDED.

          Probably create the common container context upfront since we block on it anyways.
          If we create a DAGContext in AppContext and house the current dag, its conf, and container context in it then it may make our life easier.

          bikassaha Bikas Saha added a comment - DAGAppMasterState.IDLE in serviceStart() and go to RUNNING when DAG starts When DAG finishes then set DAG state == App state for non-session. When session exits without internal error then App master state should be SUCCEEDED. Probably create the common container context upfront since we block on it anyways. If we create a DAGContext in AppContext and house the current dag, its conf, and container context in it then it may make our life easier.
          bikassaha Bikas Saha added a comment -

          TaskSchedulerEventHandler.getFinalAppStatus() either needs to handle the IDLE state or not see it.

          bikassaha Bikas Saha added a comment - TaskSchedulerEventHandler.getFinalAppStatus() either needs to handle the IDLE state or not see it.
          hitesh Hitesh Shah added a comment -

          Addressed all comments except for:

          <quote>
          Probably create the common container context upfront since we block on it anyways.
          If we create a DAGContext in AppContext and house the current dag, its conf, and container context in it then it may make our life easier.
          <quote>

          Will file separate jiras for addressing this in a follow-up.

          bikassaha, sseth any comments?

          hitesh Hitesh Shah added a comment - Addressed all comments except for: <quote> Probably create the common container context upfront since we block on it anyways. If we create a DAGContext in AppContext and house the current dag, its conf, and container context in it then it may make our life easier. <quote> Will file separate jiras for addressing this in a follow-up. bikassaha , sseth any comments?
          bikassaha Bikas Saha added a comment -

          This probably doesnt need to consider isSession but does need to consider IDLE state

          -      if (EnumSet.of(DAGAppMasterState.NEW, DAGAppMasterState.INITED,
          -          DAGAppMasterState.RUNNING).contains(appMaster.state)) {
          +      if (!appMaster.isSession() &&
          +          EnumSet.of(DAGAppMasterState.NEW, DAGAppMasterState.INITED,
          +              DAGAppMasterState.RUNNING).contains(appMaster.state))
          

          How about, if state is not ERROR, then we set state = SUCCEEDED in session.shutdownAM(). Then Scheduler does not need to know about sessions and thus needs no change.
          Orthogonally, close() sounds better to me than shutdownAM() for a session.

          I agree we can create DAGContext object in a separate jira. Can you please open one?

          Minor
          Initialize with others of same kind?

          +    this.dagCounter = new AtomicInteger(0);

          Jira???

          +      // FIXME implement a max time to wait for submit

          Killed DAGs ?

          +        if (finishEvt.getDAGState().equals(DAGState.SUCCEEDED)) {
          +          successfulDAGs.incrementAndGet();
          +        } else {
          +          failedDAGs.incrementAndGet();
          +        }
          
          bikassaha Bikas Saha added a comment - This probably doesnt need to consider isSession but does need to consider IDLE state - if (EnumSet.of(DAGAppMasterState.NEW, DAGAppMasterState.INITED, - DAGAppMasterState.RUNNING).contains(appMaster.state)) { + if (!appMaster.isSession() && + EnumSet.of(DAGAppMasterState.NEW, DAGAppMasterState.INITED, + DAGAppMasterState.RUNNING).contains(appMaster.state)) How about, if state is not ERROR, then we set state = SUCCEEDED in session.shutdownAM(). Then Scheduler does not need to know about sessions and thus needs no change. Orthogonally, close() sounds better to me than shutdownAM() for a session. I agree we can create DAGContext object in a separate jira. Can you please open one? Minor Initialize with others of same kind? + this .dagCounter = new AtomicInteger(0); Jira??? + // FIXME implement a max time to wait for submit Killed DAGs ? + if (finishEvt.getDAGState().equals(DAGState.SUCCEEDED)) { + successfulDAGs.incrementAndGet(); + } else { + failedDAGs.incrementAndGet(); + }
          bikassaha Bikas Saha added a comment -

          Rest looks fine! +1

          bikassaha Bikas Saha added a comment - Rest looks fine! +1
          hitesh Hitesh Shah added a comment -

          bikassaha Most comments addressed. Filed jiras for the add-on work. Regarding close vs shutdownAM, I have left it as is for now as shutdownAM is part of the DAGClient protocol - close would have made sense if there was an exclusive rpc protocol for the session.

          hitesh Hitesh Shah added a comment - bikassaha Most comments addressed. Filed jiras for the add-on work. Regarding close vs shutdownAM, I have left it as is for now as shutdownAM is part of the DAGClient protocol - close would have made sense if there was an exclusive rpc protocol for the session.
          hitesh Hitesh Shah added a comment -

          Committed to master.

          hitesh Hitesh Shah added a comment - Committed to master.
          bikassaha Bikas Saha added a comment -

          Looks good. Probably needs refresh since the debug dagplan stuff is already committed.

          bikassaha Bikas Saha added a comment - Looks good. Probably needs refresh since the debug dagplan stuff is already committed.
          hitesh Hitesh Shah added a comment -

          bikassaha Yes - the .2 patch was rebased against the debug related commit.

          hitesh Hitesh Shah added a comment - bikassaha Yes - the .2 patch was rebased against the debug related commit.

          People

            hitesh Hitesh Shah
            hitesh Hitesh Shah
            Votes:
            0 Vote for this issue
            Watchers:
            2 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved: