Details

    • Sub-task
    • Status: Closed
    • Major
    • Resolution: Fixed
    • None
    • 0.2.0
    • None
    • None

    Attachments

      1. TEZ-408.1.patch
        50 kB
        Hitesh Shah
      2. TEZ-408.2.patch
        55 kB
        Hitesh Shah
      3. TEZ-408.wip.patch
        32 kB
        Hitesh Shah

      Issue Links

      Activity

        hitesh Hitesh Shah added a comment -

        Bikas Saha Review please?

        hitesh Hitesh Shah added a comment - Bikas Saha Review please?
        hitesh Hitesh Shah added a comment - - edited

        Work still pending:

        • handle DAG failure scenarios - when the AM should die/not die depending on DAG failures.
        • unit tests
        • fix AM diagnostics in session mode
        hitesh Hitesh Shah added a comment - - edited Work still pending: handle DAG failure scenarios - when the AM should die/not die depending on DAG failures. unit tests fix AM diagnostics in session mode
        bikassaha Bikas Saha added a comment -

        DAGAppMasterState.IDLE in serviceStart() and go to RUNNING when DAG starts
        When DAG finishes then set DAG state == App state for non-session.
        When session exits without internal error then App master state should be SUCCEEDED.

        Probably create the common container context upfront since we block on it anyways.
        If we create a DAGContext in AppContext and house the current dag, its conf, and container context in it then it may make our life easier.

        bikassaha Bikas Saha added a comment - DAGAppMasterState.IDLE in serviceStart() and go to RUNNING when DAG starts When DAG finishes then set DAG state == App state for non-session. When session exits without internal error then App master state should be SUCCEEDED. Probably create the common container context upfront since we block on it anyways. If we create a DAGContext in AppContext and house the current dag, its conf, and container context in it then it may make our life easier.
        bikassaha Bikas Saha added a comment -

        TaskSchedulerEventHandler.getFinalAppStatus() either needs to handle the IDLE state or not see it.

        bikassaha Bikas Saha added a comment - TaskSchedulerEventHandler.getFinalAppStatus() either needs to handle the IDLE state or not see it.
        hitesh Hitesh Shah added a comment -

        Addressed all comments except for:

        <quote>
        Probably create the common container context upfront since we block on it anyways.
        If we create a DAGContext in AppContext and house the current dag, its conf, and container context in it then it may make our life easier.
        <quote>

        Will file separate jiras for addressing this in a follow-up.

        Bikas Saha, Siddharth Seth any comments?

        hitesh Hitesh Shah added a comment - Addressed all comments except for: <quote> Probably create the common container context upfront since we block on it anyways. If we create a DAGContext in AppContext and house the current dag, its conf, and container context in it then it may make our life easier. <quote> Will file separate jiras for addressing this in a follow-up. Bikas Saha , Siddharth Seth any comments?
        bikassaha Bikas Saha added a comment -

        This probably doesnt need to consider isSession but does need to consider IDLE state

        -      if (EnumSet.of(DAGAppMasterState.NEW, DAGAppMasterState.INITED,
        -          DAGAppMasterState.RUNNING).contains(appMaster.state)) {
        +      if (!appMaster.isSession() &&
        +          EnumSet.of(DAGAppMasterState.NEW, DAGAppMasterState.INITED,
        +              DAGAppMasterState.RUNNING).contains(appMaster.state))
        

        How about, if state is not ERROR, then we set state = SUCCEEDED in session.shutdownAM(). Then Scheduler does not need to know about sessions and thus needs no change.
        Orthogonally, close() sounds better to me than shutdownAM() for a session.

        I agree we can create DAGContext object in a separate jira. Can you please open one?

        Minor
        Initialize with others of same kind?

        +    this.dagCounter = new AtomicInteger(0);

        Jira???

        +      // FIXME implement a max time to wait for submit

        Killed DAGs ?

        +        if (finishEvt.getDAGState().equals(DAGState.SUCCEEDED)) {
        +          successfulDAGs.incrementAndGet();
        +        } else {
        +          failedDAGs.incrementAndGet();
        +        }
        
        bikassaha Bikas Saha added a comment - This probably doesnt need to consider isSession but does need to consider IDLE state - if (EnumSet.of(DAGAppMasterState.NEW, DAGAppMasterState.INITED, - DAGAppMasterState.RUNNING).contains(appMaster.state)) { + if (!appMaster.isSession() && + EnumSet.of(DAGAppMasterState.NEW, DAGAppMasterState.INITED, + DAGAppMasterState.RUNNING).contains(appMaster.state)) How about, if state is not ERROR, then we set state = SUCCEEDED in session.shutdownAM(). Then Scheduler does not need to know about sessions and thus needs no change. Orthogonally, close() sounds better to me than shutdownAM() for a session. I agree we can create DAGContext object in a separate jira. Can you please open one? Minor Initialize with others of same kind? + this .dagCounter = new AtomicInteger(0); Jira??? + // FIXME implement a max time to wait for submit Killed DAGs ? + if (finishEvt.getDAGState().equals(DAGState.SUCCEEDED)) { + successfulDAGs.incrementAndGet(); + } else { + failedDAGs.incrementAndGet(); + }
        bikassaha Bikas Saha added a comment -

        Rest looks fine! +1

        bikassaha Bikas Saha added a comment - Rest looks fine! +1
        hitesh Hitesh Shah added a comment -

        Bikas Saha Most comments addressed. Filed jiras for the add-on work. Regarding close vs shutdownAM, I have left it as is for now as shutdownAM is part of the DAGClient protocol - close would have made sense if there was an exclusive rpc protocol for the session.

        hitesh Hitesh Shah added a comment - Bikas Saha Most comments addressed. Filed jiras for the add-on work. Regarding close vs shutdownAM, I have left it as is for now as shutdownAM is part of the DAGClient protocol - close would have made sense if there was an exclusive rpc protocol for the session.
        hitesh Hitesh Shah added a comment -

        Committed to master.

        hitesh Hitesh Shah added a comment - Committed to master.
        bikassaha Bikas Saha added a comment -

        Looks good. Probably needs refresh since the debug dagplan stuff is already committed.

        bikassaha Bikas Saha added a comment - Looks good. Probably needs refresh since the debug dagplan stuff is already committed.
        hitesh Hitesh Shah added a comment -

        Bikas Saha Yes - the .2 patch was rebased against the debug related commit.

        hitesh Hitesh Shah added a comment - Bikas Saha Yes - the .2 patch was rebased against the debug related commit.

        People

          hitesh Hitesh Shah
          hitesh Hitesh Shah
          Votes:
          0 Vote for this issue
          Watchers:
          Start watching this issue

          Dates

            Created:
            Updated:
            Resolved:

            Slack