Uploaded image for project: 'Apache Tez'
  1. Apache Tez
  2. TEZ-2612 Critical path analyzer for DAGs
  3. TEZ-2810

Support for showing allocation delays due to internal preemption

Details

    • Sub-task
    • Status: Closed
    • Major
    • Resolution: Fixed
    • None
    • 0.8.1-alpha
    • None
    • None
    • Reviewed

    Description

      Also adds support for drawing critical path for attempts that did not get allocated or launched.

      Attachments

        1. cp4-preempt-1.png
          33 kB
          Bikas Saha
        2. cp4-preempt-2.png
          35 kB
          Bikas Saha
        3. cp4-preempt-3.png
          36 kB
          Bikas Saha
        4. cp5-no-launch.png
          37 kB
          Bikas Saha
        5. TEZ-2810.1.patch
          26 kB
          Bikas Saha
        6. TEZ-2810.2.patch
          40 kB
          Bikas Saha

        Activity

          bikassaha Bikas Saha added a comment -

          AnalyseDAG split into straggler analysis and allocation analysis.
          Allocation analysis adds preemption support. Find all preempted attempts in the job. Identify those that were preempted while a critical path attempt was waiting for allocation. Mark that as a potential cause for the preemption.

          Drawing critical path handles cases where allocation/launch have not occurred.

          Added test for preemption logic.

          bikassaha Bikas Saha added a comment - AnalyseDAG split into straggler analysis and allocation analysis. Allocation analysis adds preemption support. Find all preempted attempts in the job. Identify those that were preempted while a critical path attempt was waiting for allocation. Mark that as a potential cause for the preemption. Drawing critical path handles cases where allocation/launch have not occurred. Added test for preemption logic.
          bikassaha Bikas Saha added a comment - - edited

          rajesh.balamohan Please review. Thanks!




          bikassaha Bikas Saha added a comment - - edited rajesh.balamohan Please review. Thanks!
          tezqa TezQA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12755301/cp5-no-launch.png
          against master revision b288be7.

          -1 patch. The patch command could not apply the patch.

          Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/1109//console

          This message is automatically generated.

          tezqa TezQA added a comment - -1 overall . Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12755301/cp5-no-launch.png against master revision b288be7. -1 patch . The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/1109//console This message is automatically generated.

          Minor comments

          • VertexInfo.getTaskAttemptsInternal can possibly be optimized not to have too many lists created. Not related to this change, I will create a separate JIRA for it.
          • The following condition in CriticalPathAnalyzer match multiple preempted attempts. Depending on the number of attempts that are getting pre-empted and size of the job, notes section can become quite big?
            if (a.getFinishTime() > creationTime && a.getFinishTime() < allocationTime){
            

            Should the first pre-empted attempt name be mentioned & provide additional info on the number of pre-empted attempts within that time range of creationTime & allocationTime?

          lgtm. +1

          rajesh.balamohan Rajesh Balamohan added a comment - Minor comments VertexInfo.getTaskAttemptsInternal can possibly be optimized not to have too many lists created. Not related to this change, I will create a separate JIRA for it. The following condition in CriticalPathAnalyzer match multiple preempted attempts. Depending on the number of attempts that are getting pre-empted and size of the job, notes section can become quite big? if (a.getFinishTime() > creationTime && a.getFinishTime() < allocationTime){ Should the first pre-empted attempt name be mentioned & provide additional info on the number of pre-empted attempts within that time range of creationTime & allocationTime? lgtm. +1
          bikassaha Bikas Saha added a comment -

          Thanks for the review!
          I am not expecting number of preemptions to be high. Also, when there are multiple pending tasks causing multiple preemptions, we currently dont have any connecting data in the events to pair them up. Given low number of preemptions and no correlating data, I decided to err on the side of mentioning all preemptions. In the common case this list will be small and will let us quickly look at which attempts got preempted (in the critical path or for log search). What I could do is consolidate them all in a single line. Will do that and commit.

          Thanks!

          bikassaha Bikas Saha added a comment - Thanks for the review! I am not expecting number of preemptions to be high. Also, when there are multiple pending tasks causing multiple preemptions, we currently dont have any connecting data in the events to pair them up. Given low number of preemptions and no correlating data, I decided to err on the side of mentioning all preemptions. In the common case this list will be small and will let us quickly look at which attempts got preempted (in the critical path or for log search). What I could do is consolidate them all in a single line. Will do that and commit. Thanks!
          bikassaha Bikas Saha added a comment -

          Patch with comment addressed and minor formatting change to remove space in attempt short name. Also minor fix to make sure execution time is used only if the attempt started.

          bikassaha Bikas Saha added a comment - Patch with comment addressed and minor formatting change to remove space in attempt short name. Also minor fix to make sure execution time is used only if the attempt started.
          tezqa TezQA added a comment -

          +1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12755462/TEZ-2810.2.patch
          against master revision 46decf3.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 1 new or modified test files.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. There were no new javadoc warning messages.

          +1 findbugs. The patch does not introduce any new Findbugs (version 3.0.1) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed unit tests in .

          Test results: https://builds.apache.org/job/PreCommit-TEZ-Build/1118//testReport/
          Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/1118//console

          This message is automatically generated.

          tezqa TezQA added a comment - +1 overall . Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12755462/TEZ-2810.2.patch against master revision 46decf3. +1 @author . The patch does not contain any @author tags. +1 tests included . The patch appears to include 1 new or modified test files. +1 javac . The applied patch does not increase the total number of javac compiler warnings. +1 javadoc . There were no new javadoc warning messages. +1 findbugs . The patch does not introduce any new Findbugs (version 3.0.1) warnings. +1 release audit . The applied patch does not increase the total number of release audit warnings. +1 core tests . The patch passed unit tests in . Test results: https://builds.apache.org/job/PreCommit-TEZ-Build/1118//testReport/ Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/1118//console This message is automatically generated.
          bikassaha Bikas Saha added a comment -

          commit b6f15dcdcf3cae841c361983f32229858ff79f42
          Author: Bikas Saha <bikas@apache.org>
          Date: Fri Sep 11 17:31:34 2015 -0700

          TEZ-2810. Support for showing allocation delays due to internal preemption (bikas)

          bikassaha Bikas Saha added a comment - commit b6f15dcdcf3cae841c361983f32229858ff79f42 Author: Bikas Saha <bikas@apache.org> Date: Fri Sep 11 17:31:34 2015 -0700 TEZ-2810 . Support for showing allocation delays due to internal preemption (bikas)

          People

            bikassaha Bikas Saha
            bikassaha Bikas Saha
            Votes:
            0 Vote for this issue
            Watchers:
            2 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved: