  1. Apache Tez
  2. TEZ-1750

Add a DAGScheduler which schedules tasks only when sources have been scheduled

Details

    • Type: Improvement
    • Status: Closed
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.5.3
    • Component/s: None
    • Labels: None

    Description

      Splitting out the patch on TEZ-1522 into a separate jira.

      There are several scenarios in which we end up scheduling downstream tasks before their sources have been scheduled, and then get into a situation where the sources are starved. Currently, this can happen anywhere a ShuffleVertexManager is used: it starts scheduling its tasks only after a certain number of sources are complete, but subsequent non-shuffle VertexManagers will schedule their tasks immediately.
      Disabling slow-start (or setting slow start on all vertices) is one option to avoid this, but it doesn't work when dynamic reducer parallelism kicks in, since that has to wait for source tasks to complete.

      The intent here is to add a DAGScheduler which effectively negates the slow start and, in the case of dynamic parallelism determination, waits for upstream tasks to be scheduled before scheduling downstream tasks.
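
      The idea can be illustrated with a minimal sketch. This is not the code from the attached patch and does not use the Tez DAGScheduler API: the class SourceAwareScheduler and its methods are hypothetical names, and tracking is done per vertex rather than per task to keep the example short. A vertex that asks to be scheduled is held back until every one of its source vertices has been scheduled.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.Iterator;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch only; not the Tez DAGScheduler interface. */
class SourceAwareScheduler {
    // vertex name -> names of its source (upstream) vertices
    private final Map<String, List<String>> sources = new HashMap<>();
    // vertices whose tasks have already been handed to the task scheduler
    private final Set<String> scheduled = new LinkedHashSet<>();
    // vertices that asked to be scheduled but are still waiting on sources
    private final Set<String> pending = new LinkedHashSet<>();

    /** Registers a vertex together with the vertices it reads from. */
    void addVertex(String vertex, String... sourceVertices) {
        sources.put(vertex, Arrays.asList(sourceVertices));
    }

    /**
     * Called when a vertex wants its tasks scheduled.
     * Returns the vertices released by this call, in release order.
     */
    List<String> requestSchedule(String vertex) {
        pending.add(vertex);
        List<String> released = new ArrayList<>();
        boolean progress = true;
        // Keep sweeping: releasing one vertex may unblock vertices downstream of it.
        while (progress) {
            progress = false;
            for (Iterator<String> it = pending.iterator(); it.hasNext();) {
                String candidate = it.next();
                List<String> needed =
                    sources.getOrDefault(candidate, Collections.emptyList());
                if (scheduled.containsAll(needed)) {
                    it.remove();
                    scheduled.add(candidate);
                    released.add(candidate);
                    progress = true;
                }
            }
        }
        return released;
    }
}
```

      With a two-vertex DAG (map feeding reduce), a request for reduce releases nothing until map has been requested; the request for map then releases both, sources first.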

      Attachments

        1. TEZ-1750.3.txt
          29 kB
          Siddharth Seth
        2. TEZ-1750.2.txt
          29 kB
          Siddharth Seth
        3. TEZ-1750.1.txt
          29 kB
          Siddharth Seth

        Activity

          People

            Assignee: sseth Siddharth Seth
            Reporter: sseth Siddharth Seth
            Votes: 0
            Watchers: 2

            Dates

              Created:
              Updated:
              Resolved: