  1. Apache Tez
  2. TEZ-388 Pig on Tez
  3. TEZ-394

Better scheduling for uneven DAGs

Details

    • Type: Sub-task
    • Status: Patch Available
    • Priority: Major
    • Resolution: Unresolved

    Description

      Consider a series of joins or group-by operations on dataset A with a few other datasets that takes 10 hours, followed by a final join with a dataset X. The vertex that loads dataset X is one of the top vertices of the DAG, so it is initialized early even though its output is not consumed until the final join, 10 hours later.

      1) Either use delayed-start logic for such vertices, so that resources are not allocated long before the output is needed.
      2) Or, if these vertices are started upfront, handle the failure/recovery cases where the nodes that executed the MapTask have gone down by the time the final join happens.
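
      To make option 1 concrete, here is a minimal sketch of how the available delay per vertex could be estimated. It is not taken from the attached patches and does not use the Tez API: the class `DelayedStartSketch`, the method `slack`, and the unit-cost "level" model are all assumptions made for illustration. Each vertex gets an earliest level (longest path from any top vertex) and a latest level (one less than the latest level of its earliest-needed consumer); the difference is how many levels its start could be postponed without holding up a consumer.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper for illustration only; not part of Tez. */
final class DelayedStartSketch {

    /**
     * edges maps each vertex name to the vertices that consume its output.
     * Returns, per vertex, the number of levels its start could be delayed.
     * Assumption: every vertex costs one "level"; a real scheduler would
     * weigh estimated runtimes instead.
     */
    static Map<String, Integer> slack(Map<String, List<String>> edges) {
        // Collect all vertices and count their incoming edges.
        Map<String, Integer> inDegree = new LinkedHashMap<>();
        for (Map.Entry<String, List<String>> e : edges.entrySet()) {
            inDegree.putIfAbsent(e.getKey(), 0);
            for (String consumer : e.getValue()) {
                inDegree.merge(consumer, 1, Integer::sum);
            }
        }

        // Forward pass (Kahn's algorithm): earliest level of each vertex.
        Map<String, Integer> earliest = new HashMap<>();
        Map<String, Integer> remaining = new HashMap<>(inDegree);
        Deque<String> ready = new ArrayDeque<>();
        for (Map.Entry<String, Integer> e : inDegree.entrySet()) {
            if (e.getValue() == 0) {
                ready.add(e.getKey());
                earliest.put(e.getKey(), 0);
            }
        }
        List<String> order = new ArrayList<>();
        while (!ready.isEmpty()) {
            String v = ready.poll();
            order.add(v);
            for (String c : edges.getOrDefault(v, Collections.emptyList())) {
                earliest.merge(c, earliest.get(v) + 1, Math::max);
                if (remaining.merge(c, -1, Integer::sum) == 0) {
                    ready.add(c);
                }
            }
        }
        if (order.size() != inDegree.size()) {
            throw new IllegalArgumentException("graph contains a cycle");
        }

        // Backward pass: latest level that still feeds every consumer in time.
        // Vertices without consumers keep their earliest level.
        Map<String, Integer> latest = new HashMap<>();
        for (int i = order.size() - 1; i >= 0; i--) {
            String v = order.get(i);
            List<String> consumers = edges.getOrDefault(v, Collections.emptyList());
            int level = earliest.get(v);
            if (!consumers.isEmpty()) {
                level = Integer.MAX_VALUE;
                for (String c : consumers) {
                    level = Math.min(level, latest.get(c) - 1);
                }
            }
            latest.put(v, level);
        }

        Map<String, Integer> result = new LinkedHashMap<>();
        for (String v : order) {
            result.put(v, latest.get(v) - earliest.get(v));
        }
        return result;
    }
}
```

      Applied to a DAG shaped like the one described (A joined with B, C, and D in three successive joins, then a final join with X), the vertex loading X receives the largest slack, while A and the join vertices receive none. A vertex with nonzero slack is a candidate for delayed start; a vertex started upfront anyway is the case option 2 has to cover.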

      Attachments

        1. TEZ-394.001.patch
          20 kB
          Jason Darrell Lowe
        2. TEZ-394.002.patch
          21 kB
          Jason Darrell Lowe
        3. TEZ-394.003.patch
          22 kB
          Jason Darrell Lowe
        4. TEZ-394.004.patch
          23 kB
          Jason Darrell Lowe

            People

              Assignee: Jason Darrell Lowe (jlowe)
              Reporter: Rohini Palaniswamy (rohini)
              Votes: 0
              Watchers: 9