  1. Apache Tez
  2. TEZ-1512

VertexImpl.getTask(int) can be CPU intensive when lots of tasks are present in the vertex

Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • None
    • 0.5.0
    • None
    • Reviewed

    Description

      I ran a synthetic benchmark (with very little input data) against the Tez app to measure the bare-minimum time Tez spends on container launch, reuse, scheduling, etc.

      Profiling DAGAppMaster showed that a large share of CPU time was spent in VertexImpl.getTask(int), which is called as part of event handling and state transitions.

      This problem would be more prevalent in large jobs that have lots of small tasks.

      I will attach the perf SVG output of the DAG soon.
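
The report does not show what made VertexImpl.getTask(int) expensive, so the following is only a hypothetical sketch of how a per-index lookup can dominate CPU time when a vertex holds many tasks. It is not the Tez implementation or the committed patch; every class and method name below (TaskLookupSketch, MapBackedVertex, ListBackedVertex, timeLookups) is invented for illustration. It compares a lookup that walks an insertion-ordered map with one that indexes directly into a list.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.IntFunction;

// Hypothetical sketch: none of these types are Tez classes.
final class TaskLookupSketch {

    // Stand-in for a task; it only remembers its own index.
    static final class Task {
        final int index;
        Task(int index) { this.index = index; }
    }

    // Variant A (assumed slow path): tasks live in an insertion-ordered map,
    // and a lookup by index walks the values until it reaches that position.
    // Cost is O(n) per call, so n lookups cost O(n^2).
    static final class MapBackedVertex {
        private final Map<String, Task> tasks = new LinkedHashMap<>();

        MapBackedVertex(int numTasks) {
            for (int i = 0; i < numTasks; i++) {
                tasks.put("task_" + i, new Task(i));
            }
        }

        Task getTask(int taskIndex) {
            int position = 0;
            for (Task t : tasks.values()) {
                if (position++ == taskIndex) {
                    return t;
                }
            }
            return null; // index out of range
        }
    }

    // Variant B (assumed fast path): tasks are also addressable by position,
    // so a lookup by index is O(1).
    static final class ListBackedVertex {
        private final List<Task> tasks;

        ListBackedVertex(int numTasks) {
            tasks = new ArrayList<>(numTasks);
            for (int i = 0; i < numTasks; i++) {
                tasks.add(new Task(i));
            }
        }

        Task getTask(int taskIndex) {
            boolean inRange = taskIndex >= 0 && taskIndex < tasks.size();
            return inRange ? tasks.get(taskIndex) : null;
        }
    }

    // Looks up every task once and returns the elapsed time in nanoseconds.
    static long timeLookups(IntFunction<Task> lookup, int numTasks) {
        long start = System.nanoTime();
        long checksum = 0;
        for (int i = 0; i < numTasks; i++) {
            checksum += lookup.apply(i).index;
        }
        long elapsed = System.nanoTime() - start;
        if (checksum != (long) numTasks * (numTasks - 1) / 2) {
            throw new IllegalStateException("lookup returned the wrong task");
        }
        return elapsed;
    }

    public static void main(String[] args) {
        int numTasks = 20_000;
        MapBackedVertex scanning = new MapBackedVertex(numTasks);
        ListBackedVertex indexed = new ListBackedVertex(numTasks);
        System.out.println("linear scan (ns): " + timeLookups(scanning::getTask, numTasks));
        System.out.println("direct index (ns): " + timeLookups(indexed::getTask, numTasks));
    }
}
```

Timings depend on the JVM and the machine, so none are quoted here. The gap between the two variants should widen as the task count grows, which matches the report's observation that large jobs with many small tasks are the most affected.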

      Attachments

        1. with_patch_large_job_small_tasks.svg
          1.74 MB
          Rajesh Balamohan
        2. TEZ-1512.2.patch
          2 kB
          Rajesh Balamohan
        3. TEZ-1512.1.WIP.patch
          3 kB
          Rajesh Balamohan
        4. large_job_small_tasks.svg
          1.90 MB
          Rajesh Balamohan

        Activity

          People

            Assignee: Rajesh Balamohan
            Reporter: Rajesh Balamohan
            Votes: 0
            Watchers: 3

            Dates

              Created:
              Updated:
              Resolved: