  Hadoop Common / HADOOP-1719

Improve the utilization of shuffle copier threads


Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.16.0
    • Component/s: None

    Description

      In the current design, once copies have been scheduled, the scheduler (the main loop in fetchOutputs) won't schedule anything more until it hears back from at least one of the copier threads. As a result, the main loop doesn't query the TaskTracker for new map locations in the meantime and may not be using all the copiers effectively. This may not be an issue for small map outputs, where, at steady state, such notifications arrive frequently.
      Ideally, we should schedule everything we can and, depending on how busy we currently are, query the TaskTracker for more map locations.
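      A minimal sketch of the proposed policy, assuming a fixed pool of copier threads and a location source that stands in for the TaskTracker query. The class and method names (ShuffleSchedulerSketch, loopOnce, shouldQueryForLocations, copyFinished) are hypothetical, not Hadoop's actual fetchOutputs code, and the rule "query only when busy copiers plus pending locations cannot fill the pool" is just one possible reading of "depending on how busy we currently are", not the committed patch. The point it illustrates is that each pass of the loop never blocks on a copier: it hands out everything that fits and then decides whether to ask for more.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.function.Supplier;

/**
 * Hypothetical sketch (not Hadoop code): a non-blocking shuffle scheduler that
 * schedules every pending map-output location it can, and polls for new
 * locations based on how busy the copiers are.
 */
class ShuffleSchedulerSketch {
    private final int numCopiers;
    private int busyCopiers = 0;
    private final Deque<String> pendingLocations = new ArrayDeque<>();
    private final List<String> scheduled = new ArrayList<>();

    ShuffleSchedulerSketch(int numCopiers) {
        this.numCopiers = numCopiers;
    }

    /** Assumed policy: ask for more only if known work cannot keep all copiers busy. */
    boolean shouldQueryForLocations() {
        return busyCopiers + pendingLocations.size() < numCopiers;
    }

    /** Hands pending locations to idle copiers until either runs out. */
    int scheduleAll() {
        int count = 0;
        while (busyCopiers < numCopiers && !pendingLocations.isEmpty()) {
            scheduled.add(pendingLocations.poll());
            busyCopiers++;
            count++;
        }
        return count;
    }

    /** One pass of the main loop; the Supplier stands in for the TaskTracker query. */
    void loopOnce(Supplier<List<String>> locationSource) {
        if (shouldQueryForLocations()) {
            pendingLocations.addAll(locationSource.get());
        }
        scheduleAll();
    }

    /** Called when a copier thread reports that a copy has finished. */
    void copyFinished() {
        if (busyCopiers > 0) {
            busyCopiers--;
        }
    }

    int busyCopiers() { return busyCopiers; }
    int pendingCount() { return pendingLocations.size(); }
    List<String> scheduled() { return scheduled; }
}
```

      With three copiers, a first pass that learns of two locations schedules both and leaves one copier idle, so the next pass queries again; once busy copiers plus pending locations cover the pool, further passes just keep the copiers fed from the backlog without querying.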

      Attachments

        1. HADOOP-1719.patch
          3 kB
          Amar Kamat
        2. HADOOP-1719.patch
          3 kB
          Amar Kamat
        3. 1719.patch
          3 kB
          Devaraj Das
        4. 1719.patch
          4 kB
          Devaraj Das
        5. 1719.1.patch
          4 kB
          Devaraj Das


            People

              Assignee: Amar Kamat
              Reporter: Devaraj Das
              Votes: 0
              Watchers: 0


                Time Tracking

                  Estimated: Not Specified
                  Remaining: 0h
                  Logged: 0.5h