Hadoop Common / HADOOP-1719

Improve the utilization of shuffle copier threads


Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • None
    • 0.16.0
    • None

    Description

      In the current design, once a round of copies has been scheduled, the scheduler (the main loop in fetchOutputs) won't schedule anything more until it hears back from at least one of the copier threads. As a result, the main loop doesn't query the TaskTracker for new map locations in the meantime and may not be using all the copiers effectively. This may not be an issue for small map outputs, where, at steady state, such notifications arrive frequently.
      Ideally, we should schedule everything we can and, depending on how busy we currently are, query the TaskTracker for more map locations.
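
The proposed behaviour can be illustrated with a small, self-contained sketch. This is not the code from the attached patches or from fetchOutputs. The class name `ShuffleSchedulerSketch`, its methods, and the exact rule for when to ask for more locations (idle copiers and nothing queued) are all assumptions made for illustration.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Queue;

// Hypothetical sketch: none of these names come from the Hadoop source.
final class ShuffleSchedulerSketch {
    private final int numCopiers;   // size of the copier thread pool
    private int busyCopiers = 0;    // copies currently in flight
    private final Queue<String> knownLocations = new ArrayDeque<>();

    ShuffleSchedulerSketch(int numCopiers) {
        this.numCopiers = numCopiers;
    }

    /** Record map-output locations learned from the (simulated) TaskTracker. */
    void addLocations(List<String> locations) {
        knownLocations.addAll(locations);
    }

    /** Hand out as many known locations as there are idle copiers. */
    List<String> scheduleAllPossible() {
        List<String> scheduled = new ArrayList<>();
        while (busyCopiers < numCopiers && !knownLocations.isEmpty()) {
            scheduled.add(knownLocations.poll());
            busyCopiers++;
        }
        return scheduled;
    }

    /** Called when a copier reports that its copy has finished. */
    void copyFinished() {
        if (busyCopiers > 0) {
            busyCopiers--;
        }
    }

    /** Assumed policy: ask for more locations when a copier is idle and nothing is queued. */
    boolean shouldQueryForLocations() {
        return busyCopiers < numCopiers && knownLocations.isEmpty();
    }

    public static void main(String[] args) {
        ShuffleSchedulerSketch s = new ShuffleSchedulerSketch(3);
        s.addLocations(Arrays.asList("map_0", "map_1"));
        System.out.println("scheduled: " + s.scheduleAllPossible());
        System.out.println("query for more? " + s.shouldQueryForLocations());
    }
}
```

In this sketch the main loop would call `scheduleAllPossible()` on every iteration instead of waiting for a copier to report back, and would consult `shouldQueryForLocations()` to decide whether to poll for new map locations.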

      Attachments

        1. 1719.1.patch
          4 kB
          Devaraj Das
        2. 1719.patch
          4 kB
          Devaraj Das
        3. 1719.patch
          3 kB
          Devaraj Das
        4. HADOOP-1719.patch
          3 kB
          Amar Kamat
        5. HADOOP-1719.patch
          3 kB
          Amar Kamat

          People

            amar_kamat Amar Kamat
            ddas Devaraj Das

              Time Tracking

                Estimated: Not Specified
                Remaining: 0h
                Logged: 0.5h
