Spark / SPARK-2774

Set preferred locations for reduce tasks


Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.5.0
    • Component/s: Spark Core
    • Labels: None

    Description

      Currently, Spark does not set preferred locations for reduce tasks. This patch proposes setting them based on the map output sizes and locations tracked by the MapOutputTracker. This is useful in two cases:

      1. For a small job in a large cluster, co-locating map and reduce tasks avoids going over the network.
      2. When the map stage outputs are heavily skewed, it is beneficial to place the reducer close to the largest output.
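
      The idea can be illustrated with a minimal, standalone sketch. It is not the code from the patch and does not use any Spark API: the function name `preferred_reduce_locations`, the `(host, sizes)` input shape, and the 0.2 fraction threshold are all assumptions made for illustration. Given per-map-task output sizes and the hosts that hold them, it returns the hosts holding a large enough share of one reducer's input.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def preferred_reduce_locations(
    map_outputs: Sequence[Tuple[str, Sequence[int]]],
    reduce_id: int,
    fraction_threshold: float = 0.2,  # hypothetical cutoff, chosen for illustration
) -> List[str]:
    """Return hosts holding at least `fraction_threshold` of one reducer's input.

    `map_outputs` has one entry per map task: (host, sizes), where
    sizes[i] is the number of bytes that task wrote for reducer i.
    """
    bytes_per_host: Dict[str, int] = defaultdict(int)
    total = 0
    for host, sizes in map_outputs:
        size = sizes[reduce_id]
        bytes_per_host[host] += size
        total += size

    if total == 0:
        # No input for this reducer, so there is no locality preference.
        return []

    # Largest holder first; ties are broken by host name for determinism.
    ranked = sorted(bytes_per_host.items(), key=lambda kv: (-kv[1], kv[0]))
    return [host for host, size in ranked if size / total >= fraction_threshold]


# Example: three hosts, two reducers; the output for reducer 1 is skewed to host-c.
example_outputs = [
    ("host-a", [100, 10]),
    ("host-b", [20, 10]),
    ("host-c", [5, 500]),
    ("host-a", [75, 10]),
]
locations_r0 = preferred_reduce_locations(example_outputs, reduce_id=0)
locations_r1 = preferred_reduce_locations(example_outputs, reduce_id=1)
```

      In this example, reducer 0 prefers host-a, which holds 175 of its 200 input bytes, and reducer 1 prefers host-c, which holds 500 of 530. A fraction threshold keeps the preference empty when the input is spread evenly, so the scheduler is not constrained when locality would not help.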


          People

            Assignee: Shivaram Venkataraman (shivaram)
            Reporter: Shivaram Venkataraman (shivaram)
            Votes: 0
            Watchers: 11
