SPARK-4853

Automatically adjust the number of connections between two peers to achieve good performance

Details

    • Type: New Feature
    • Status: Resolved
    • Priority: Major
    • Resolution: Incomplete
    • Affects Version/s: 1.2.0
    • Fix Version/s: None
    • Component/s: Shuffle, Spark Core

    Description

      As discovered in SPARK-4740, performance of the new Netty transport can be impacted by the total number of active connections. This manifests itself when the following 3 conditions are true:
      (1) # spinning disks per node is large (doesn't affect SSDs)
      (2) # cores/node is large
      (3) # nodes is small

      In 1.2, we created a new config variable, spark.shuffle.io.numConnectionsPerPeer, that allows users to explicitly increase the number of connections between any two nodes. Ideally, Spark should automatically figure out what the optimal (or near-optimal) setting is, so users don't have to worry about this config option.
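
The issue does not propose a concrete algorithm, so the following is only a rough sketch of how the three conditions above might feed into an automatic choice. ClusterShape, suggest_connections_per_peer, the cap of 8, and the "one connection per spinning disk" target are all hypothetical and not part of Spark; only the config key comes from the description.

```python
from dataclasses import dataclass

# Real Spark setting named in the issue description.
CONFIG_KEY = "spark.shuffle.io.numConnectionsPerPeer"


@dataclass(frozen=True)
class ClusterShape:
    """Hypothetical description of a cluster; not a Spark API."""
    num_nodes: int
    cores_per_node: int
    spinning_disks_per_node: int  # use 0 for SSD-only nodes


def suggest_connections_per_peer(shape: ClusterShape, max_connections: int = 8) -> int:
    """Illustrative heuristic; every threshold here is an assumption, not a measurement."""
    if shape.num_nodes < 1 or shape.cores_per_node < 1 or shape.spinning_disks_per_node < 0:
        raise ValueError("need at least one node and one core, and a non-negative disk count")
    # Condition (1): SSD-only nodes are reported as unaffected, so keep one connection.
    if shape.spinning_disks_per_node == 0:
        return 1
    # Condition (3): fewer nodes means fewer peers to spread connections over.
    peers = max(shape.num_nodes - 1, 1)
    # Assumption: aim for roughly one connection per spinning disk, spread over the peers.
    wanted = -(-shape.spinning_disks_per_node // peers)  # ceiling division
    # Condition (2): assumed bound, no more connections per peer than cores, plus a hard cap.
    return max(1, min(wanted, shape.cores_per_node, max_connections))


def as_conf_line(connections: int) -> str:
    """Format the suggestion as a key=value pair for the config variable."""
    return f"{CONFIG_KEY}={connections}"


if __name__ == "__main__":
    small_cluster = ClusterShape(num_nodes=4, cores_per_node=32, spinning_disks_per_node=12)
    print(as_conf_line(suggest_connections_per_peer(small_cluster)))
```

Under these assumptions, a small cluster with many disks and cores per node gets more than one connection per peer, while a large cluster or an SSD-only cluster stays at one. Any real implementation would need to be validated against benchmarks such as those discussed in SPARK-4740.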

          People

            Assignee: Unassigned
            Reporter: Reynold Xin (rxin)
            Votes: 0
            Watchers: 5
