Spark / SPARK-4740

Netty's network throughput is about 1/2 of NIO's in spark-perf sortByKey


    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.2.0
    • Fix Version/s: 1.2.0
    • Component/s: Shuffle, Spark Core
    • Labels:
      None
    • Target Version/s:

      Description

      When testing current Spark master (1.3.0-SNAPSHOT) with spark-perf (sort-by-key, aggregate-by-key, etc.), the Netty-based shuffle transfer service takes much longer than the NIO-based one. The network throughput of Netty is only about half that of NIO.

      We tested in standalone mode. The data set used for the test is 20 billion records, about 400GB in total. The spark-perf test runs on a 4-node cluster with 10G NICs, 48 CPU cores per node, and 64GB of executor memory per executor. The number of reduce tasks is set to 1000.
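
      For context, here is a minimal sketch of the kind of sort-by-key job spark-perf drives in this test. The record count and key distribution below are illustrative only (the reported test used 20 billion records, ~400GB); the class and app name are made up for the example.

      {code:scala}
      import scala.util.Random

      import org.apache.spark.{SparkConf, SparkContext}
      import org.apache.spark.SparkContext._  // pair-RDD functions such as sortByKey (Spark 1.x)

      object SortByKeySketch {
        def main(args: Array[String]): Unit = {
          val sc = new SparkContext(new SparkConf().setAppName("sort-by-key-sketch"))

          // Illustrative input; spark-perf generates its own key/value records.
          val pairs = sc.parallelize(1 to 10000000, 1000)
            .map(_ => (Random.nextLong(), Random.nextLong()))

          // sortByKey forces a full shuffle; 1000 matches the reduce-task count used in the test,
          // so this is the stage where the Netty vs. NIO transfer service difference shows up.
          val sorted = pairs.sortByKey(ascending = true, numPartitions = 1000)
          println(s"sorted ${sorted.count()} records")

          sc.stop()
        }
      }
      {code}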


      Reynold's update on Dec 15, 2014: The problem is that NIO opens multiple connections between two nodes, while Netty opened only one. We introduced a new config option, spark.shuffle.io.numConnectionsPerPeer, to let users explicitly increase the number of connections between two nodes. SPARK-4853 is a follow-up ticket to investigate having Spark set this automatically.
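
      A hedged example of how the new option can be set; the value 2 below is purely illustrative, not a recommendation from this ticket, and spark.shuffle.blockTransferService is shown only to make explicit that the Netty path is in use (it is already the default in 1.2.0).

      {code:scala}
      import org.apache.spark.{SparkConf, SparkContext}

      // Raise the number of Netty connections opened to each peer during shuffle.
      // The right value is workload- and network-dependent; 2 here is illustrative only.
      val conf = new SparkConf()
        .setAppName("shuffle-connection-tuning")
        .set("spark.shuffle.blockTransferService", "netty")       // default in 1.2.0
        .set("spark.shuffle.io.numConnectionsPerPeer", "2")

      val sc = new SparkContext(conf)
      {code}

      The same setting can also be supplied on the command line, e.g. --conf spark.shuffle.io.numConnectionsPerPeer=2 with spark-submit.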


              People

              • Assignee:
                rxin Reynold Xin
              • Reporter:
                liyezhang556520 Zhang, Liye
              • Votes:
                0
              • Watchers:
                11
