Spark / SPARK-14290

Fully utilize the network bandwidth for Netty RPC by avoiding significant underlying memory copy


Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.2.0, 1.3.0, 1.4.0, 1.5.0, 1.6.0, 2.0.0
    • Fix Version/s: 2.0.0
    • Component/s: Input/Output, Spark Core
    • Labels: None

    Description

      When Netty transfers data that does not come from a FileRegion, the data is sent as a ByteBuf. If the data is large, a significant performance problem appears because of the underlying memory copy in sun.nio.ch.IOUtil.write: the CPU is pegged at 100% while network utilization stays very low. This can be verified by comparing the NIO and Netty implementations of spark.shuffle.blockTransferService in Spark 1.4; NIO delivers much better network bandwidth than Netty.
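      The copy happens because, for a heap buffer, sun.nio.ch.IOUtil.write first stages the bytes in a temporary direct buffer sized to the remaining bytes of the write, so a single write of a huge ByteBuf copies the whole payload before it ever reaches the socket. A minimal sketch of the mitigation pattern, capping each write so the staged copy stays small (ChunkedWrite, writeChunked and the 256 KB cap are illustrative, not taken from the Spark or Netty code base):

      import java.nio.ByteBuffer
      import java.nio.channels.WritableByteChannel

      object ChunkedWrite {
        // Illustrative cap; keeps the temporary direct buffer used by
        // sun.nio.ch.IOUtil.write small on every call.
        val ChunkSize: Int = 256 * 1024

        def writeChunked(src: ByteBuffer, channel: WritableByteChannel): Long = {
          var written = 0L
          while (src.hasRemaining) {
            val originalLimit = src.limit()
            // Expose at most ChunkSize bytes of `src` to this write call.
            src.limit(math.min(src.position() + ChunkSize, originalLimit))
            written += channel.write(src)
            src.limit(originalLimit)  // restore the full window for the next pass
          }
          written
        }
      }

      With bounded writes the copy cost per call stays constant, so the sender keeps the socket busy instead of spending CPU on one enormous copy.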

      How to reproduce:

      sc.parallelize(Array(1,2,3),3).mapPartitions(a=>Array(new Array[Double](1024 * 1024 * 50)).iterator).reduce((a,b)=> a).length
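
      To observe the bandwidth comparison mentioned in the description, the same job can be run once against each transfer service on Spark 1.4 (the setting was removed in later releases). A sketch, assuming a stand-alone application rather than the spark-shell snippet above:

      import org.apache.spark.{SparkConf, SparkContext}

      // Spark 1.4-era configuration: switch between "nio" and "netty" and
      // compare CPU usage and network throughput for the job above.
      val conf = new SparkConf()
        .setAppName("SPARK-14290-repro")
        .set("spark.shuffle.blockTransferService", "nio")   // or "netty"
      val sc = new SparkContext(conf)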
      

      The root cause is described here.


People

    Assignee: liyezhang556520 Zhang, Liye
    Reporter: liyezhang556520 Zhang, Liye
    Votes: 0
    Watchers: 4

Dates

    Created:
    Updated:
    Resolved: