Spark / SPARK-14290

Fully utilize the network bandwidth for Netty RPC by avoiding significant underlying memory copies


    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.2.0, 1.3.0, 1.4.0, 1.5.0, 1.6.0, 2.0.0
    • Fix Version/s: 2.0.0
    • Component/s: Input/Output, Spark Core
    • Labels: None

      Description

      When Netty transfers data that does not come from a FileRegion, the data is sent as a ByteBuf. If the data is large, a significant performance problem appears because sun.nio.ch.IOUtil.write performs an underlying memory copy: the CPU sits at 100% while network utilization stays very low. This can be checked by comparing NIO and Netty as spark.shuffle.blockTransferService in Spark 1.4; NIO's network bandwidth is much better than Netty's.
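
      A minimal sketch of how that comparison could be set up, assuming a Spark 1.4-era deployment where the "nio" value of spark.shuffle.blockTransferService is still available (the application name below is illustrative). Run the same job once per transfer service and compare CPU usage and network throughput:

      import org.apache.spark.{SparkConf, SparkContext}

      // Switch between "nio" and "netty", rerun the same job,
      // then compare CPU usage and observed network bandwidth.
      val conf = new SparkConf()
        .setAppName("block-transfer-bandwidth-check") // illustrative name
        .set("spark.shuffle.blockTransferService", "nio")
      val sc = new SparkContext(conf)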

      How to reproduce:

      // Each of the 3 tasks returns a ~400 MB Array[Double] (50M doubles),
      // so the task results are shipped back to the driver as large buffers.
      sc.parallelize(Array(1, 2, 3), 3)
        .mapPartitions(a => Array(new Array[Double](1024 * 1024 * 50)).iterator)
        .reduce((a, b) => a)
        .length
      

      The root cause can be referred to here.
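
      As a rough illustration of the mitigation direction (not the actual 2.0.0 patch), the sketch below caps how many bytes are exposed to a single channel write, so the temporary direct-buffer copy done inside sun.nio.ch.IOUtil.write for heap buffers stays small. The ChunkedWrite object, the writeChunk helper, and the 256 KB cap are illustrative assumptions:

      import java.nio.ByteBuffer
      import java.nio.channels.WritableByteChannel

      object ChunkedWrite {
        // Illustrative cap: bounds the temporary direct-buffer copy that
        // sun.nio.ch.IOUtil.write performs for heap buffers.
        private val NioBufferLimit = 256 * 1024

        // Writes at most NioBufferLimit bytes of `buf` and returns the
        // number of bytes actually written; the caller loops until done.
        def writeChunk(channel: WritableByteChannel, buf: ByteBuffer): Int = {
          val originalLimit = buf.limit()
          val chunkLen = math.min(buf.remaining(), NioBufferLimit)
          buf.limit(buf.position() + chunkLen) // expose only a small window
          val written = channel.write(buf)     // bounded copy, bounded write
          buf.limit(originalLimit)             // restore the full view
          written
        }
      }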


            People

            • Assignee: liyezhang556520 Zhang, Liye
            • Reporter: liyezhang556520 Zhang, Liye
            • Votes: 0
            • Watchers: 7
