  1. Spark
  2. SPARK-14277

Significant amount of CPU is being consumed in SnappyNative arrayCopy method

Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.6.1
    • Fix Version/s: 2.0.0
    • Component/s: Shuffle, Spark Core
    • Labels: None

    Description

      While running a Spark job that spills a lot of data in the reduce phase, we see that a significant amount of CPU time is consumed in the native Snappy arrayCopy method (see the stack trace below).

      Stack trace:
      org.xerial.snappy.SnappyNative.$$YJP$$arrayCopy(Native Method)
      org.xerial.snappy.SnappyNative.arrayCopy(SnappyNative.java)
      org.xerial.snappy.Snappy.arrayCopy(Snappy.java:85)
      org.xerial.snappy.SnappyInputStream.rawRead(SnappyInputStream.java:190)
      org.xerial.snappy.SnappyInputStream.read(SnappyInputStream.java:163)
      java.io.DataInputStream.readFully(DataInputStream.java:195)
      java.io.DataInputStream.readLong(DataInputStream.java:416)
      org.apache.spark.util.collection.unsafe.sort.UnsafeSorterSpillReader.loadNext(UnsafeSorterSpillReader.java:71)
      org.apache.spark.util.collection.unsafe.sort.UnsafeSorterSpillMerger$2.loadNext(UnsafeSorterSpillMerger.java:79)
      org.apache.spark.sql.execution.UnsafeExternalRowSorter$1.next(UnsafeExternalRowSorter.java:136)
      org.apache.spark.sql.execution.UnsafeExternalRowSorter$1.next(UnsafeExternalRowSorter.java:123)

      The reason is that the spill reader (UnsafeSorterSpillReader) does many small reads from the underlying Snappy-compressed stream, and SnappyInputStream invokes the native JNI arrayCopy method to copy the data, which is expensive. We should fix snappy-java to use the non-JNI System.arraycopy method in this case.
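
      The read pattern described above can be illustrated with a small, self-contained sketch that uses only the JDK. It does not use snappy-java or Spark: the class SmallReadsSketch, the CountingInputStream wrapper, and the 8192-byte buffer size are all hypothetical, chosen for illustration. The sketch counts how many read calls reach the underlying stream when a DataInputStream reads longs directly, and how many reach it when a BufferedInputStream sits in between. With a stream such as SnappyInputStream underneath, each of those underlying calls would correspond to one copy out of the decompressed buffer. Buffering is shown only as one possible way to reduce the call count; it is not presented as the fix adopted for this issue.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

public class SmallReadsSketch {

    /** Hypothetical wrapper that counts read calls reaching the wrapped stream. */
    static final class CountingInputStream extends FilterInputStream {
        long calls = 0;

        CountingInputStream(InputStream in) {
            super(in);
        }

        @Override
        public int read() throws IOException {
            calls++;
            return super.read();
        }

        @Override
        public int read(byte[] b, int off, int len) throws IOException {
            calls++;
            return super.read(b, off, len);
        }
    }

    /**
     * Writes n longs to an in-memory buffer, reads them back with readLong(),
     * and returns {sum of the values, number of read calls on the underlying stream}.
     */
    static long[] readLongs(int n, boolean buffered) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (DataOutputStream out = new DataOutputStream(bytes)) {
                for (int i = 0; i < n; i++) {
                    out.writeLong(i);
                }
            }
            CountingInputStream counting =
                new CountingInputStream(new ByteArrayInputStream(bytes.toByteArray()));
            // Assumption: 8192 bytes is an arbitrary buffer size for this demo.
            InputStream source = buffered ? new BufferedInputStream(counting, 8192) : counting;
            long sum = 0;
            try (DataInputStream in = new DataInputStream(source)) {
                for (int i = 0; i < n; i++) {
                    sum += in.readLong(); // an 8-byte read, as in the stack trace
                }
            }
            return new long[] {sum, counting.calls};
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        long[] direct = readLongs(10000, false);
        long[] buffered = readLongs(10000, true);
        System.out.println("underlying read calls, unbuffered: " + direct[1]);
        System.out.println("underlying read calls, buffered:   " + buffered[1]);
    }
}
```

      In the unbuffered case every readLong() turns into at least one small read on the underlying stream, so any fixed per-call cost (such as a JNI transition) is paid once per record field. Buffering amortizes that cost over larger reads, while replacing the JNI copy with System.arraycopy, as proposed above, lowers the cost of each call.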

          People

            Assignee: Sital Kedia (sitalkedia@gmail.com)
            Reporter: Sital Kedia (sitalkedia@gmail.com)