Spark / SPARK-14277

Significant amount of CPU is being consumed in SnappyNative arrayCopy method

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.6.1
    • Fix Version/s: 2.0.0
    • Component/s: Shuffle
    • Labels:
      None

      Description

      While running a Spark job that spills a lot of data in the reduce phase, we see a significant amount of CPU being consumed in the native Snappy arrayCopy method (see the stack trace below).

      Stack trace:
      org.xerial.snappy.SnappyNative.$$YJP$$arrayCopy(Native Method)
      org.xerial.snappy.SnappyNative.arrayCopy(SnappyNative.java)
      org.xerial.snappy.Snappy.arrayCopy(Snappy.java:85)
      org.xerial.snappy.SnappyInputStream.rawRead(SnappyInputStream.java:190)
      org.xerial.snappy.SnappyInputStream.read(SnappyInputStream.java:163)
      java.io.DataInputStream.readFully(DataInputStream.java:195)
      java.io.DataInputStream.readLong(DataInputStream.java:416)
      org.apache.spark.util.collection.unsafe.sort.UnsafeSorterSpillReader.loadNext(UnsafeSorterSpillReader.java:71)
      org.apache.spark.util.collection.unsafe.sort.UnsafeSorterSpillMerger$2.loadNext(UnsafeSorterSpillMerger.java:79)
      org.apache.spark.sql.execution.UnsafeExternalRowSorter$1.next(UnsafeExternalRowSorter.java:136)
      org.apache.spark.sql.execution.UnsafeExternalRowSorter$1.next(UnsafeExternalRowSorter.java:123)

      The reason is that UnsafeSorterSpillReader performs many small reads from the underlying Snappy-compressed stream, and SnappyInputStream invokes the native JNI arrayCopy method to copy the data on each of them, which is expensive. We should fix Snappy-java to use the non-JNI System.arraycopy method in this case.
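
      The access pattern described above can be reproduced with the JDK alone. The sketch below is only an illustration, not the change proposed in this issue: CountingInputStream is a hypothetical stand-in for the decompressing stream, and it counts how many read calls reach it when a DataInputStream reads longs one at a time, with and without a BufferedInputStream in between. The class name, the 8192-byte buffer size and the count of 1000 longs are assumptions chosen for the example.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

public class SmallReadsDemo {

    /** Hypothetical stand-in for a decompressing stream: counts the read calls that reach it. */
    static final class CountingInputStream extends FilterInputStream {
        long calls = 0;

        CountingInputStream(InputStream in) {
            super(in);
        }

        @Override
        public int read() throws IOException {
            calls++;
            return super.read();
        }

        @Override
        public int read(byte[] b, int off, int len) throws IOException {
            calls++;
            return super.read(b, off, len);
        }
    }

    /** Reads numLongs longs via DataInputStream and returns how many reads hit the counted stream. */
    static long countUnderlyingReads(int numLongs, boolean buffered) {
        byte[] data = new byte[numLongs * Long.BYTES];
        CountingInputStream counting = new CountingInputStream(new ByteArrayInputStream(data));
        // Optionally place a buffer between the small reads and the counted stream.
        InputStream source = buffered ? new BufferedInputStream(counting, 8192) : counting;
        try (DataInputStream in = new DataInputStream(source)) {
            for (int i = 0; i < numLongs; i++) {
                in.readLong(); // each call asks the stream below for 8 bytes
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return counting.calls;
    }

    public static void main(String[] args) {
        System.out.println("unbuffered: " + countUnderlyingReads(1000, false));
        System.out.println("buffered:   " + countUnderlyingReads(1000, true));
    }
}
```

      Without the buffer, every readLong reaches the counted stream as its own small read, which is the situation in which a per-read JNI copy becomes costly. Buffering reduces the number of calls that reach the lower stream, whereas the change proposed in this issue makes each copy cheaper inside Snappy-java itself.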

            People

            • Assignee: Sital Kedia (sitalkedia@gmail.com)
            • Reporter: Sital Kedia (sitalkedia@gmail.com)
            • Votes: 0
            • Watchers: 5
