Spark / SPARK-13288

[1.6.0] Memory leak in Spark streaming


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Incomplete
    • Affects Version/s: 1.6.0
    • Fix Version/s: None
    • Component/s: DStreams
    • Environment: Bare metal cluster, RHEL 6.6

    Description

      Streaming in 1.6 seems to have a memory leak.

      When the same streaming app was run on Spark 1.5.1 and 1.6 with all else equal, 1.6 showed gradually increasing processing time.

      The app is simple: one Kafka receiver consuming a tweet stream, and 20 executors processing the tweets in 5-second batches.

      Spark 1.5.1 handled this smoothly and showed no increase in processing time over the 40-minute test, whereas 1.6 began to show increasing processing time about 8 minutes into the test. See the chart here:

      https://ibm.box.com/s/7q4ulik70iwtvyfhoj1dcl4nc469b116
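      One way to put a number on "gradually increasing processing time" is to fit a least-squares line to per-batch processing times and ask how soon the trend would reach the 5-second batch interval, at which point batches start to queue. This is a sketch, not part of the original report: it assumes the per-batch processing times have already been collected (for example, read off the Streaming tab of the Spark UI), and the sample timings in it are hypothetical.

```python
from statistics import mean

BATCH_INTERVAL_S = 5.0  # the app in this report uses 5-second batches


def slope_per_batch(times):
    """Least-squares slope of processing time (seconds) against batch index."""
    if len(times) < 2:
        raise ValueError("need at least two batches to fit a trend")
    xs = range(len(times))
    x_bar, y_bar = mean(xs), mean(times)
    num = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, times))
    den = sum((x - x_bar) ** 2 for x in xs)
    return num / den


def batches_until_backlog(times, interval=BATCH_INTERVAL_S):
    """Batches until the fitted trend reaches the batch interval.

    Returns None when the trend is flat or falling, and 0.0 when the fitted
    processing time for the latest batch is already at or above the interval.
    """
    m = slope_per_batch(times)
    if m <= 0:
        return None
    x_bar, y_bar = mean(range(len(times))), mean(times)
    fitted_last = y_bar + m * (len(times) - 1 - x_bar)  # trend value at the last batch
    return max(0.0, (interval - fitted_last) / m)


if __name__ == "__main__":
    # Hypothetical per-batch processing times in seconds (NOT data from the report).
    steady = [2.0, 2.1, 1.9, 2.0, 2.0, 2.1, 1.9, 2.0]
    creeping = [2.0, 2.2, 2.4, 2.6, 2.8, 3.0, 3.2, 3.4]
    for name, times in (("steady", steady), ("creeping", creeping)):
        print(name, round(slope_per_batch(times), 3), batches_until_backlog(times))
```

      A flat or falling series returns None; a persistently positive slope is the pattern the report describes for 1.6.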

      I captured heap dumps in both versions and compared them. I noticed that byte arrays (class [B) use about 50X more space in 1.6.0 than in 1.5.1.

      Here are the top classes from each heap histogram, followed by the references and referrers by type.

      Heap Histogram: All Classes (excluding platform)

      1.6.0 Streaming
        Class                            Instances   Total Size (bytes)
        class [B                              8453        3,227,649,599
        class [C                             44682            4,255,502
        class java.lang.reflect.Method        9059            1,177,670

      1.5.1 Streaming
        Class                            Instances   Total Size (bytes)
        class [B                              5095           62,938,466
        class [C                            130482           12,844,182
        class java.lang.String              130171            1,562,052

      References by Type
        1.6.0 Streaming: class [B [0x640039e38]
        1.5.1 Streaming: class [B [0x6c020bb08]

      Referrers by Type

      1.6.0 Streaming
        Class                               Count
        java.nio.HeapByteBuffer              3239
        sun.security.util.DerInputBuffer     1233
        sun.security.util.ObjectIdentifier    620
        [Ljava.lang.Object;                   408

      1.5.1 Streaming
        Class                               Count
        sun.security.util.DerInputBuffer     1233
        sun.security.util.ObjectIdentifier    620
        [[B                                   397
        java.lang.reflect.Method              326
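      The histogram comparison can be reproduced with a short script. This is a sketch, not the tool used in the report: it assumes each histogram has been exported as `jmap -histo`-style rows (rank, instance count, total bytes, class name), and the sample text is simply the histogram rows from this report rewritten in that layout. It ranks the classes present in both histograms by how much their total size grew.

```python
import re

# One row in the layout printed by `jmap -histo`, e.g.
#    1:          8453     3227649599  [B
# (rank, instance count, total bytes, class name; any trailing column is ignored).
ROW = re.compile(r"^\s*\d+:\s+(\d+)\s+(\d+)\s+(\S+)")


def parse_histo(text):
    """Return {class_name: (instances, total_bytes)} from jmap -histo-style text."""
    hist = {}
    for line in text.splitlines():
        m = ROW.match(line)
        if m:
            hist[m.group(3)] = (int(m.group(1)), int(m.group(2)))
    return hist


def growth(old, new):
    """(class, old_bytes, new_bytes, ratio) for classes in both histograms, largest ratio first."""
    rows = []
    for cls in old.keys() & new.keys():
        old_bytes, new_bytes = old[cls][1], new[cls][1]
        ratio = new_bytes / old_bytes if old_bytes else float("inf")
        rows.append((cls, old_bytes, new_bytes, ratio))
    return sorted(rows, key=lambda r: r[3], reverse=True)


# The histogram rows from this report, rewritten in jmap -histo layout.
HISTO_151 = """
   1:          5095       62938466  [B
   2:        130482       12844182  [C
   3:        130171        1562052  java.lang.String
"""
HISTO_160 = """
   1:          8453     3227649599  [B
   2:         44682        4255502  [C
   3:          9059        1177670  java.lang.reflect.Method
"""

if __name__ == "__main__":
    for cls, old_b, new_b, ratio in growth(parse_histo(HISTO_151), parse_histo(HISTO_160)):
        print(f"{cls}: {old_b / 2**20:.1f} MiB -> {new_b / 2**20:.1f} MiB ({ratio:.1f}x)")
```

      Only classes listed in both histograms can be compared. With these rows that leaves [B, which grows from about 60 MiB to about 3 GiB (roughly 51x), and [C, which is smaller in 1.6.0.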


      The total size of class [B is about 3 GB in 1.6.0 versus only about 60 MB in 1.5.1.
      java.nio.HeapByteBuffer, the top referrer in 1.6.0, does not appear among the top referrers in 1.5.1.
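      The HeapByteBuffer observation amounts to a set difference between the two "Referrers by Type" lists. A minimal sketch using the counts copied from those lists (presumably referrers of class [B, as the text implies); no dump-file parsing is assumed.

```python
# Top referrer types, copied from the "Referrers by Type" lists in this report.
REFERRERS_160 = {
    "java.nio.HeapByteBuffer": 3239,
    "sun.security.util.DerInputBuffer": 1233,
    "sun.security.util.ObjectIdentifier": 620,
    "[Ljava.lang.Object;": 408,
}
REFERRERS_151 = {
    "sun.security.util.DerInputBuffer": 1233,
    "sun.security.util.ObjectIdentifier": 620,
    "[[B": 397,
    "java.lang.reflect.Method": 326,
}


def new_referrers(old, new):
    """Referrer types in the new top list but not the old one, highest count first.

    Absence from a top-N list only means the type ranked lower, not that it is missing.
    """
    return sorted(new.keys() - old.keys(), key=lambda k: new[k], reverse=True)


def unchanged(old, new):
    """Referrer types whose counts are identical in both lists, sorted by name."""
    return sorted(k for k in old.keys() & new.keys() if old[k] == new[k])


if __name__ == "__main__":
    print("only in the 1.6.0 top list:", new_referrers(REFERRERS_151, REFERRERS_160))
    print("same count in both:", unchanged(REFERRERS_151, REFERRERS_160))
```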

      I have also posted the jstack output for 1.6.0 and 1.5.1 online; you can get it here:

      https://ibm.box.com/sparkstreaming-jstack160
      https://ibm.box.com/sparkstreaming-jstack151
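      A quick first pass over two jstack dumps is to count threads per state and see whether one version accumulates more blocked or waiting threads. A minimal sketch, assuming the usual HotSpot jstack layout in which each Java thread entry carries a `java.lang.Thread.State: ...` line; native VM threads such as GC threads print no State line and are not counted. The command-line usage with two dump files is hypothetical.

```python
import re
from collections import Counter

# Matches e.g. "   java.lang.Thread.State: TIMED_WAITING (parking)" and captures the state name.
STATE = re.compile(r"^[ \t]*java\.lang\.Thread\.State: (\w+)", re.MULTILINE)


def thread_states(dump):
    """Count Java threads per java.lang.Thread.State in one jstack dump."""
    return Counter(STATE.findall(dump))


def state_diff(old, new):
    """Per-state change in thread count (new minus old); states with no change are omitted."""
    a, b = thread_states(old), thread_states(new)
    return {s: b[s] - a[s] for s in sorted(a.keys() | b.keys()) if b[s] != a[s]}


if __name__ == "__main__":
    import sys

    # Hypothetical usage: python thread_states.py jstack_151.txt jstack_160.txt
    if len(sys.argv) == 3:
        with open(sys.argv[1]) as f_old, open(sys.argv[2]) as f_new:
            print(state_diff(f_old.read(), f_new.read()))
```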

      Jesse


      People
        Assignee: Unassigned
        Reporter: JESSE CHEN (jfchen@us.ibm.com)
        Votes: 1
        Watchers: 13
