  1. Spark
  2. SPARK-17380

Spark Streaming with a multi-shard Kinesis stream freezes after several days (memory/resource leak?)

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Not A Problem
    • Affects Version/s: 2.0.0
    • Fix Version/s: 2.0.2
    • Component/s: DStreams
    • Labels: None

    Description

      Running Spark Streaming 2.0.0 on AWS EMR 5.0.0 consuming from Kinesis (125 shards).
      According to Ganglia, used memory grows continuously.
      The application works properly for about 3.5 days, until all free memory has been used.
      Micro-batches then start queuing up, but none of them is processed.
      Spark freezes. Ganglia shows that some memory is freed, but that does not help the job recover.
      Is it a memory/resource leak?

      The job uses backpressure and Kryo serialization.
      The code applies mapToPair(), groupByKey(), flatMap(), persist(StorageLevel.MEMORY_AND_DISK_SER_2()) and repartition(19), then stores the results to S3 using foreachRDD().

      Cluster size: 20 machines
      Spark configuration:

      spark.executor.extraJavaOptions -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=70 -XX:MaxHeapFreeRatio=70 -XX:PermSize=256M -XX:MaxPermSize=256M -XX:OnOutOfMemoryError='kill -9 %p'
      spark.driver.extraJavaOptions -Dspark.driver.log.level=INFO -XX:+UseConcMarkSweepGC -XX:PermSize=256M -XX:MaxPermSize=256M -XX:OnOutOfMemoryError='kill -9 %p'
      spark.master yarn-cluster
      spark.executor.instances 19
      spark.executor.cores 7
      spark.executor.memory 7500M
      spark.driver.memory 7500M
      spark.default.parallelism 133
      spark.yarn.executor.memoryOverhead 2950
      spark.yarn.driver.memoryOverhead 2950
      spark.eventLog.enabled false
      spark.eventLog.dir hdfs:///spark-logs/
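
      As a rough cross-check of the settings above, the sketch below adds up the YARN container size requested per executor and the total number of executor cores. It assumes that on YARN each executor container is requested as spark.executor.memory plus spark.yarn.executor.memoryOverhead, and it ignores any rounding YARN applies to allocations. The class and method names (ContainerSizing, containerMb, totalCores) are illustrative only and are not part of Spark.

```java
public class ContainerSizing {
    // Values copied from the configuration listed in the description.
    static final int EXECUTOR_MEMORY_MB = 7500;   // spark.executor.memory
    static final int EXECUTOR_OVERHEAD_MB = 2950; // spark.yarn.executor.memoryOverhead
    static final int EXECUTOR_INSTANCES = 19;     // spark.executor.instances
    static final int EXECUTOR_CORES = 7;          // spark.executor.cores

    // Memory requested from YARN for one container: JVM heap plus off-heap overhead.
    // Assumption: no rounding to the scheduler's allocation increment.
    static int containerMb(int heapMb, int overheadMb) {
        return heapMb + overheadMb;
    }

    // Total task slots available across all executors.
    static int totalCores(int instances, int coresPerExecutor) {
        return instances * coresPerExecutor;
    }

    public static void main(String[] args) {
        int perExecutorMb = containerMb(EXECUTOR_MEMORY_MB, EXECUTOR_OVERHEAD_MB);
        int cores = totalCores(EXECUTOR_INSTANCES, EXECUTOR_CORES);
        System.out.println("Container size per executor (MB): " + perExecutorMb);
        System.out.println("Total executor cores: " + cores);
    }
}
```

      With the values listed, each executor container comes to 10450 MB, and the 133 total cores equal the configured spark.default.parallelism. The driver uses the same memory and overhead values, so its container is the same size under the same assumption.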

      Attachments

        1. memory.png
          22 kB
          Xeto
        2. memory-after-freeze.png
          20 kB
          Xeto
        3. exec_Leak_Hunter.zip
          25 kB
          Xeto

          People

            Assignee: Unassigned
            Reporter: Xeto
            Votes: 2
            Watchers: 5
