Description
Running Spark Streaming 2.0.0 on AWS EMR 5.0.0, consuming from Kinesis (125 shards).
According to Ganglia, used memory keeps growing the whole time.
The application works properly for about 3.5 days, until all free memory has been used.
Then micro-batches start queuing up but none of them gets processed.
Spark freezes. Ganglia shows some memory being freed at that point, but it does not help the job recover.
Is it a memory/resource leak?
The job uses backpressure and Kryo serialization.
The pipeline consists of mapToPair(), groupByKey(), flatMap(), persist(StorageLevel.MEMORY_AND_DISK_SER_2()) and repartition(19); the results are then written to S3 using foreachRDD().
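For reference, a minimal sketch of the shape of that pipeline (the record type, key extraction, transformation logic and S3 path are placeholders I made up, and the Kinesis receivers are replaced by a queue stream so the sketch stays self-contained; this is not the actual job code):

import java.util.Collections;
import java.util.LinkedList;
import java.util.Queue;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.storage.StorageLevel;
import org.apache.spark.streaming.Durations;
import org.apache.spark.streaming.api.java.JavaDStream;
import org.apache.spark.streaming.api.java.JavaStreamingContext;

import scala.Tuple2;

public class PipelineSketch {

    // Hypothetical key extraction; the real job's logic is not shown in this report.
    private static String extractKey(byte[] record) {
        return String.valueOf(record.length % 19);
    }

    public static void main(String[] args) throws Exception {
        SparkConf conf = new SparkConf()
                .setAppName("kinesis-pipeline-sketch")
                .setMaster("local[2]")  // local master only so the sketch runs standalone; the real job runs on YARN
                // the job uses Kryo and backpressure
                .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
                .set("spark.streaming.backpressure.enabled", "true");
        JavaStreamingContext jssc = new JavaStreamingContext(conf, Durations.seconds(30));

        // Stand-in for the Kinesis receivers (125 shards); an empty queue stream keeps the sketch self-contained.
        Queue<JavaRDD<byte[]>> queue = new LinkedList<>();
        JavaDStream<byte[]> input = jssc.queueStream(queue);

        input.mapToPair(record -> new Tuple2<>(extractKey(record), record))        // key each record
             .groupByKey()                                                         // shuffle: collect all records per key
             .flatMap(group -> Collections.singletonList(group._1()).iterator())   // placeholder transformation
             .persist(StorageLevel.MEMORY_AND_DISK_SER_2())
             .repartition(19)
             .foreachRDD(rdd -> rdd.saveAsTextFile("s3://bucket/prefix/" + System.currentTimeMillis()));  // per-batch write to S3

        jssc.start();
        jssc.awaitTermination();
    }
}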
Cluster size: 20 machines
Spark configuration:
spark.executor.extraJavaOptions -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=70 -XX:MaxHeapFreeRatio=70 -XX:PermSize=256M -XX:MaxPermSize=256M -XX:OnOutOfMemoryError='kill -9 %p'
spark.driver.extraJavaOptions -Dspark.driver.log.level=INFO -XX:+UseConcMarkSweepGC -XX:PermSize=256M -XX:MaxPermSize=256M -XX:OnOutOfMemoryError='kill -9 %p'
spark.master yarn-cluster
spark.executor.instances 19
spark.executor.cores 7
spark.executor.memory 7500M
spark.driver.memory 7500M
spark.default.parallelism 133
spark.yarn.executor.memoryOverhead 2950
spark.yarn.driver.memoryOverhead 2950
spark.eventLog.enabled false
spark.eventLog.dir hdfs:///spark-logs/
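For context on sizing (my own back-of-the-envelope arithmetic, assuming each YARN container is requested as spark.executor.memory/spark.driver.memory plus the corresponding memoryOverhead):

executor container = 7500 MB + 2950 MB = 10450 MB
driver container   = 7500 MB + 2950 MB = 10450 MB
total              = 19 x 10450 MB + 10450 MB = 209000 MB (about 204 GB)

So across the 20 machines that is roughly one ~10.2 GB container per node.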