Details
Description
A Spark Streaming application configured with checkpointing fills the driver's heap with ZipFileInputStream instances, apparently as a result of spark-assembly.jar (and potentially other jars, e.g. snappy-java.jar) being repeatedly referenced (loaded?). The Java Finalizer cannot finalize these ZipFileInputStream instances fast enough; they eventually consume the whole heap and the driver crashes with an OOM.
Steps to reproduce:
- Submit the attached bug.py to Spark
- Leave it running and monitor the driver's Java process heap
- with visualvm you can see (a programmatic check is sketched after this list):
  - an increasing number of objects pending finalization
  - an increasing number of ZipFileInputStream instances related to spark-assembly.jar, referenced by the Finalizer
- with a heap dump you will primarily see a growing number of byte arrays (the accumulated zip payload of the jar references):

   num     #instances         #bytes  class name
  ----------------------------------------------
     1:         32653       32735296  [B
     2:         48000        5135816  [C
     3:            41        1344144  [Lscala.concurrent.forkjoin.ForkJoinTask;
     4:         11362        1261816  java.lang.Class
     5:         47054        1129296  java.lang.String
     6:         25460        1018400  java.lang.ref.Finalizer
     7:          9802         789400  [Ljava.lang.Object;

- Depending on the heap size and running time this leads to a driver OOM crash
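As a programmatic variant of the visualvm observation above, the "objects pending finalization" counter can be polled from inside the driver. This is only a sketch: sc._jvm is a PySpark-internal Py4J handle rather than a public API, and the helper name is my own; the MXBean call itself is the standard java.lang.management API.

    # Hypothetical observation aid, not part of the attached bug.py.
    # Reads the JVM's "objects pending finalization" counter through the
    # PySpark-internal Py4J gateway (sc._jvm).
    def pending_finalization_count(sc):
        mbean = sc._jvm.java.lang.management.ManagementFactory.getMemoryMXBean()
        return mbean.getObjectPendingFinalizationCount()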
Comments
- The bug.py is a lightweight proof of the problem. In production the effect is quite rapid - within a few hours it eats gigabytes of heap and kills the app.
- If the same bug.py is run without checkpointing, there is no issue whatsoever.
- Not sure if it is PySpark-specific.
- In bug.py I am using the socketTextStream input, but the problem seems to be independent of the input type (in production I see the same issue with a Kafka direct stream, and have even seen it with textFileStream). A minimal sketch of such an app is given after this list.
- It is happening even if the input stream doesn't produce any data.
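The attached bug.py itself is not reproduced in this report. Below is a minimal sketch of an equivalent reproducer, based only on the details above (checkpointing enabled, socketTextStream input, no data required); the checkpoint path, app name, and port are placeholders and the actual attachment may differ.

    # Hypothetical reconstruction of a bug.py-style reproducer.
    from pyspark import SparkContext
    from pyspark.streaming import StreamingContext

    CHECKPOINT_DIR = "/tmp/bug-checkpoint"   # placeholder path

    def create_context():
        sc = SparkContext(appName="checkpoint-finalizer-leak")
        ssc = StreamingContext(sc, 1)         # 1-second batches
        ssc.checkpoint(CHECKPOINT_DIR)        # checkpointing is the trigger described above
        # Input type does not matter; the stream can stay empty.
        lines = ssc.socketTextStream("localhost", 9999)
        lines.count().pprint()
        return ssc

    if __name__ == "__main__":
        ssc = StreamingContext.getOrCreate(CHECKPOINT_DIR, create_context)
        ssc.start()
        ssc.awaitTermination()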
Attachments
Issue Links
- is blocked by
  - SPARK-12652 Upgrade py4j to the incoming version 0.9.1 - Resolved
- is duplicated by
  - SPARK-11711 Finalizer memory leak in pyspark - Resolved
- relates to
  - SPARK-11711 Finalizer memory leak in pyspark - Resolved