It seems that there is a SparkContext jobProgressListener memory leak. Below I describe the steps to reproduce it.
I have created a Java webapp that runs Spark SQL jobs: they read data from HDFS, join them, and write the results to ElasticSearch using the ES-Hadoop connector. After a lot of consecutive runs I noticed that the heap was full, and I got an out-of-heap-space error.
The code in the attached file
runs each time a Spark SQL job is triggered, so I tried to reuse the same SparkContext for a number of consecutive runs. If certain rules apply, I try to clean up the SparkContext by first calling
. This code eventually runs.
So at some point, if no other Spark SQL job needs to run, I kill the SparkContext (AbstractSparkJobRunner.killSparkAndSqlContext runs), and I would expect it to be garbage collected. However, this is not the case, even though my debugger shows that my JavaSparkContext object is null (see attached picture).
JVisualVM shows the heap growing steadily even after the garbage collector runs (see attached picture).
The Memory Analyzer Tool shows that a big part of the retained heap is assigned to _jobProgressListener (see the attached picture and the summary picture), although at the same time the JavaSparkContext in the singleton service is null.
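The symptom above (the application's own reference is null, yet the retained heap keeps growing) is consistent with a listener that stays registered in a longer-lived structure. Below is a minimal, self-contained Java sketch of that pattern; it is not Spark code, and the names `ListenerLeakDemo`, `Listener`, and `REGISTRY` are illustrative stand-ins, not Spark's actual internals. A `WeakReference` is used to observe whether the object is still strongly reachable:

```java
import java.lang.ref.WeakReference;
import java.util.ArrayList;
import java.util.List;

// Illustrative analogue of the suspected leak: a long-lived registry keeps a
// strong reference to a listener, so the listener's retained heap stays
// reachable even after the application nulls its own reference.
public class ListenerLeakDemo {
    // Stands in for whatever long-lived structure still references
    // _jobProgressListener after the context is "gone".
    static final List<Object> REGISTRY = new ArrayList<>();

    static class Listener {
        byte[] retained = new byte[1024 * 1024]; // simulates retained heap
    }

    public static void main(String[] args) throws Exception {
        Listener listener = new Listener();
        REGISTRY.add(listener);                       // registration on "start"
        WeakReference<Listener> ref = new WeakReference<>(listener);

        listener = null;                              // app-side reference is null...
        System.gc();
        Thread.sleep(50);
        // ...but the listener is still strongly reachable via REGISTRY,
        // so the weak reference is not cleared.
        System.out.println("after null + gc, collected = " + (ref.get() == null));

        REGISTRY.clear();                             // proper deregistration on "stop"
        for (int i = 0; i < 50 && ref.get() != null; i++) {
            System.gc();
            Thread.sleep(10);
        }
        System.out.println("after deregister + gc, collected = " + (ref.get() == null));
    }
}
```

In other words, nulling the `JavaSparkContext` field in the singleton service is not enough on its own; whatever still holds the listener must release it, which is why the retained heap assigned to `_jobProgressListener` keeps accumulating across runs.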