Spark / SPARK-11700

Memory leak at SparkContext jobProgressListener stageIdToData map


    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 1.5.0, 1.5.1, 1.5.2
    • Fix Version/s: 1.6.0
    • Component/s: Spark Core, SQL
    • Labels:
    • Environment:

      Ubuntu 14.04 LTS, Oracle JDK 1.8.51, Apache Tomcat 8.0.28, Spring 4

    • Target Version/s:
    • Flags:
      Important

      Description

      It seems that there is a SparkContext jobProgressListener memory leak. Below I describe the steps I take to reproduce it.

      I have created a Java webapp that runs Spark SQL jobs which read data from HDFS, join it, and write the results to Elasticsearch using the ES-Hadoop connector. After a lot of consecutive runs I noticed that my heap space was full, and I got an out-of-heap-space error.

      In the attached file

       AbstractSparkJobRunner 

      the method

        public final void run(T jobConfiguration, ExecutionLog executionLog) throws Exception  

      runs each time a Spark SQL job is triggered, so I tried to reuse the same SparkContext for a number of consecutive runs. When certain rules apply, I try to clean up the SparkContext by first calling

       killSparkAndSqlContext 

      . This code eventually runs:

        synchronized (sparkContextThreadLock) {
            if (javaSparkContext != null) {
                LOGGER.info("!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! CLEARING SPARK CONTEXT!!!!!!!!!!!!!!!!!!!!!!!!!!!");
                javaSparkContext.stop();
                javaSparkContext = null;
                sqlContext = null;

                System.gc();
            }
            numberOfRunningJobsForSparkContext.getAndSet(0);
        }
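      For reference, the teardown logic can be sketched outside Spark. This is a minimal, Spark-free sketch, assuming a hypothetical ContextHolder service and a StubContext stand-in for JavaSparkContext; only the locking/nulling pattern mirrors the snippet above:

      ```java
      import java.util.concurrent.atomic.AtomicInteger;

      // Hypothetical stand-in for the runner's context-holding singleton
      // (field names mirror the snippet above; no Spark dependency).
      class ContextHolder {
          private final Object sparkContextThreadLock = new Object();
          final AtomicInteger numberOfRunningJobsForSparkContext = new AtomicInteger(3);
          StubContext javaSparkContext = new StubContext();
          Object sqlContext = new Object();

          static class StubContext {
              void stop() { /* would release cluster resources in real Spark */ }
          }

          void killSparkAndSqlContext() {
              synchronized (sparkContextThreadLock) {
                  if (javaSparkContext != null) {
                      javaSparkContext.stop();  // stop before dropping the reference
                      javaSparkContext = null;  // drop our strong references...
                      sqlContext = null;        // ...so the GC can reclaim them
                      System.gc();              // a hint only; the JVM may ignore it
                  }
                  numberOfRunningJobsForSparkContext.getAndSet(0);
              }
          }

          public static void main(String[] args) {
              ContextHolder holder = new ContextHolder();
              holder.killSparkAndSqlContext();
              System.out.println(holder.javaSparkContext == null
                      && holder.numberOfRunningJobsForSparkContext.get() == 0);
          }
      }
      ```

      Nulling the fields removes this service's strong references, but the object is only collectable if nothing else (for example a registered listener) still reaches it.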

      So at some point, if no other Spark SQL job is due to run, I kill the SparkContext (AbstractSparkJobRunner.killSparkAndSqlContext runs) and expect it to be garbage collected. However this is not the case: even though my debugger shows that the JavaSparkContext object is null (see attached picture

       SparkContextPossibleMemoryLeakIDEA_DEBUG.png 

      ), jvisualvm shows heap usage growing even when the garbage collector runs. See attached picture

       SparkHeapSpaceProgress.png 

      .

      The Memory Analyzer Tool shows that a big part of the retained heap is assigned to _jobProgressListener; see attached picture

       SparkMemoryAfterLotsOfConsecutiveRuns.png 

      and summary picture

       SparkMemoryLeakAfterLotsOfRunsWithinTheSameContext.png 

      . At the same time, the JavaSparkContext in the singleton service is null.
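      The MAT screenshots are consistent with the listener, not the context, owning the per-stage maps, and with something longer-lived than the context keeping the listener reachable. A minimal, Spark-free illustration of that retention pattern (all class and field names here are hypothetical, chosen to echo the report's stageIdToData map):

      ```java
      import java.util.ArrayList;
      import java.util.HashMap;
      import java.util.List;
      import java.util.Map;

      // Hypothetical sketch: a listener registered in a static registry keeps
      // its per-stage data on the heap even after the context is nulled.
      class ListenerLeakSketch {
          static class JobProgressListener {
              // analogous to the stageIdToData map named in this report
              final Map<Integer, int[]> stageIdToData = new HashMap<>();
          }

          // Static registry, standing in for a listener bus that outlives the context.
          static final List<JobProgressListener> REGISTRY = new ArrayList<>();

          static class Context {
              final JobProgressListener listener = new JobProgressListener();
              Context() { REGISTRY.add(listener); }
              void runJob(int stageId) { listener.stageIdToData.put(stageId, new int[1024]); }
              // A stop() that forgets REGISTRY.remove(listener) leaks the listener:
              void stop() { /* teardown only; listener stays registered */ }
          }

          public static void main(String[] args) {
              Context ctx = new Context();
              for (int run = 0; run < 100; run++) ctx.runJob(run);
              ctx.stop();
              ctx = null;   // context unreachable, as in the debugger screenshot
              System.gc();  // but the listener and its maps are still GC-rooted
              System.out.println(REGISTRY.get(0).stageIdToData.size()); // prints 100
          }
      }
      ```

      In such a scenario the debugger correctly shows the context reference as null while the heap keeps growing across runs, matching the symptoms above.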

        Attachments

        1. SparkMemoryLeakAfterLotsOfRunsWithinTheSameContext.png
          192 kB
          Kostas papageorgopoulos
        2. SparkMemoryAfterLotsOfConsecutiveRuns.png
          183 kB
          Kostas papageorgopoulos
        3. SparkHeapSpaceProgress.png
          147 kB
          Kostas papageorgopoulos
        4. SparkContextPossibleMemoryLeakIDEA_DEBUG.png
          241 kB
          Kostas papageorgopoulos
        5. AbstractSparkJobRunner.java
          20 kB
          Kostas papageorgopoulos

          Activity

            People

            • Assignee:
              davies Davies Liu
              Reporter:
              p02096 Kostas papageorgopoulos
            • Votes:
              0 Vote for this issue
              Watchers:
              10 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved: