  Spark / SPARK-6844

Memory leak occurs when registering a temp table with table caching on


    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.3.0
    • Fix Version/s: 1.4.0
    • Component/s: SQL
    • Labels:

      Description

      There is a memory leak when a temp table is repeatedly registered with caching enabled.

      A simple program that reproduces the issue:

          import org.apache.spark.{SparkConf, SparkContext}
          import org.apache.spark.sql.SQLContext

          val sparkConf = new SparkConf().setAppName("LeakTest")
          val sparkContext = new SparkContext(sparkConf)
          val sqlContext = new SQLContext(sparkContext)
          val tableName = "tmp"
          val jsonrdd = sparkContext.textFile("sample.json")
          var loopCount = 1L
          while (true) {
            // Each iteration registers a fresh temp table and caches it ...
            sqlContext.jsonRDD(jsonrdd).registerTempTable(tableName)
            sqlContext.cacheTable(tableName)
            println("L: " + loopCount + " R: " + sqlContext.sql("select count(*) from tmp").count())
            // ... then uncaches it, yet memory usage still grows on every pass.
            sqlContext.uncacheTable(tableName)
            loopCount += 1
          }
      

      The cause is that InMemoryRelation and InMemoryColumnarTableScan use accumulators (InMemoryRelation.batchStats, InMemoryColumnarTableScan.readPartitions, InMemoryColumnarTableScan.readBatches) to collect information from partitions or for tests. Each accumulator registers itself in the static map Accumulators.originals and is never cleaned up, so entries pile up on every cacheTable call.
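      The leak pattern described above can be sketched in isolation. The names below (Accumulators, originals, register) mirror the Spark identifiers mentioned in this report, but the bodies are a simplified stand-in, not Spark's actual code: every new accumulator adds itself to a global map keyed by id, and nothing ever removes the entry, so repeated construction grows the map without bound.

      ```scala
      import scala.collection.mutable

      // Stand-in for Spark's static accumulator registry (Accumulators.originals).
      object Accumulators {
        val originals = mutable.Map[Long, AnyRef]()
        private var nextId = 0L

        // Registration inserts into the static map; there is no matching removal.
        def register(a: AnyRef): Long = {
          nextId += 1
          originals(nextId) = a
          nextId
        }
      }

      // Registers itself on construction, like the accumulators created for
      // batchStats / readPartitions / readBatches when a table is cached.
      class Accumulator {
        Accumulators.register(this)
      }

      object LeakDemo extends App {
        // Mimics 1000 cacheTable/uncacheTable cycles: the accumulators are
        // no longer reachable from user code, but the static map still holds
        // them, so they can never be garbage collected.
        for (_ <- 1 to 1000) new Accumulator()
        println(Accumulators.originals.size) // prints 1000
      }
      ```

      A fix therefore has to remove (or weakly reference) the map entries when the cached relation is dropped, so uncacheTable actually releases the accumulators.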


            People

            • Assignee: viirya (L. C. Hsieh)
            • Reporter: jhu (Jack Hu)
