Spark / SPARK-10155

Memory leak in SQL parsers


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.5.1, 1.6.0
    • Component/s: SQL
    • Labels: None

    Description

      I saw a lot of `ThreadLocal` objects in the following app:

      import org.apache.spark._
      import org.apache.spark.sql._
      
      object SparkApp {
      
        def foo(sqlContext: SQLContext): Unit = {
          import sqlContext.implicits._
    sqlContext.sparkContext
      .parallelize(Seq("aaa", "bbb", "ccc"))
      .toDF()
      .filter("length(_1) > 0")
      .count()
        }
      
        def main(args: Array[String]): Unit = {
          val conf = new SparkConf().setAppName("sql-memory-leak")
          val sc = new SparkContext(conf)
          val sqlContext = new SQLContext(sc)
          while (true) {
            foo(sqlContext)
          }
        }
      }
      

      Running the above code for a long time will eventually cause an OutOfMemoryError.

      These `ThreadLocal`s come from `scala.util.parsing.combinator.Parsers.lastNoSuccessVar`, which stores `Failure("end of input", ...)`.
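The accumulation can be sketched with a plain JDK `ThreadLocal` (the names below are illustrative stand-ins, not Spark's actual parser classes): each freshly constructed parser instance carries its own `lastNoSuccessVar`-style ThreadLocal, and once a thread touches it, that thread's ThreadLocalMap keeps one live entry per instance.

```scala
// Illustrative sketch only: FakeParser stands in for a parser-combinator
// instance whose lastNoSuccessVar-style ThreadLocal is set on every parse.
object ThreadLocalLeakSketch {
  final class FakeParser {
    // Mirrors the role of Parsers.lastNoSuccessVar: a per-instance
    // ThreadLocal remembering the last failure seen on the current thread.
    private val lastFailure = new ThreadLocal[String]

    def parse(input: String): String = {
      lastFailure.set(s"""Failure("end of input", at: $input)""")
      lastFailure.get
    }
  }

  def main(args: Array[String]): Unit = {
    // Each iteration mimics one query that constructs a fresh parser.
    // The calling thread's ThreadLocalMap now holds one entry per
    // instance, so a long-running driver thread grows without bound.
    for (_ <- 1 to 10000) new FakeParser().parse("aaa")
  }
}
```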

      There is a related Scala issue here: https://issues.scala-lang.org/browse/SI-9010
      and some discussion here: https://issues.scala-lang.org/browse/SI-4929
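One way to keep the per-thread entry count bounded (a sketch of the general pattern, not necessarily the fix Spark shipped) is to reuse a single long-lived parser instance, so repeated parses on the same thread overwrite one ThreadLocal entry instead of adding new ones:

```scala
// Illustrative sketch only: FakeParser and sharedParser are hypothetical
// names, standing in for a parser with a lastNoSuccessVar-style ThreadLocal.
object SingletonParserSketch {
  final class FakeParser {
    private val lastFailure = new ThreadLocal[String]
    def parse(input: String): String = {
      lastFailure.set(s"last failure for: $input")
      lastFailure.get
    }
  }

  // A single shared instance means a single ThreadLocal object: each thread
  // holds at most one map entry, no matter how many queries it runs.
  val sharedParser = new FakeParser

  def main(args: Array[String]): Unit =
    for (_ <- 1 to 10000) sharedParser.parse("aaa")
}
```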

      People

        Assignee: Shixiong Zhu (zsxwing)
        Reporter: Shixiong Zhu (zsxwing)
        Votes: 1
        Watchers: 7