Spark / SPARK-10155

Memory leak in SQL parsers


    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.5.1, 1.6.0
    • Component/s: SQL
    • Labels:
      None

      Description

      I saw a lot of `ThreadLocal` objects in the following app:

      import org.apache.spark._
      import org.apache.spark.sql._
      
      object SparkApp {
      
        def foo(sqlContext: SQLContext): Unit = {
          import sqlContext.implicits._
          sqlContext.sparkContext
            .parallelize(Seq("aaa", "bbb", "ccc"))
            .toDF()
            .filter("length(_1) > 0")
            .count()
        }
      
        def main(args: Array[String]): Unit = {
          val conf = new SparkConf().setAppName("sql-memory-leak")
          val sc = new SparkContext(conf)
          val sqlContext = new SQLContext(sc)
          while (true) {
            foo(sqlContext)
          }
        }
      }
      

      Running the above code for a long time eventually ends in an OutOfMemoryError.

      These `ThreadLocal`s come from `scala.util.parsing.combinator.Parsers.lastNoSuccessVar`, which stores a `Failure("end of input", ...)` result.

      There is a Scala issue tracking this: https://issues.scala-lang.org/browse/SI-9010
      and related discussion here: https://issues.scala-lang.org/browse/SI-4929
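
      The leak pattern can be sketched in isolation. `lastNoSuccessVar` is a `DynamicVariable` (backed by an `InheritableThreadLocal`) owned by each `Parsers` instance, so constructing a fresh parser per query, as the loop above effectively does, leaves one stale thread-local entry per discarded parser in the long-lived driver thread. A minimal sketch of that pattern, assuming `scala-parser-combinators` on the classpath; `TinyParser`, `LeakDemo`, and the loop bound are hypothetical names, not from Spark:

      ```scala
      import scala.util.parsing.combinator.RegexParsers

      // Hypothetical minimal reproduction of the reported pattern:
      // each `new TinyParser` is a fresh Parsers instance with its own
      // `lastNoSuccessVar`. Running a parse touches that thread-local
      // from the calling thread; discarding the parser does not eagerly
      // remove the entry from the thread's ThreadLocalMap, so a
      // long-lived thread can accumulate many stale entries.
      class TinyParser extends RegexParsers {
        def num: Parser[String] = "[0-9]+".r
        def run(in: String): ParseResult[String] = parseAll(num, in)
      }

      object LeakDemo {
        def main(args: Array[String]): Unit = {
          // Mirrors the `while (true) foo(sqlContext)` loop above:
          // one discarded parser per iteration, all on one thread.
          for (_ <- 1 to 1000000) new TinyParser().run("42")
        }
      }
      ```

      Assuming this is the mechanism, a common mitigation is to reuse a single long-lived parser instance instead of constructing one per query, so only one thread-local entry exists per thread.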

            People

            • Assignee:
              Shixiong Zhu (zsxwing)
            • Reporter:
              Shixiong Zhu (zsxwing)
            • Votes:
              1
            • Watchers:
              7
