Spark / SPARK-10719

SQLImplicits.rddToDataFrameHolder is not thread safe when using Scala 2.10


Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Incomplete
    • Affects Version/s: 1.3.1, 1.4.1, 1.5.0, 1.6.0
    • Fix Version/s: 2.3.0
    • Component/s: SQL
    • Scala 2.10

    Description

      The following code sometimes fails:

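          import org.apache.spark.{SparkConf, SparkContext}
          import org.apache.spark.sql.SQLContext
          object SparkApp {
            def main(args: Array[String]): Unit = {
              val conf = new SparkConf().setAppName("sql-memory-leak")
              val sc = new SparkContext(conf)
              val sqlContext = new SQLContext(sc)
              import sqlContext.implicits._
              // Each task's toDF call materializes an implicit TypeTag on a pool thread
              (1 to 1000).par.foreach { _ =>
                sc.parallelize(1 to 5).map { i => (i, i) }.toDF("a", "b").count()
              }
            }
          }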
      

      The stack trace is:

      Exception in thread "main" java.lang.UnsupportedOperationException: tail of empty list
      	at scala.collection.immutable.Nil$.tail(List.scala:339)
      	at scala.collection.immutable.Nil$.tail(List.scala:334)
      	at scala.reflect.internal.SymbolTable.popPhase(SymbolTable.scala:172)
      	at scala.reflect.internal.Symbols$Symbol.unsafeTypeParams(Symbols.scala:1477)
      	at scala.reflect.internal.Symbols$TypeSymbol.tpe(Symbols.scala:2777)
      	at scala.reflect.internal.Mirrors$RootsBase.init(Mirrors.scala:235)
      	at scala.reflect.runtime.JavaMirrors$class.createMirror(JavaMirrors.scala:34)
      	at scala.reflect.runtime.JavaMirrors$class.runtimeMirror(JavaMirrors.scala:61)
      	at scala.reflect.runtime.JavaUniverse.runtimeMirror(JavaUniverse.scala:12)
      	at scala.reflect.runtime.JavaUniverse.runtimeMirror(JavaUniverse.scala:12)
      	at SparkApp$$anonfun$main$1.apply$mcJI$sp(SparkApp.scala:16)
      	at SparkApp$$anonfun$main$1.apply(SparkApp.scala:15)
      	at SparkApp$$anonfun$main$1.apply(SparkApp.scala:15)
      	at scala.Function1$class.apply$mcVI$sp(Function1.scala:39)
      	at scala.runtime.AbstractFunction1.apply$mcVI$sp(AbstractFunction1.scala:12)
      	at scala.collection.parallel.immutable.ParRange$ParRangeIterator.foreach(ParRange.scala:91)
      	at scala.collection.parallel.ParIterableLike$Foreach.leaf(ParIterableLike.scala:975)
      	at scala.collection.parallel.Task$$anonfun$tryLeaf$1.apply$mcV$sp(Tasks.scala:54)
      	at scala.collection.parallel.Task$$anonfun$tryLeaf$1.apply(Tasks.scala:53)
      	at scala.collection.parallel.Task$$anonfun$tryLeaf$1.apply(Tasks.scala:53)
      	at scala.collection.parallel.Task$class.tryLeaf(Tasks.scala:56)
      	at scala.collection.parallel.ParIterableLike$Foreach.tryLeaf(ParIterableLike.scala:972)
      	at scala.collection.parallel.AdaptiveWorkStealingTasks$WrappedTask$class.internal(Tasks.scala:172)
      	at scala.collection.parallel.AdaptiveWorkStealingForkJoinTasks$WrappedTask.internal(Tasks.scala:514)
      	at scala.collection.parallel.AdaptiveWorkStealingTasks$WrappedTask$class.compute(Tasks.scala:162)
      	at scala.collection.parallel.AdaptiveWorkStealingForkJoinTasks$WrappedTask.compute(Tasks.scala:514)
      	at scala.concurrent.forkjoin.RecursiveAction.exec(RecursiveAction.java:160)
      	at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
      	at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
      	at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
      	at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
      

      I finally found the cause: the code that the Scala compiler generates to materialize the implicit TypeTag is not thread-safe, because of a bug in Scala 2.10 reflection: https://issues.scala-lang.org/browse/SI-6240
      The bug was fixed in Scala 2.11 but not backported to 2.10.
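      One possible mitigation on Scala 2.10, sketched below as a standalone program rather than the Spark reproducer, is to serialize every call site that materializes a TypeTag behind a single shared lock. The names `describe`, `ReflectionLock`, and `runConcurrently` are hypothetical: `describe` stands in for `rddToDataFrameHolder` only in that it takes a `TypeTag` context bound, so the compiler materializes the tag at each call. This is a sketch under assumptions, not a fix from this issue: the lock only helps if all concurrent runtime reflection in the process goes through it, and on Scala 2.11+ it should be unnecessary.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._
import scala.reflect.runtime.universe._

object TypeTagLockSketch {
  // Hypothetical stand-in for SQLImplicits.rddToDataFrameHolder: the TypeTag
  // context bound makes the compiler materialize a tag at every call site.
  def describe[A: TypeTag](a: A): String = typeOf[A].toString

  // Hypothetical process-wide lock; any single shared object would do.
  private object ReflectionLock

  // Calls `describe` from many pool threads, one materialization at a time.
  def runConcurrently(n: Int): Seq[String] = {
    val futures = (1 to n).map { i =>
      Future {
        // The implicit TypeTag argument is evaluated inside the lock.
        ReflectionLock.synchronized { describe((i, i)) }
      }
    }
    Await.result(Future.sequence(futures), 1.minute)
  }

  def main(args: Array[String]): Unit =
    println(runConcurrently(1000).distinct)
}
```

      Applied to the reproducer, the same idea would mean wrapping `sc.parallelize(1 to 5).map { i => (i, i) }.toDF("a", "b")` in `ReflectionLock.synchronized { ... }` and calling `count()` outside the lock, so the Spark jobs themselves still run in parallel.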

People

    • Assignee: Shixiong Zhu (zsxwing)
    • Reporter: Shixiong Zhu (zsxwing)
    • Votes: 0
    • Watchers: 5