[SPARK-13456] Cannot create encoders for case classes defined in Spark shell after upgrading to Scala 2.11


    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: 2.0.0
    • Fix Version/s: 2.0.0
    • Component/s: SQL
    • Labels: None

      Description

      Spark 2.0 has used Scala 2.11 by default since PR #10608. Unfortunately, after this upgrade, Spark fails to create encoders for case classes defined in the REPL:

      import sqlContext.implicits._
      case class T(a: Int, b: Double)
      val ds = Seq(1 -> T(1, 1D), 2 -> T(2, 2D)).toDS()
      

      Exception thrown:

      org.apache.spark.sql.AnalysisException: Unable to generate an encoder for inner class `T` without access to the scope that this class was defined in.
      Try moving this class out of its parent class.;
        at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveReferences$$anonfun$resolveDeserializer$1.applyOrElse(Analyzer.scala:565)
        at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveReferences$$anonfun$resolveDeserializer$1.applyOrElse(Analyzer.scala:561)
        at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$3.apply(TreeNode.scala:262)
        at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$3.apply(TreeNode.scala:262)
        at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:70)
        at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:261)
        at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:267)
        at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:267)
        at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$5.apply(TreeNode.scala:304)
        at scala.collection.Iterator$$anon$11.next(Iterator.scala:370)
        at scala.collection.Iterator$class.foreach(Iterator.scala:742)
        at scala.collection.AbstractIterator.foreach(Iterator.scala:1194)
        at scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:59)
        at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:104)
        at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:48)
        at scala.collection.TraversableOnce$class.to(TraversableOnce.scala:308)
        at scala.collection.AbstractIterator.to(Iterator.scala:1194)
        at scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:300)
        at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1194)
        at scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:287)
        at scala.collection.AbstractIterator.toArray(Iterator.scala:1194)
        at org.apache.spark.sql.catalyst.trees.TreeNode.transformChildren(TreeNode.scala:353)
        at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:267)
        at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:267)
        at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:267)
        at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$5$$anonfun$apply$11.apply(TreeNode.scala:333)
        at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:245)
        at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:245)
        at scala.collection.immutable.List.foreach(List.scala:381)
        at scala.collection.TraversableLike$class.map(TraversableLike.scala:245)
        at scala.collection.immutable.List.map(List.scala:285)
        at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$5.apply(TreeNode.scala:331)
        at scala.collection.Iterator$$anon$11.next(Iterator.scala:370)
        at scala.collection.Iterator$class.foreach(Iterator.scala:742)
        at scala.collection.AbstractIterator.foreach(Iterator.scala:1194)
        at scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:59)
        at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:104)
        at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:48)
        at scala.collection.TraversableOnce$class.to(TraversableOnce.scala:308)
        at scala.collection.AbstractIterator.to(Iterator.scala:1194)
        at scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:300)
        at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1194)
        at scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:287)
        at scala.collection.AbstractIterator.toArray(Iterator.scala:1194)
        at org.apache.spark.sql.catalyst.trees.TreeNode.transformChildren(TreeNode.scala:353)
        at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:267)
        at org.apache.spark.sql.catalyst.trees.TreeNode.transform(TreeNode.scala:251)
        at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveReferences$.resolveDeserializer(Analyzer.scala:561)
        at org.apache.spark.sql.catalyst.encoders.ExpressionEncoder.resolve(ExpressionEncoder.scala:315)
        at org.apache.spark.sql.Dataset.<init>(Dataset.scala:81)
        at org.apache.spark.sql.Dataset.<init>(Dataset.scala:92)
        at org.apache.spark.sql.SQLContext.createDataset(SQLContext.scala:482)
        at org.apache.spark.sql.SQLImplicits.localSeqToDatasetHolder(SQLImplicits.scala:140)
        ... 51 elided
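
      For reference, the workaround the error message itself suggests ("Try moving this class out of its parent class") can be sketched as compiling the case class ahead of time, so that it becomes a plain top-level class with no outer pointer. The file and jar names below are hypothetical:

      // T.scala -- compiled with scalac and put on the shell's classpath
      // (e.g. spark-shell --jars t.jar) instead of being defined in the REPL.
      // A top-level case class needs no enclosing instance, so the encoder
      // can construct it reflectively.
      case class T(a: Int, b: Double)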
      

      However, the existing Dataset REPL test case does pass:

        test("SPARK-2576 importing SQLContext.implicits._") {
          // We need to use local-cluster to test this case.
          val output = runInterpreter("local-cluster[1,1,1024]",
            """
              |val sqlContext = new org.apache.spark.sql.SQLContext(sc)
              |import sqlContext.implicits._
              |case class TestCaseClass(value: Int)
              |sc.parallelize(1 to 10).map(x => TestCaseClass(x)).toDF().collect()
              |
              |// Test Dataset Serialization in the REPL
              |Seq(TestCaseClass(1)).toDS().collect()
            """.stripMargin)
          assertDoesNotContain("error:", output)
          assertDoesNotContain("Exception", output)
        }
      

      One possible clue: ReplSuite calls SparkILoop directly, while the Spark shell is started via o.a.s.repl.Main, which also sets the -Yrepl-class-based compiler option.
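
      A minimal illustration (plain Scala, not Spark's actual wrapper code) of why that option matters: the default REPL wraps each input line in an object, so a case class defined there is reachable through a stable path, while the class-based REPL wraps each line in a class, turning the case class into an inner class that cannot be instantiated without an enclosing instance:

      // Object-based wrapping (roughly what SparkILoop gives ReplSuite):
      object LineObj { case class T(a: Int) }
      val t1 = LineObj.T(1)      // stable path, no outer instance needed

      // Class-based wrapping (-Yrepl-class-based, used by spark-shell):
      class LineClass { case class T(a: Int) }
      val outer = new LineClass
      val t2 = outer.T(1)        // requires the enclosing instance `outer`

      The encoder's deserializer must construct T reflectively, and in the class-based case it has no way to obtain that enclosing instance, which matches the "without access to the scope that this class was defined in" error above.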

    People

    • Assignee: Wenchen Fan
    • Reporter: Cheng Lian
    • Votes: 1
    • Watchers: 9
