Uploaded image for project: 'Spark'
  1. Spark
  2. SPARK-23251

ClassNotFoundException: scala.Any when there's a missing implicit Map encoder

    XMLWordPrintableJSON

Details

    • Bug
    • Status: Resolved
    • Minor
    • Resolution: Incomplete
    • 2.3.1
    • None
    • SQL
    • mac os high sierra, centos 7

    Description

      In branch-2.2, when you attempt to use row.getValuesMap[Any] without an implicit Map encoder, you get a nice descriptive compile-time error:

      scala> df.map(row => row.getValuesMap[Any](List("stationName", "year"))).collect
      <console>:26: error: Unable to find encoder for type stored in a Dataset.  Primitive types (Int, String, etc) and Product types (case classes) are supported by importing spark.implicits._  Support for serializing other types will be added in future releases.
             df.map(row => row.getValuesMap[Any](List("stationName", "year"))).collect
                   ^
      scala> implicit val mapEncoder = org.apache.spark.sql.Encoders.kryo[Map[String, Any]]
      mapEncoder: org.apache.spark.sql.Encoder[Map[String,Any]] = class[value[0]: binary]
      scala> df.map(row => row.getValuesMap[Any](List("stationName", "year"))).collect
      res1: Array[Map[String,Any]] = Array(Map(stationName -> 007026 99999, year -> 2014), Map(stationName -> 007026 99999, year -> 2014), Map(stationName -> 007026 99999, year -> 2014),
      etc.......
      

       
      On the latest master and also on branch-2.3, the transformation compiles (at least on spark-shell), but throws a ClassNotFoundException:

       

      scala> df.map(row => row.getValuesMap[Any](List("stationName", "year"))).collect
      java.lang.ClassNotFoundException: scala.Any
       at scala.reflect.internal.util.AbstractFileClassLoader.findClass(AbstractFileClassLoader.scala:62)
       at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
       at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
       at java.lang.Class.forName0(Native Method)
       at java.lang.Class.forName(Class.java:348)
       at scala.reflect.runtime.JavaMirrors$JavaMirror.javaClass(JavaMirrors.scala:555)
       at scala.reflect.runtime.JavaMirrors$JavaMirror$$anonfun$classToJava$1.apply(JavaMirrors.scala:1211)
       at scala.reflect.runtime.JavaMirrors$JavaMirror$$anonfun$classToJava$1.apply(JavaMirrors.scala:1203)
       at scala.reflect.runtime.TwoWayCaches$TwoWayCache$$anonfun$toJava$1.apply(TwoWayCaches.scala:49)
       at scala.reflect.runtime.Gil$class.gilSynchronized(Gil.scala:19)
       at scala.reflect.runtime.JavaUniverse.gilSynchronized(JavaUniverse.scala:16)
       at scala.reflect.runtime.TwoWayCaches$TwoWayCache.toJava(TwoWayCaches.scala:44)
       at scala.reflect.runtime.JavaMirrors$JavaMirror.classToJava(JavaMirrors.scala:1203)
       at scala.reflect.runtime.JavaMirrors$JavaMirror.runtimeClass(JavaMirrors.scala:194)
       at scala.reflect.runtime.JavaMirrors$JavaMirror.runtimeClass(JavaMirrors.scala:54)
       at org.apache.spark.sql.catalyst.ScalaReflection$.getClassFromType(ScalaReflection.scala:700)
       at org.apache.spark.sql.catalyst.ScalaReflection$$anonfun$org$apache$spark$sql$catalyst$ScalaReflection$$dataTypeFor$1.apply(ScalaReflection.scala:84)
       at org.apache.spark.sql.catalyst.ScalaReflection$$anonfun$org$apache$spark$sql$catalyst$ScalaReflection$$dataTypeFor$1.apply(ScalaReflection.scala:65)
       at scala.reflect.internal.tpe.TypeConstraints$UndoLog.undo(TypeConstraints.scala:56)
       at org.apache.spark.sql.catalyst.ScalaReflection$class.cleanUpReflectionObjects(ScalaReflection.scala:824)
       at org.apache.spark.sql.catalyst.ScalaReflection$.cleanUpReflectionObjects(ScalaReflection.scala:39)
       at org.apache.spark.sql.catalyst.ScalaReflection$.org$apache$spark$sql$catalyst$ScalaReflection$$dataTypeFor(ScalaReflection.scala:64)
       at org.apache.spark.sql.catalyst.ScalaReflection$$anonfun$org$apache$spark$sql$catalyst$ScalaReflection$$serializerFor$1.apply(ScalaReflection.scala:512)
       at org.apache.spark.sql.catalyst.ScalaReflection$$anonfun$org$apache$spark$sql$catalyst$ScalaReflection$$serializerFor$1.apply(ScalaReflection.scala:445)
       at scala.reflect.internal.tpe.TypeConstraints$UndoLog.undo(TypeConstraints.scala:56)
       at org.apache.spark.sql.catalyst.ScalaReflection$class.cleanUpReflectionObjects(ScalaReflection.scala:824)
       at org.apache.spark.sql.catalyst.ScalaReflection$.cleanUpReflectionObjects(ScalaReflection.scala:39)
       at org.apache.spark.sql.catalyst.ScalaReflection$.org$apache$spark$sql$catalyst$ScalaReflection$$serializerFor(ScalaReflection.scala:445)
       at org.apache.spark.sql.catalyst.ScalaReflection$.serializerFor(ScalaReflection.scala:434)
       at org.apache.spark.sql.catalyst.encoders.ExpressionEncoder$.apply(ExpressionEncoder.scala:71)
       at org.apache.spark.sql.SQLImplicits.newMapEncoder(SQLImplicits.scala:172)
       ... 49 elided
      scala> implicit val mapEncoder = org.apache.spark.sql.Encoders.kryo[Map[String, Any]]
      mapEncoder: org.apache.spark.sql.Encoder[Map[String,Any]] = class[value[0]: binary]
      scala> df.map(row => row.getValuesMap[Any](List("stationName", "year"))).collect
      res1: Array[Map[String,Any]] = Array(Map(stationName -> 007026 99999, year -> 2014), Map(stationName -> 007026 99999, year -> 2014),
      etc.......
      

       

      This message is a lot less helpful.

      As with with 2.2, specifying the Map encoder allows the transformation and action to execute.

       

      Attachments

        Activity

          People

            Unassigned Unassigned
            bersprockets Bruce Robbins
            Votes:
            0 Vote for this issue
            Watchers:
            6 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved: