Spark / SPARK-6589

SQLUserDefinedType failed in spark-shell

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Not A Problem
    • Affects Version/s: 1.2.0
    • Fix Version/s: None
    • Component/s: SQL
    • Labels: None
    • Environment: CDH 5.3.2

    Description

      DataType.fromJson fails in spark-shell if the schema includes a "udt" type. The same call works when run in an application.

      As a result, I cannot read a Parquet file that includes a UDT field. DataType.fromCaseClass does not support UDTs.

      I can load the class, which shows that my UDT is on the classpath.

      scala> Class.forName("com.bwang.MyTestUDT")
      res6: Class[_] = class com.bwang.MyTestUDT
      

      But DataType.fromJson fails:

      scala> DataType.fromJson(json)
      java.lang.ClassNotFoundException: com.bwang.MyTestUDT
              at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
              at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
              at java.security.AccessController.doPrivileged(Native Method)
              at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
              at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
              at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
              at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
              at java.lang.Class.forName0(Native Method)
              at java.lang.Class.forName(Class.java:190)
              at org.apache.spark.sql.catalyst.types.DataType$.parseDataType(dataTypes.scala:77)
      

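      The reason is that DataType.fromJson loads udtClass with the one-argument Class.forName, which resolves the name through the class loader of the calling class (here, DataType):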

          case JSortedObject(
              ("class", JString(udtClass)),
              ("pyClass", _),
              ("sqlType", _),
              ("type", JString("udt"))) =>
            Class.forName(udtClass).newInstance().asInstanceOf[UserDefinedType[_]]
        }
      
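      For contrast with the one-argument call above, here is a minimal sketch of resolving a class through an explicitly chosen loader. ClassLoading and loadClass are hypothetical names, not part of Spark, and the sketch assumes (unverified here) that the calling thread's context class loader can see the classes defined in or added to the shell.

```scala
object ClassLoading {
  // Hypothetical helper, not part of Spark: resolve a class by name through
  // the thread's context class loader, falling back to this object's loader.
  def loadClass(name: String): Class[_] = {
    val loader = Option(Thread.currentThread().getContextClassLoader)
      .getOrElse(getClass.getClassLoader)
    // The three-argument overload takes the loader explicitly; the
    // one-argument Class.forName uses the caller's defining loader instead.
    Class.forName(name, true, loader)
  }
}
```

      With such a helper, ClassLoading.loadClass("com.bwang.MyTestUDT") would be attempted against the context loader rather than against the loader that defined the calling class.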

      Unfortunately, my UDT is loaded by SparkIMain$TranslatingClassLoader, but DataType is loaded by Launcher$AppClassLoader.

      scala> DataType.getClass.getClassLoader
      res2: ClassLoader = sun.misc.Launcher$AppClassLoader@6876fb1b
      
      scala> this.getClass.getClassLoader
      res3: ClassLoader = org.apache.spark.repl.SparkIMain$TranslatingClassLoader@63d36b29
      
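      The two loaders above matter because class lookup normally delegates from a loader to its parents, never from a parent down to a child. The following sketch, using hypothetical names (LoaderChain, chain, canDelegateTo), walks a loader's parent chain so the relationship can be checked directly. It assumes the REPL loader has the application loader somewhere in its parent chain, which the output above does not by itself confirm.

```scala
object LoaderChain {
  // Collect a loader and all of its parents, child first. A null parent
  // stands for the bootstrap loader and ends the chain.
  def chain(loader: ClassLoader): List[ClassLoader] =
    Iterator.iterate(loader)(_.getParent).takeWhile(_ != null).toList

  // True when `ancestor` is `loader` itself or one of its parents, i.e. when
  // a lookup starting at `loader` can be delegated to `ancestor`.
  def canDelegateTo(loader: ClassLoader, ancestor: ClassLoader): Boolean =
    chain(loader).contains(ancestor)
}
```

      In the shell, LoaderChain.chain(this.getClass.getClassLoader) would list the REPL loader's parents, and LoaderChain.canDelegateTo(DataType.getClass.getClassLoader, this.getClass.getClassLoader) would be expected to be false if the REPL loader is a child of the application loader.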

    People

      Assignee: Unassigned
      Reporter: Benyi Wang (bewang.tech)