Zeppelin / ZEPPELIN-150

Registered UDFs do not work on Spark jobs initiated from Zeppelin


    Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 0.5.0
    • Fix Version/s: None
    • Component/s: Interpreters
    • Labels: None
    • Environment:
      • Zeppelin 0.5.0
      • Spark 1.3.1 on top of a YARN cluster
      • Hadoop 2.4

      Description

      When we try to use a UDF from Zeppelin, the job fails with java.lang.NoClassDefFoundError caused by java.lang.ClassNotFoundException: org.apache.zeppelin.spark.ZeppelinContext
      (see the full exception below).

      Steps to reproduce:

      1. Create and register the UDF:

      // hc is the HiveContext used in the notebook
      def getNum(): Int = {
          100
      }
      hc.udf.register("getNum", getNum _)
      

      2. Query an existing table:

      %sql select getNum() from filteredNc limit 1
      

      Failed.

      3. Run the same query directly on the HiveContext:

      hc.sql("select getNum() from filteredNc limit 1").collect
      

      Failed.

      • filteredNc is a local table that was loaded from Hive (see below).

      A few insights / comments:

      1. In the Spark shell it works as expected.
      2. The bug happens only with RDDs/tables that originate from an external source (Hive, or Parquet files on S3). Creating a new DataFrame and registering it works as expected, but creating a DataFrame from a DataFrame that was loaded from Hive fails.
      3. It does not happen when running locally.

      The (almost) full exception:

       WARN [2015-06-28 08:43:53,850] ({task-result-getter-0} Logging.scala[logWarning]:71) - Lost task 0.2 in stage 23.0 (TID 1626, ip-10-216-204-246.ec2.internal): java.lang.NoClassDefFoundError: Lorg/apache/zeppelin/spark/ZeppelinContext;
          at java.lang.Class.getDeclaredFields0(Native Method)
          at java.lang.Class.privateGetDeclaredFields(Class.java:2499)
          at java.lang.Class.getDeclaredField(Class.java:1951)
          at java.io.ObjectStreamClass.getDeclaredSUID(ObjectStreamClass.java:1659)
      
      <many more ObjectStreamClass frames omitted>
      
      Caused by: java.lang.ClassNotFoundException: org.apache.zeppelin.spark.ZeppelinContext
          at org.apache.spark.repl.ExecutorClassLoader.findClass(ExecutorClassLoader.scala:69)
          at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
          at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
          ... 103 more
      Caused by: java.lang.ClassNotFoundException: org.apache.zeppelin.spark.ZeppelinContext
          at java.lang.ClassLoader.findClass(ClassLoader.java:531)
          at org.apache.spark.util.ParentClassLoader.findClass(ParentClassLoader.scala:26)
          at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
          at org.apache.spark.util.ParentClassLoader.loadClass(ParentClassLoader.scala:34)
          at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
          at org.apache.spark.util.ParentClassLoader.loadClass(ParentClassLoader.scala:30)
          at org.apache.spark.repl.ExecutorClassLoader.findClass(ExecutorClassLoader.scala:64)
          ... 105 more
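The top frames of the trace show Java serialization (ObjectStreamClass.getDeclaredSUID) reflecting over a class's declared fields on the executor and failing because the type of one of those fields, ZeppelinContext, cannot be loaded there. The following is a minimal standalone Scala sketch of that failure mode only; it is not Zeppelin or Spark code, and it does not identify which class in the Zeppelin/Spark REPL actually declares the field. The names FieldTypeDemo, Hidden, Holder and HidingLoader are hypothetical: Hidden stands in for ZeppelinContext, Holder for a serialized class that declares a field of that type, and HidingLoader for an executor classpath without the Zeppelin jar.

```scala
import java.io.{ByteArrayOutputStream, ObjectStreamClass}

object FieldTypeDemo {
  // Hypothetical stand-in for ZeppelinContext: a class the "executor" cannot load.
  class Hidden

  // Hypothetical stand-in for a serialized class that declares a field of that type.
  class Holder extends java.io.Serializable {
    val h: Hidden = null
  }

  // Hypothetical loader standing in for an executor classpath without the Zeppelin
  // jar: it defines its own copy of Holder but refuses to load Hidden.
  class HidingLoader(parent: ClassLoader, hiddenName: String, holderName: String)
      extends ClassLoader(parent) {
    override protected def loadClass(name: String, resolve: Boolean): Class[_] =
      if (name == hiddenName) {
        throw new ClassNotFoundException(name)
      } else if (name == holderName) {
        val loaded = findLoadedClass(name)
        if (loaded != null) loaded
        else {
          val bytes = readClassBytes(parent, name)
          defineClass(name, bytes, 0, bytes.length)
        }
      } else {
        super.loadClass(name, resolve)
      }
  }

  // Reads the compiled .class file for `name` from the parent loader's classpath.
  private def readClassBytes(loader: ClassLoader, name: String): Array[Byte] = {
    val in = loader.getResourceAsStream(name.replace('.', '/') + ".class")
    require(in != null, s"class file for $name not found")
    try {
      val out = new ByteArrayOutputStream()
      val buf = new Array[Byte](4096)
      var n = in.read(buf)
      while (n != -1) {
        out.write(buf, 0, n)
        n = in.read(buf)
      }
      out.toByteArray
    } finally in.close()
  }

  // Looks up serialization metadata for the isolated Holder. ObjectStreamClass
  // computes the serialVersionUID by reflecting over the declared fields, which
  // must resolve the type of `h`; returns the error message if that fails.
  def reproduce(): Option[String] = {
    val parent = classOf[Holder].getClassLoader
    val loader = new HidingLoader(parent, classOf[Hidden].getName, classOf[Holder].getName)
    val isolatedHolder = loader.loadClass(classOf[Holder].getName)
    try {
      ObjectStreamClass.lookup(isolatedHolder)
      None
    } catch {
      case e: NoClassDefFoundError => Some(e.getMessage)
    }
  }
}
```

Calling FieldTypeDemo.reproduce() should return a Some whose message names the hidden class; the exact wording (for example the descriptor form seen in the trace) may vary between JVM versions. A custom class loader is used only because it is the simplest way to make one class visible to the JVM while hiding another.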
      


    People

    • Assignee: Unassigned
    • Reporter: Ophir Cohen (ophchu)
    • Votes: 5
    • Watchers: 10
