SPARK-2569: Customized UDFs in Hive not running with Spark SQL

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 1.0.0
    • Fix Version/s: 1.1.0
    • Component/s: SQL
    • Labels: None
    • Environment: Linux or Mac; Hive 0.9.0 and Hive 0.13.0 with Hadoop 1.0.4; Scala 2.10.3; Spark 1.0.0

    Description

      Start spark-shell and initialize: create a HiveContext, import its members, etc., and make sure the jar containing the UDFs is on the classpath.
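
      The udf.Timestamp implementation itself is not included in this report. For reference, a minimal hypothetical sketch of such a Hive simple UDF (the class name is from the report; the body is assumed):

      package udf

      import java.text.SimpleDateFormat
      import org.apache.hadoop.hive.ql.exec.UDF

      // A Hive "simple" UDF: extend UDF and expose an evaluate() method;
      // Hive resolves the method by reflection when the function is called.
      class Timestamp extends UDF {
        // Parse a "yyyy-MM-dd HH:mm:ss" string into epoch milliseconds.
        def evaluate(s: String): java.lang.Long =
          if (s == null) null
          else java.lang.Long.valueOf(
            new SimpleDateFormat("yyyy-MM-dd HH:mm:ss").parse(s).getTime)
      }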

      hql("CREATE TEMPORARY FUNCTION t_ts AS 'udf.Timestamp'") succeeds.

      Then hql("select t_ts(time) from data_common where xxxx limit 1").collect().foreach(println) fails with a NullPointerException.
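
      Put together, a sketch of the failing spark-shell session (Spark 1.0.0-era hql API; the table and column names are from the report, and the where clause is elided here):

      import org.apache.spark.sql.hive.HiveContext

      val hiveContext = new HiveContext(sc)
      import hiveContext._

      // Succeeds: registers the UDF class from a jar on the classpath.
      hql("CREATE TEMPORARY FUNCTION t_ts AS 'udf.Timestamp'")

      // Fails with the NullPointerException shown in the stack trace below,
      // thrown while instantiating the UDF during evaluation.
      hql("select t_ts(time) from data_common limit 1").collect().foreach(println)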

      This was discussed on the mailing list:
      http://apache-spark-user-list.1001560.n3.nabble.com/run-sparksql-hiveudf-error-throw-NPE-td8888.html#a9006

      java.lang.NullPointerException
          at org.apache.spark.sql.hive.HiveFunctionFactory$class.getFunctionClass(hiveUdfs.scala:117)
          at org.apache.spark.sql.hive.HiveUdf.getFunctionClass(hiveUdfs.scala:157)
          at org.apache.spark.sql.hive.HiveFunctionFactory$class.createFunction(hiveUdfs.scala:119)
          at org.apache.spark.sql.hive.HiveUdf.createFunction(hiveUdfs.scala:157)
          at org.apache.spark.sql.hive.HiveUdf.function$lzycompute(hiveUdfs.scala:170)
          at org.apache.spark.sql.hive.HiveUdf.function(hiveUdfs.scala:170)
          at org.apache.spark.sql.hive.HiveSimpleUdf.method$lzycompute(hiveUdfs.scala:181)
          at org.apache.spark.sql.hive.HiveSimpleUdf.method(hiveUdfs.scala:180)
          at org.apache.spark.sql.hive.HiveSimpleUdf.wrappers$lzycompute(hiveUdfs.scala:186)
          at org.apache.spark.sql.hive.HiveSimpleUdf.wrappers(hiveUdfs.scala:186)
          at org.apache.spark.sql.hive.HiveSimpleUdf.eval(hiveUdfs.scala:220)
          at org.apache.spark.sql.catalyst.expressions.MutableProjection.apply(Projection.scala:64)
          at org.apache.spark.sql.execution.Aggregate$$anonfun$execute$1$$anonfun$7.apply(Aggregate.scala:160)
          at org.apache.spark.sql.execution.Aggregate$$anonfun$execute$1$$anonfun$7.apply(Aggregate.scala:153)
          at org.apache.spark.rdd.RDD$$anonfun$12.apply(RDD.scala:580)
          at org.apache.spark.rdd.RDD$$anonfun$12.apply(RDD.scala:580)
          at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
          at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:261)
          at org.apache.spark.rdd.RDD.iterator(RDD.scala:228)
          at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)


    People

      Assignee: Michael Armbrust (marmbrus)
      Reporter: jacky hung (jackyhung)
