SPARK-11191: [1.5] Can't create UDFs using the Hive Thrift service

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: 1.5.0, 1.5.1
    • Fix Version/s: 1.5.3, 1.6.0
    • Component/s: SQL
    • Labels: None

    Description

      Since upgrading to Spark 1.5, we've been unable to create and use UDFs when we run in Thrift server mode.

      Our setup:
      We start the Thrift server running against YARN in client mode. We've also built our own Spark from the GitHub branch-1.5 with the following args: -Pyarn -Phive -Phive-thriftserver

      If I run the following after connecting via JDBC (in this case via beeline):

      add jar 'hdfs://path/to/jar';
      (this command succeeds with no errors)

      CREATE TEMPORARY FUNCTION testUDF AS 'com.foo.class.UDF';
      (this command succeeds with no errors)

      select testUDF(col1) from table1;

      I get the following error in the logs:

      org.apache.spark.sql.AnalysisException: undefined function testUDF; line 1 pos 8
              at org.apache.spark.sql.hive.HiveFunctionRegistry$$anonfun$lookupFunction$2$$anonfun$1.apply(hiveUDFs.scala:58)
              at org.apache.spark.sql.hive.HiveFunctionRegistry$$anonfun$lookupFunction$2$$anonfun$1.apply(hiveUDFs.scala:58)
              at scala.Option.getOrElse(Option.scala:120)
              at org.apache.spark.sql.hive.HiveFunctionRegistry$$anonfun$lookupFunction$2.apply(hiveUDFs.scala:57)
              at org.apache.spark.sql.hive.HiveFunctionRegistry$$anonfun$lookupFunction$2.apply(hiveUDFs.scala:53)
              at scala.util.Try.getOrElse(Try.scala:77)
              at org.apache.spark.sql.hive.HiveFunctionRegistry.lookupFunction(hiveUDFs.scala:53)
              at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveFunctions$$anonfun$apply$10$$anonfun$applyOrElse$5$$anonfun$applyOrElse$24.apply(Analyzer.scala:506)
              at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveFunctions$$anonfun$apply$10$$anonfun$applyOrElse$5$$anonfun$applyOrElse$24.apply(Analyzer.scala:506)
              at org.apache.spark.sql.catalyst.analysis.package$.withPosition(package.scala:48)
              at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveFunctions$$anonfun$apply$10$$anonfun$applyOrElse$5.applyOrElse(Analyzer.scala:505)
              at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveFunctions$$anonfun$apply$10$$anonfun$applyOrElse$5.applyOrElse(Analyzer.scala:502)
              at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$3.apply(TreeNode.scala:227)
              at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$3.apply(TreeNode.scala:227)
              at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:51)
              at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:226)
              at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:232)
              at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:232)
              at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:249)
      

      (cutting the bulk for ease of report, more than happy to send the full output)

      15/10/12 14:34:37 ERROR SparkExecuteStatementOperation: Error running hive query:
      org.apache.hive.service.cli.HiveSQLException: org.apache.spark.sql.AnalysisException: undefined function testUDF; line 1 pos 100
              at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.runInternal(SparkExecuteStatementOperation.scala:259)
              at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$1$$anon$2.run(SparkExecuteStatementOperation.scala:171)
              at java.security.AccessController.doPrivileged(Native Method)
              at javax.security.auth.Subject.doAs(Subject.java:422)
              at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
              at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$1.run(SparkExecuteStatementOperation.scala:182)
              at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
              at java.util.concurrent.FutureTask.run(FutureTask.java:266)
              at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
              at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
              at java.lang.Thread.run(Thread.java:745)
      

      When I ran the same against 1.4 it worked.

      I've also changed spark.sql.hive.metastore.version to 0.13 (similar to what it was in 1.4) and to 0.14, but I still get the same errors.

      Also, in 1.5, when you run it against the spark-sql shell, it works.
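
      For anyone who wants to repeat the steps above without beeline, here is a minimal sketch of the same sequence over plain JDBC. The class name UdfRepro and its helper methods are hypothetical, the JDBC URL has to be supplied by the caller (for example jdbc:hive2://<host>:10000, the Thrift server's default port), and it assumes a Hive JDBC driver is on the classpath. The jar path, function name, UDF class, column and table are the placeholders from this report. The jar path is passed unquoted here, which is the form Hive documents for ADD JAR.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of Spark or Hive.
class UdfRepro {

    // Builds the three statements from the report, in the order they were run.
    static List<String> statements(String jarUri, String function, String udfClass,
                                   String column, String table) {
        return Arrays.asList(
            "ADD JAR " + jarUri,
            "CREATE TEMPORARY FUNCTION " + function + " AS '" + udfClass + "'",
            "SELECT " + function + "(" + column + ") FROM " + table);
    }

    // Runs every statement on a single connection, so the jar, the temporary
    // function and the query all go through the same JDBC session.
    static void run(String jdbcUrl, List<String> sqls) throws SQLException {
        try (Connection conn = DriverManager.getConnection(jdbcUrl);
             Statement stmt = conn.createStatement()) {
            for (String sql : sqls) {
                System.out.println("Running: " + sql);
                stmt.execute(sql);
            }
        }
    }

    public static void main(String[] args) throws SQLException {
        // Placeholder values copied from the report; replace them with real ones.
        List<String> sqls = statements(
            "hdfs://path/to/jar", "testUDF", "com.foo.class.UDF", "col1", "table1");
        if (args.length == 0) {
            // No JDBC URL given: only print what would be executed.
            sqls.forEach(System.out::println);
            return;
        }
        run(args[0], sqls);
    }
}
```

      Run without arguments, the sketch only prints the statements. Run with a JDBC URL as the first argument, it executes them in order; on an affected build the final SELECT would be expected to fail with the AnalysisException shown above.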

            People

              Assignee: Cheng Lian
              Reporter: David Ross
              Votes: 9
              Watchers: 13
