SPARK-34971

A view that uses a UDF, created by Hive 1.x, cannot be read by Spark

Details

    • Type: Bug
    • Status: Open
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: 2.4.0
    • Fix Version/s: None
    • Component/s: SQL
    • Labels: None
    • Environment: hive 1.1.0, spark 2.4

    Description

      First, register a function in Hive:

      create function shezm.hello as 'test.Hello' using jar 'hdfs:///udf_test/udf_test.jar'
      

       

      Then create a view that uses the UDF in Hive 1.1:

      create view shezm.test_view AS select shezm.hello(name) as v from shezm.test;
      

      Reading the view from Spark then fails with the following error:

      Exception in thread "main" org.apache.spark.sql.AnalysisException: Undefined function: 'shezm.hello'. This function is neither a registered temporary function nor a permanent function registered in the database 'default'.; line 1 pos 7
      at org.apache.spark.sql.catalyst.analysis.Analyzer$LookupFunctions$$anonfun$apply$15$$anonfun$applyOrElse$51.apply(Analyzer.scala:1355)
      at org.apache.spark.sql.catalyst.analysis.Analyzer$LookupFunctions$$anonfun$apply$15$$anonfun$applyOrElse$51.apply(Analyzer.scala:1355)
      at org.apache.spark.sql.catalyst.analysis.Analyzer$LookupFunctions$$anonfun$apply$15$$anonfun$applyOrElse$51.apply(Analyzer.scala:1355)
      at org.apache.spark.sql.catalyst.analysis.package$.withPosition(package.scala:53)
      at org.apache.spark.sql.catalyst.analysis.Analyzer$LookupFunctions$$anonfun$apply$15.applyOrElse(Analyzer.scala:1354)
      at org.apache.spark.sql.catalyst.analysis.Analyzer$LookupFunctions$$anonfun$apply$15.applyOrElse(Analyzer.scala:1346)
      at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$2.apply(TreeNode.scala:256)
      at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$2.apply(TreeNode.scala:256)
      ......
      

       

      While investigating this issue, I found that Hive 1.x wraps the whole UDF name, database prefix included, in a single pair of backticks when it stores the definition of a view that uses a UDF, like this:

      hive> use shezm;
      OK
      Time taken: 0.999 seconds
      hive> show create table test_view;
      OK
      CREATE VIEW `test_view` AS select `shezm.hello`(`test`.`id`) from `shezm`.`test`
      Time taken: 1.761 seconds, Fetched: 1 row(s)
      
      

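      Spark treats `shezm.hello` as one function name and does not split out the database, so it looks for a function literally named 'shezm.hello' in the current database ('default' in the error above). Hive does split the name.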

      According to the SqlBase.g4 file, everything wrapped in backticks is treated as a single identifier, which appears to be intentional.
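
To make the difference concrete, here is a toy model of the behavior described above. It is not Spark's parser or grammar: `parse_qualified_name` is a hypothetical helper that only mimics the rule that a dot inside backticks does not separate name parts.

```python
def parse_qualified_name(text):
    """Split a dotted name into parts; a backtick-quoted part is kept whole.

    Toy model only (an assumption for illustration, not Spark code).
    A doubled backtick inside quotes stands for one literal backtick.
    """
    parts, buf, i, quoted = [], [], 0, False
    while i < len(text):
        ch = text[i]
        if quoted:
            if ch == "`":
                if i + 1 < len(text) and text[i + 1] == "`":
                    buf.append("`")  # escaped backtick inside quotes
                    i += 2
                    continue
                quoted = False  # closing backtick
            else:
                buf.append(ch)  # dots inside quotes are ordinary characters
        elif ch == "`":
            quoted = True  # opening backtick
        elif ch == ".":
            parts.append("".join(buf))  # unquoted dot separates parts
            buf = []
        else:
            buf.append(ch)
        i += 1
    if quoted:
        raise ValueError("unterminated backtick-quoted identifier")
    parts.append("".join(buf))
    return parts
```

With this model, the unquoted name shezm.hello yields two parts (database and function), while the form that Hive 1.x writes into the view text yields a single part that still contains the dot.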

       

      So perhaps this problem should be solved in AstBuilder#visitFunctionName(), by adding a case that handles a backtick-quoted name containing a database prefix?
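
A minimal sketch of what such an extra case could look like, written in Python for brevity. It is not the actual AstBuilder code: `split_function_name` is a hypothetical helper, and the rule "a single part with exactly one dot means db.func" is an assumption made for illustration.

```python
def split_function_name(parts):
    """Return (database, function) for the name parts of a function call.

    Hypothetical fallback: a single part containing exactly one dot is
    read as db.func, matching what Hive 1.x stores in the view text.
    """
    if len(parts) == 2:
        # Normal qualified name: db.func
        return parts[0], parts[1]
    if len(parts) == 1:
        name = parts[0]
        if name.count(".") == 1:
            db, func = name.split(".")
            if db and func:
                # Hive 1.x style: the whole `db.func` was quoted as one part
                return db, func
        # Plain unqualified name (or a dot we cannot interpret): no database
        return None, name
    raise ValueError(f"unsupported function name: {parts!r}")
```

One trade-off of this sketch: a function whose name really contains a dot would be misread as db.func, so it may be safer to try the split only after the lookup of the literal name has failed.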

       

            People

              Assignee: Unassigned
              Reporter: Zing zzzzming95