[SPARK-34971] The view with udf created by hive1.x cannot be read by spark - ASF JIRA

Details

Type: Bug
Status: Open
Priority: Minor
Resolution: Unresolved
Affects Version/s: 2.4.0
Fix Version/s: None
Component/s: SQL
Labels: None
Environment: Hive 1.1.0, Spark 2.4

Description

First, register a function in Hive:

create function shezm.hello as 'test.Hello' using jar 'hdfs:///udf_test/udf_test.jar'

Then create a view that uses the UDF in Hive 1.1, for example:

create view shezm.test_view AS select shezm.hello(name) as v from shezm.test;

Reading this view with Spark then fails with the following error:

Exception in thread "main" org.apache.spark.sql.AnalysisException: Undefined function: 'shezm.hello'. This function is neither a registered temporary function nor a permanent function registered in the database 'default'.; line 1 pos 7
    at org.apache.spark.sql.catalyst.analysis.Analyzer$LookupFunctions$$anonfun$apply$15$$anonfun$applyOrElse$51.apply(Analyzer.scala:1355)
    at org.apache.spark.sql.catalyst.analysis.Analyzer$LookupFunctions$$anonfun$apply$15$$anonfun$applyOrElse$51.apply(Analyzer.scala:1355)
    at org.apache.spark.sql.catalyst.analysis.Analyzer$LookupFunctions$$anonfun$apply$15$$anonfun$applyOrElse$51.apply(Analyzer.scala:1355)
    at org.apache.spark.sql.catalyst.analysis.package$.withPosition(package.scala:53)
    at org.apache.spark.sql.catalyst.analysis.Analyzer$LookupFunctions$$anonfun$apply$15.applyOrElse(Analyzer.scala:1354)
    at org.apache.spark.sql.catalyst.analysis.Analyzer$LookupFunctions$$anonfun$apply$15.applyOrElse(Analyzer.scala:1346)
    at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$2.apply(TreeNode.scala:256)
    at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$2.apply(TreeNode.scala:256)
......

When I investigated this issue, I found that Hive 1.x wraps the entire UDF name, including the database prefix, in a single pair of backticks when it creates a view that uses a UDF, like this:

hive> use shezm;
OK
Time taken: 0.999 seconds
hive> show create table test_view;
OK
CREATE VIEW `test_view` AS select `shezm.hello`(`test`.`id`) from `shezm`.`test`
Time taken: 1.761 seconds, Fetched: 1 row(s)

Spark treats `shezm.hello` as a single function name and cannot extract the database from it (Hive can), so it looks the function up in the current database ('default' in the error above).

I read the SqlBase.g4 file: any text wrapped in backticks is treated as one complete identifier, which seems to be intentional.
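
To make the parsing difference concrete, here is a minimal Python sketch of how a dotted name with optional backtick quoting splits into name parts. It is a simplified model for illustration only, not Spark's ANTLR grammar; the regular expression and the function name split_qualified_name are assumptions introduced here.

```python
import re

# Simplified model: a qualified name is a dot-separated list of parts.
# A part is either back-quoted (it may then contain dots, and `` stands
# for a literal backtick) or a bare word of letters, digits and underscores.
_PART = re.compile(r"`((?:[^`]|``)*)`|([A-Za-z_][A-Za-z0-9_]*)")


def split_qualified_name(text):
    """Return the list of name parts found in `text` (hypothetical helper)."""
    parts = []
    pos = 0
    while True:
        m = _PART.match(text, pos)
        if m is None:
            raise ValueError(f"cannot parse identifier at offset {pos}: {text!r}")
        quoted, bare = m.group(1), m.group(2)
        # A quoted part is kept whole; only the `` escape is undone.
        parts.append(quoted.replace("``", "`") if quoted is not None else bare)
        pos = m.end()
        if pos == len(text):
            return parts
        if text[pos] != ".":
            raise ValueError(f"expected '.' at offset {pos}: {text!r}")
        pos += 1  # skip the dot between two parts
```

Under this model, the form Hive 1.x stores in the view yields one part, while the unquoted or separately quoted forms yield a database part and a function part.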

So maybe this problem should be solved in AstBuilder#visitFunctionName(), by adding a case that handles a back-quoted name containing a database prefix?
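
The kind of extra case suggested above could look roughly like the following Python sketch. It is not Spark code and does not reflect how AstBuilder is actually written; FunctionRef and resolve_function_name are hypothetical names, and the rule "split a single quoted part on its only dot" is an assumption about what such a fix might do.

```python
from typing import NamedTuple, Optional, Sequence


class FunctionRef(NamedTuple):
    """Hypothetical stand-in for a (database, function) pair."""
    func_name: str
    database: Optional[str] = None


def resolve_function_name(parts: Sequence[str]) -> FunctionRef:
    """Map already-parsed name parts to a function reference."""
    if len(parts) == 2:
        # Normal case: db.func or `db`.`func`
        return FunctionRef(func_name=parts[1], database=parts[0])
    if len(parts) == 1:
        name = parts[0]
        db, sep, func = name.partition(".")
        # Assumed extra case: Hive 1.x wrote `db.func` as one quoted part,
        # so a single part with exactly one dot is split into db and func.
        if sep and db and func and "." not in func:
            return FunctionRef(func_name=func, database=db)
        return FunctionRef(func_name=name)
    raise ValueError(f"unsupported function name: {list(parts)!r}")
```

One trade-off of this sketch: a function whose real name contains a dot would be resolved against the wrong database, so an actual fix might prefer to try the name as written first and split only if that lookup fails.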

Issue Links

is duplicated by

SPARK-25301 When a view uses an UDF from a non default database, Spark analyser throws AnalysisException

Resolved

relates to

SPARK-20918 Use FunctionIdentifier as function identifiers in FunctionRegistry

Resolved

People

Assignee: Unassigned

Reporter: zzzzming95

Votes: 0

Watchers: 1

Dates

Created: 06/Apr/21 15:10

Updated: 03/Aug/21 07:14