Details
- Type: Bug
- Status: Resolved
- Priority: Major
- Resolution: Duplicate
- Affects Version/s: 2.2.0
- Fix Version/s: None
- Component/s: None
Description
When calling a Hive UDF from the Spark shell, the query succeeds on the first invocation, but subsequent invocations fail with the following error:
#spark2-shell
scala> spark.sql("select test(name) from customers limit 2").show(50, false)
org.apache.spark.sql.AnalysisException: No handler for Hive UDF 'com.vnb.fgp.generic.udf.encrypt.EncryptGenericUDF':
The UDF jar was not provided on the command line, yet the first invocation still produces output. The function test was created in the Hive service as a permanent function using the jar file.
Debugging further, we see that the first invocation of the select uses the following class loader, whose URL list includes the HDFS path configured in the Hive service:
loader: org.apache.spark.sql.internal.NonClosableMutableURLClassLoader@42cef0af
hdfs:/tmp/bimal/hive-extensions-1.0-SNAPSHOT-jar-with-dependencies.jar
file:/usr/java/jdk1.8.0_162/jre/lib/resources.jar
file:/usr/java/jdk1.8.0_162/jre/lib/rt.jar
On subsequent calls, a different class loader is being used:
loader scala.tools.nsc.interpreter.IMain$TranslatingClassLoader@7bc3ec95
file:/usr/java/jdk1.8.0_162/jre/lib/resources.jar
file:/usr/java/jdk1.8.0_162/jre/lib/rt.jar
file:/usr/java/jdk1.8.0_162/jre/lib/jsse.jar
file:/usr/java/jdk1.8.0_162/jre/lib/jce.jar
This loader does not include the HDFS path of the jar file, hence the exception. Most probably the first class loader picks up the jar location from the Hive metastore when the permanent function is resolved.
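The failure mode can be reproduced outside Spark with two plain JVM class loaders. The sketch below is illustrative only (the class and loader names are hypothetical stand-ins for Spark's NonClosableMutableURLClassLoader and the REPL's TranslatingClassLoader): a class resolves through a loader whose search path contains it, but the same lookup through a loader missing that path throws, just as the UDF class does on the second query.

```java
import java.net.URL;
import java.net.URLClassLoader;

public class Main {
    public static void main(String[] args) throws Exception {
        // The application class loader sees everything on the classpath,
        // analogous to the first loader that had the UDF jar's HDFS URL.
        ClassLoader appLoader = Main.class.getClassLoader();
        System.out.println(Class.forName("Main", false, appLoader).getName());

        // A loader with no URLs and only the platform loader as parent,
        // analogous to the interpreter loader that lacks the jar's URL.
        URLClassLoader bareLoader =
                new URLClassLoader(new URL[0], ClassLoader.getPlatformClassLoader());
        try {
            Class.forName("Main", false, bareLoader);
        } catch (ClassNotFoundException e) {
            // Same shape of failure as the "No handler for Hive UDF" path:
            // the class exists, but not on this loader's search path.
            System.out.println("ClassNotFoundException: " + e.getMessage());
        }
    }
}
```

The point of the sketch is that class visibility is a property of the loader doing the lookup, not of the JVM as a whole, which is why the same query can succeed and then fail depending on which loader resolves the UDF class.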
If the UDF jar files are passed on the command line using the --jars option, everything works fine. This indicates that a different class loader (and therefore a different classpath) is used on the first and subsequent calls, which causes the inconsistent behavior.
Issue Links
- duplicates SPARK-26560: Repeating select on udf function throws analysis exception - function not registered (Resolved)