When calling a Hive UDF from the Spark shell, the query returns output on the first invocation, but every subsequent invocation fails with the following error:
scala> spark.sql("select test(name) from customers limit 2").show (50, false)
org.apache.spark.sql.AnalysisException: No handler for Hive UDF 'com.vnb.fgp.generic.udf.encrypt.EncryptGenericUDF':
We did not provide the UDF jar on the command line, yet the first call still returns output. The function `test` was created in the Hive service as a permanent function backed by that jar.
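For reference, a permanent Hive function of this kind is typically registered with a `CREATE FUNCTION ... USING JAR` statement. A minimal sketch follows; the class name is taken from the error message above, while the HDFS path is a placeholder, not the one from our environment (the statement can equally be issued from beeline):

```scala
// Hypothetical registration of the permanent function; the HDFS path
// is a placeholder, not taken from the original report.
spark.sql("""
  CREATE FUNCTION test
  AS 'com.vnb.fgp.generic.udf.encrypt.EncryptGenericUDF'
  USING JAR 'hdfs:///path/to/udf.jar'
""")
```

The metastore records the jar URI alongside the function definition, which is why a session with no `--jars` can still resolve the class at least once.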
Debugging further, we see that on the first invocation of the select the following class loader is used, and its classpath includes the HDFS directory configured in the Hive service:
On subsequent calls, a different class loader is being used:
This class loader's classpath does not include the HDFS path to the jar, hence the exception.
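To make the mechanism concrete, the snippet below (plain Scala, no Spark required; the class name is taken from the error message) shows that a class loader whose URL list omits the jar's location cannot resolve the UDF class, which is exactly the failure mode of the second invocation:

```scala
import java.net.{URL, URLClassLoader}

// A class loader with an empty URL list and no application parent --
// analogous to the second class loader above, whose classpath lacks
// the HDFS location of the UDF jar.
val bareLoader = new URLClassLoader(Array.empty[URL], null)

val result =
  try {
    bareLoader.loadClass("com.vnb.fgp.generic.udf.encrypt.EncryptGenericUDF")
    "loaded"
  } catch {
    case _: ClassNotFoundException => "ClassNotFoundException"
  }

println(result) // prints "ClassNotFoundException"
```

The same class name resolves fine through a loader whose URLs include the jar, which is why the behavior flips depending on which loader Spark hands the lookup to.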
Most likely the first class loader picks up the jar location from the Hive metastore, where the permanent function's resource URIs are recorded.
If we pass the UDF jar on the command line with the `--jars` option (e.g. `spark-shell --jars <path-to-udf-jar>`), everything works fine.
This indicates, however, that the class loader and classpath differ between the first and subsequent invocations, which is what causes the inconsistent behavior.