Details
- Type: Bug
- Status: Closed
- Priority: Major
- Resolution: Duplicate
- Affects Version/s: 2.0.2
- Fix Version/s: None
- Component/s: None
Description
When I create a UDF whose jar file is on HDFS, I can't use the UDF:
{code}
spark-sql> create function trans_array as 'com.test.udf.TransArray' using jar 'hdfs://host1:9000/spark/dev/share/libs/spark-proxy-server-biz-service-impl-1.0.0.jar';
spark-sql> describe function trans_array;
Function: test_db.trans_array
Class: com.alipay.spark.proxy.server.biz.service.impl.udf.TransArray
Usage: N/A.
Time taken: 0.127 seconds, Fetched 3 row(s)
spark-sql> select trans_array(1, '\\|', id, position) as (id0, position0) from test_spark limit 10;
Error in query: Undefined function: 'trans_array'. This function is neither a registered temporary function nor a permanent function registered in the database 'test_db'.; line 1 pos 7
{code}
The root cause is that when org.apache.spark.sql.internal.SessionState's FunctionResourceLoader runs loadResource, the call to uri.toURL throws an exception with "failed unknown protocol: hdfs":
{code:scala}
def addJar(path: String): Unit = {
  sparkSession.sparkContext.addJar(path)
  val uri = new Path(path).toUri
  val jarURL = if (uri.getScheme == null) {
    // `path` is a local file path without a URL scheme
    new File(path).toURI.toURL
  } else {
    // `path` is a URL with a scheme
    uri.toURL  // <-- throws "unknown protocol: hdfs" here
  }
  jarClassLoader.addURL(jarURL)
  Thread.currentThread().setContextClassLoader(jarClassLoader)
}
{code}
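The failure mode can be reproduced with the plain JDK, independent of Spark: URI-to-URL conversion fails for any scheme the JVM has no registered stream handler for. A minimal sketch (the paths and host below are made up for illustration):

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URISyntaxException;
import java.net.URL;

public class UriToUrlDemo {
    // Attempt a URI -> URL conversion and report the outcome as a string.
    public static String tryToUrl(String s) {
        try {
            URL url = new URI(s).toURL();
            return "ok:" + url;
        } catch (MalformedURLException | URISyntaxException e) {
            return "error:" + e.getMessage();
        }
    }

    public static void main(String[] args) {
        // file:// converts fine: the JDK ships a stream handler for it.
        System.out.println(tryToUrl("file:///tmp/example.jar"));
        // hdfs:// fails: no URLStreamHandler is registered for "hdfs".
        System.out.println(tryToUrl("hdfs://host1:9000/spark/libs/example.jar"));
    }
}
```

The second call fails with a MalformedURLException whose message names the unsupported protocol, matching the behavior seen in addJar.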
I think we should call URL.setURLStreamHandlerFactory with an instance of FsUrlStreamHandlerFactory, like:
{code:java}
static {
  URL.setURLStreamHandlerFactory(new FsUrlStreamHandlerFactory());
}
{code}
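For context, a sketch of why this works: URL.setURLStreamHandlerFactory installs a JVM-wide factory that the URL constructor consults for schemes it doesn't know. In the real fix the factory would be Hadoop's FsUrlStreamHandlerFactory; since that requires Hadoop on the classpath, the toy DummyHandler below is a stand-in used only to demonstrate the registration effect:

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.net.URL;
import java.net.URLConnection;
import java.net.URLStreamHandler;

public class HandlerFactoryDemo {
    // Toy handler standing in for Hadoop's real hdfs stream handler.
    static class DummyHandler extends URLStreamHandler {
        @Override
        protected URLConnection openConnection(URL u) {
            return new URLConnection(u) {
                @Override public void connect() {}
                @Override public InputStream getInputStream() {
                    return new ByteArrayInputStream("stub".getBytes());
                }
            };
        }
    }

    static {
        // JVM-wide registration; may be called at most once per JVM.
        URL.setURLStreamHandlerFactory(
            protocol -> "hdfs".equals(protocol) ? new DummyHandler() : null);
    }

    // Attempt a URI -> URL conversion and report the outcome as a string.
    public static String tryToUrl(String s) {
        try {
            return "ok:" + new java.net.URI(s).toURL();
        } catch (Exception e) {
            return "error:" + e.getMessage();
        }
    }

    public static void main(String[] args) {
        // With the factory installed, hdfs:// URIs now convert without error.
        System.out.println(tryToUrl("hdfs://host1:9000/spark/libs/example.jar"));
    }
}
```

Note that the single-shot nature of setURLStreamHandlerFactory is exactly why frameworks like Spark have to be careful about where such a static registration lives.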
Attachments
Issue Links
- duplicates SPARK-12868: ADD JAR via sparkSQL JDBC will fail when using a HDFS URL (Resolved)