Spark / SPARK-18910

Can't use a UDF whose jar file is in HDFS


Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Duplicate
    • Affects Version/s: 2.0.2
    • Fix Version/s: None
    • Component/s: SQL
    • Labels: None

    Description

      When I create a UDF whose jar file is stored in HDFS, I can't use the UDF.

      spark-sql> create function trans_array as 'com.test.udf.TransArray'  using jar 'hdfs://host1:9000/spark/dev/share/libs/spark-proxy-server-biz-service-impl-1.0.0.jar';
      
      spark-sql> describe function trans_array;
      Function: test_db.trans_array
      Class: com.alipay.spark.proxy.server.biz.service.impl.udf.TransArray
      Usage: N/A.
      Time taken: 0.127 seconds, Fetched 3 row(s)
      
      spark-sql> select trans_array(1, '\\|', id, position) as (id0, position0) from test_spark limit 10;
      Error in query: Undefined function: 'trans_array'. This function is neither a registered temporary function nor a permanent function registered in the database 'test_db'.; line 1 pos 7
      

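      The cause: org.apache.spark.sql.internal.SessionState.FunctionResourceLoader.loadResource passes the jar path to the addJar method below, where uri.toURL throws an exception ("failed unknown protocol: hdfs") because the JVM has no URL stream handler registered for the hdfs scheme.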

        import java.io.File
        import org.apache.hadoop.fs.Path

        // Excerpt from org.apache.spark.sql.internal.SessionState
        def addJar(path: String): Unit = {
          sparkSession.sparkContext.addJar(path)

          val uri = new Path(path).toUri
          val jarURL = if (uri.getScheme == null) {
            // `path` is a local file path without a URL scheme
            new File(path).toURI.toURL
          } else {
            // `path` is a URL with a scheme; for hdfs:// this throws
            // java.net.MalformedURLException: unknown protocol: hdfs
            uri.toURL
          }
          jarClassLoader.addURL(jarURL)
          Thread.currentThread().setContextClassLoader(jarClassLoader)
        }
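      The failing branch of addJar can be reproduced without Spark or Hadoop. The Java sketch below is a minimal, JDK-only check; the class name HdfsUrlRepro and the path file:/tmp/udf.jar are hypothetical. It assumes that new Path(path).toUri yields an ordinary java.net.URI, so calling URI.toURL() directly performs the same lookup for an hdfs stream handler and fails the same way.

```java
import java.net.MalformedURLException;
import java.net.URI;

// JDK-only reproduction of the addJar failure: no URLStreamHandler is
// registered for the "hdfs" scheme, so URI.toURL() cannot build a URL.
public class HdfsUrlRepro {

    /** Returns the MalformedURLException message, or null if toURL() succeeds. */
    static String toUrlError(String path) {
        URI uri = URI.create(path); // stands in for new Path(path).toUri in addJar
        try {
            uri.toURL();
            return null;
        } catch (MalformedURLException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        String hdfsJar = "hdfs://host1:9000/spark/dev/share/libs/spark-proxy-server-biz-service-impl-1.0.0.jar";
        // Fails: the JDK has no built-in handler for hdfs
        System.out.println("hdfs: " + toUrlError(hdfsJar));
        // Succeeds: file is a built-in protocol, so this prints "file: null"
        System.out.println("file: " + toUrlError("file:/tmp/udf.jar"));
    }
}
```

      The file: case shows why local jars work: the JDK ships a handler for file URLs, whereas hdfs needs one registered from outside the JDK.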
      

      I think we should call URL.setURLStreamHandlerFactory with an instance of Hadoop's FsUrlStreamHandlerFactory, for example:

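      import java.net.URL;
      import org.apache.hadoop.fs.FsUrlStreamHandlerFactory;
      // Registers Hadoop's stream handler so that hdfs:// URLs can be resolved.
      static {
      	URL.setURLStreamHandlerFactory(new FsUrlStreamHandlerFactory());
      }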
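      The proposed static block works because java.net.URL consults a process-wide URLStreamHandlerFactory when resolving the handler for a scheme. The sketch below illustrates that mechanism with the JDK only: StubHdfsFactory is a hypothetical stand-in for Hadoop's FsUrlStreamHandlerFactory that accepts hdfs URLs but cannot read them, and installOnce/toUrl are illustrative helpers, not Spark or Hadoop API. It also guards against the documented restriction that URL.setURLStreamHandlerFactory may be called at most once per JVM, which may matter inside Spark, where other code could already have installed a factory.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URL;
import java.net.URLConnection;
import java.net.URLStreamHandler;
import java.net.URLStreamHandlerFactory;

public class HdfsUrlFactorySketch {

    // Hypothetical stand-in for Hadoop's FsUrlStreamHandlerFactory: it claims
    // only the "hdfs" scheme and cannot actually read from HDFS.
    static final class StubHdfsFactory implements URLStreamHandlerFactory {
        @Override
        public URLStreamHandler createURLStreamHandler(String protocol) {
            if (!"hdfs".equals(protocol)) {
                return null; // null lets the JDK use its built-in handlers
            }
            return new URLStreamHandler() {
                @Override
                protected URLConnection openConnection(URL u) throws IOException {
                    throw new IOException("stub handler cannot open " + u);
                }
            };
        }
    }

    private static boolean installed = false;

    /** Installs the factory; returns false if this JVM already has one. */
    static synchronized boolean installOnce() {
        if (installed) {
            return false;
        }
        try {
            URL.setURLStreamHandlerFactory(new StubHdfsFactory());
            installed = true;
            return true;
        } catch (Error e) {
            // The JDK throws java.lang.Error if a factory was already set,
            // e.g. by another library running in the same JVM.
            return false;
        }
    }

    /** Same conversion as addJar's uri.toURL, with the checked exception wrapped. */
    static URL toUrl(String path) {
        try {
            return URI.create(path).toURL();
        } catch (MalformedURLException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        installOnce();
        URL url = toUrl("hdfs://host1:9000/spark/dev/share/libs/udf.jar");
        System.out.println(url.getProtocol() + "://" + url.getHost() + ":" + url.getPort() + url.getPath());
    }
}
```

      With the real Hadoop class, FsUrlStreamHandlerFactory would take the place of StubHdfsFactory and opening the URL would read through Hadoop's FileSystem API; the once-per-JVM guard is the part of the design worth keeping either way.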
      


          People

            Assignee: Unassigned
            Reporter: shenhong (shenh062326)
            Votes: 0
            Watchers: 3
