Spark / SPARK-25694

URL.setURLStreamHandlerFactory causing incompatible HttpURLConnection issue

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 2.3.0, 2.3.1, 2.3.2, 2.4.4, 3.0.0
    • Fix Version/s: 2.4.6, 3.0.0
    • Component/s: Spark Core, SQL
    • Labels: None

    Description

      The call to URL.setURLStreamHandlerFactory() in SharedState causes URL.openConnection() to return an FsUrlConnection object, which is not an HttpURLConnection. This causes exceptions in third-party HTTP libraries that expect an HttpURLConnection (e.g. scalaj.http).

      The following code, added in Spark 2.3.0, introduced the issue (sql/core/src/main/scala/org/apache/spark/sql/internal/SharedState.scala):

      object SharedState extends Logging {
        ...
        URL.setURLStreamHandlerFactory(new FsUrlStreamHandlerFactory())
        ...
      }
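
      The factory is JVM-wide, which is why a call made inside Spark changes what unrelated code gets back from URL.openConnection(). A minimal sketch of that effect, assuming nothing about Hadoop's internals: StubFactory and StubConnection are hypothetical stand-ins for FsUrlStreamHandlerFactory and FsUrlConnection, and the stub claims only http/https.

```scala
import java.net.{HttpURLConnection, URL, URLConnection, URLStreamHandler, URLStreamHandlerFactory}

// Hypothetical stand-in for org.apache.hadoop.fs.FsUrlConnection.
class StubConnection(url: URL) extends URLConnection(url) {
  override def connect(): Unit = ()
}

// Hypothetical stand-in for FsUrlStreamHandlerFactory; claims only http/https here.
class StubFactory extends URLStreamHandlerFactory {
  override def createURLStreamHandler(protocol: String): URLStreamHandler =
    if (protocol.equalsIgnoreCase("http") || protocol.equalsIgnoreCase("https")) {
      new URLStreamHandler {
        override protected def openConnection(u: URL): URLConnection = new StubConnection(u)
      }
    } else {
      null // let the JVM pick its default handler for everything else
    }
}

object GlobalFactoryDemo {
  // openConnection() only creates the connection object; it does not touch the network.
  def isHttpConnection(spec: String): Boolean =
    new URL(spec).openConnection().isInstanceOf[HttpURLConnection]

  def main(args: Array[String]): Unit = {
    println(isHttpConnection("http://www.example.com")) // JVM default handler
    URL.setURLStreamHandlerFactory(new StubFactory)     // may be called only once per JVM
    println(isHttpConnection("http://www.example.com")) // stub handler from the factory
  }
}
```

      Because the factory can be set only once per JVM, application code has no supported way to undo this after SharedState has been initialized.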
      

      Here is the example exception when using scalaj.http in Spark:

       StackTrace: scala.MatchError: org.apache.hadoop.fs.FsUrlConnection:http://wwww.example.com (of class org.apache.hadoop.fs.FsUrlConnection)
       at scalaj.http.HttpRequest.scalaj$http$HttpRequest$$doConnection(Http.scala:343)
       at scalaj.http.HttpRequest.exec(Http.scala:335)
       at scalaj.http.HttpRequest.asString(Http.scala:455)
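
      The scala.MatchError suggests that the client pattern-matches on the connection type and has no case for anything other than HttpURLConnection. The sketch below reproduces that failure mode in isolation. It is an assumption about the shape of the client code, not a copy of scalaj.http, and StubFsUrlConnection is a hypothetical stand-in for FsUrlConnection.

```scala
import java.net.{HttpURLConnection, URL, URLConnection}

// Hypothetical stand-in for org.apache.hadoop.fs.FsUrlConnection.
class StubFsUrlConnection(url: URL) extends URLConnection(url) {
  override def connect(): Unit = ()
}

object MatchErrorSketch {
  // Assumed shape of an HTTP client that handles only HttpURLConnection.
  def describe(conn: URLConnection): String = conn match {
    case http: HttpURLConnection => s"HTTP ${http.getRequestMethod}"
  }

  def main(args: Array[String]): Unit = {
    val conn = new StubFsUrlConnection(new URL("http://www.example.com"))
    try println(describe(conn))
    catch { case e: MatchError => println(s"MatchError: ${e.getMessage}") }
  }
}
```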
      

       
      One option to fix the issue is to return null from URLStreamHandlerFactory.createURLStreamHandler when the protocol is http or https, so that the JVM falls back to its default handler and stays compatible with scalaj.http. The following is a code example:

      import java.net.{URLStreamHandler, URLStreamHandlerFactory}

      import org.apache.hadoop.fs.FsUrlStreamHandlerFactory

      import org.apache.spark.internal.Logging

      class SparkUrlStreamHandlerFactory extends URLStreamHandlerFactory with Logging {

        private val fsUrlStreamHandlerFactory = new FsUrlStreamHandlerFactory()

        override def createURLStreamHandler(protocol: String): URLStreamHandler = {
          // Hadoop returns null when it has no FileSystem for the protocol.
          val handler = fsUrlStreamHandlerFactory.createURLStreamHandler(protocol)
          val isHttp = protocol != null &&
            (protocol.equalsIgnoreCase("http") || protocol.equalsIgnoreCase("https"))

          if (handler == null || isHttp) {
            // Return null to use the system default URLStreamHandler.
            null
          } else {
            handler
          }
        }
      }
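
      The proposed behavior can be checked without a Hadoop dependency. The sketch below is illustrative only: HttpBypassFactory and the claimAll delegate are hypothetical names, with claimAll standing in for FsUrlStreamHandlerFactory by returning a handler for every protocol.

```scala
import java.net.{URL, URLConnection, URLStreamHandler, URLStreamHandlerFactory}

// Wraps any factory and declines http/https so the JVM uses its built-in handlers.
class HttpBypassFactory(delegate: URLStreamHandlerFactory) extends URLStreamHandlerFactory {
  override def createURLStreamHandler(protocol: String): URLStreamHandler = {
    val isHttp = protocol != null &&
      (protocol.equalsIgnoreCase("http") || protocol.equalsIgnoreCase("https"))
    if (isHttp) null else delegate.createURLStreamHandler(protocol)
  }
}

object HttpBypassDemo {
  // Hypothetical delegate that claims every protocol.
  val claimAll: URLStreamHandlerFactory = new URLStreamHandlerFactory {
    override def createURLStreamHandler(protocol: String): URLStreamHandler =
      new URLStreamHandler {
        override protected def openConnection(u: URL): URLConnection =
          throw new UnsupportedOperationException("stub handler")
      }
  }

  def main(args: Array[String]): Unit = {
    val factory = new HttpBypassFactory(claimAll)
    println(factory.createURLStreamHandler("http") == null)  // declined
    println(factory.createURLStreamHandler("HTTPS") == null) // declined, case-insensitive
    println(factory.createURLStreamHandler("hdfs") != null)  // delegated
  }
}
```

      The check calls the factory directly instead of installing it with URL.setURLStreamHandlerFactory, since that call is allowed only once per JVM.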
      

      I would like to get some discussion here before submitting a pull request.

            People

              Zhou Jiang (ZhouJIANG)
              Bo Yang (boyangwa)
              Felix Cheung
              Votes: 1
              Watchers: 7
