Details
- Bug
- Status: Resolved
- Minor
- Resolution: Fixed
- 2.3.0, 2.3.1, 2.3.2, 2.4.4, 3.0.0
- None
Description
The call to URL.setURLStreamHandlerFactory() in SharedState causes URL.openConnection() to return an FsUrlConnection object, which is not compatible with HttpURLConnection. This causes exceptions when using some third-party HTTP libraries (e.g. scalaj.http).
The following code in Spark 2.3.0 introduced the issue: sql/core/src/main/scala/org/apache/spark/sql/internal/SharedState.scala:
object SharedState extends Logging {
  ...
  URL.setURLStreamHandlerFactory(new FsUrlStreamHandlerFactory())
  ...
}
Here is the example exception when using scalaj.http in Spark:
StackTrace: scala.MatchError: org.apache.hadoop.fs.FsUrlConnection:http://wwww.example.com (of class org.apache.hadoop.fs.FsUrlConnection)
at scalaj.http.HttpRequest.scalaj$http$HttpRequest$$doConnection(Http.scala:343)
at scalaj.http.HttpRequest.exec(Http.scala:335)
at scalaj.http.HttpRequest.asString(Http.scala:455)
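The failure mode in the stack trace can be reproduced without Spark or Hadoop. The following is a minimal sketch, assuming the HTTP client pattern-matches the result of URL.openConnection() against HttpURLConnection only. StubConnection, MatchErrorSketch and describe are hypothetical names used for illustration; StubConnection stands in for FsUrlConnection and is not the Hadoop class.

```scala
import java.net.{HttpURLConnection, URL, URLConnection}

// Hypothetical stand-in for a connection type that is not an HttpURLConnection
// (plays the role of FsUrlConnection in the report above).
class StubConnection(url: URL) extends URLConnection(url) {
  override def connect(): Unit = ()
}

object MatchErrorSketch {
  // Mimics a client that only knows how to handle HttpURLConnection.
  // Any other URLConnection subtype falls through and raises scala.MatchError.
  def describe(conn: URLConnection): String = conn match {
    case _: HttpURLConnection => "http connection"
  }

  def main(args: Array[String]): Unit = {
    val conn = new StubConnection(new URL("http://www.example.com"))
    try {
      println(describe(conn))
    } catch {
      case e: MatchError => println(s"scala.MatchError: ${e.getMessage}")
    }
  }
}
```

The match has no fallback case, so any connection type installed by a custom URLStreamHandlerFactory that does not extend HttpURLConnection ends in a MatchError.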
One option to fix the issue is to return null from URLStreamHandlerFactory.createURLStreamHandler when the protocol is http or https, so that the JVM falls back to its default handlers and stays compatible with scalaj.http. The following is a code example:
import java.net.{URLStreamHandler, URLStreamHandlerFactory}

import org.apache.hadoop.fs.FsUrlStreamHandlerFactory
import org.apache.spark.internal.Logging

class SparkUrlStreamHandlerFactory extends URLStreamHandlerFactory with Logging {
  private val fsUrlStreamHandlerFactory = new FsUrlStreamHandlerFactory()

  override def createURLStreamHandler(protocol: String): URLStreamHandler = {
    val handler = fsUrlStreamHandlerFactory.createURLStreamHandler(protocol)
    if (handler == null) {
      return null
    }
    if (protocol != null &&
        (protocol.equalsIgnoreCase("http") || protocol.equalsIgnoreCase("https"))) {
      // return null to use system default URLStreamHandler
      null
    } else {
      handler
    }
  }
}
I would like to get some discussion here before submitting a pull request.
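The delegation idea can be tried in isolation, without Hadoop on the classpath. The following is a minimal sketch, assuming a wrapped factory that claims every protocol. StubHandler, ClaimEverythingFactory, HttpPassthroughFactory and HttpPassthroughSketch are hypothetical names; ClaimEverythingFactory only stands in for FsUrlStreamHandlerFactory and does not reproduce its behavior.

```scala
import java.net.{URL, URLConnection, URLStreamHandler, URLStreamHandlerFactory}
import java.util.Locale

// Hypothetical handler used only so the stub factory has something to return.
class StubHandler extends URLStreamHandler {
  override protected def openConnection(u: URL): URLConnection =
    throw new UnsupportedOperationException("stub handler, not meant to be opened")
}

// Hypothetical stand-in for a factory that claims every protocol.
class ClaimEverythingFactory extends URLStreamHandlerFactory {
  override def createURLStreamHandler(protocol: String): URLStreamHandler =
    new StubHandler
}

// Delegates to `underlying`, but declines http/https by returning null,
// which tells the JVM to fall back to its built-in handlers.
class HttpPassthroughFactory(underlying: URLStreamHandlerFactory)
    extends URLStreamHandlerFactory {
  private val passthrough = Set("http", "https")

  override def createURLStreamHandler(protocol: String): URLStreamHandler =
    if (protocol != null && passthrough.contains(protocol.toLowerCase(Locale.ROOT))) null
    else underlying.createURLStreamHandler(protocol)
}

object HttpPassthroughSketch {
  def main(args: Array[String]): Unit = {
    val factory = new HttpPassthroughFactory(new ClaimEverythingFactory)
    Seq("http", "HTTPS", "hdfs").foreach { p =>
      val custom = factory.createURLStreamHandler(p) != null
      println(s"$p -> custom handler: $custom")
    }
  }
}
```

The sketch calls createURLStreamHandler directly rather than URL.setURLStreamHandlerFactory, because the JVM allows the factory to be installed only once per process.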
Issue Links
- relates to
  - HADOOP-14598 Blacklist Http/HttpsFileSystem in FsUrlStreamHandlerFactory (Resolved)
  - SPARK-12868 ADD JAR via sparkSQL JDBC will fail when using a HDFS URL (Resolved)