Description
LOAD DATA INPATH throws java.net.URISyntaxException: Malformed IPv6 address at index 8 if your hdfs conf includes a port for the namenode.
This is because the URI constructor is passed a value taken from the hdfs conf "fs.defaultFS", port included, as the host. A host that contains a colon is treated as an IPv6 literal and wrapped in square brackets, which then fails to parse. Note that the variable is called authority, but the 4-arg URI constructor actually expects a host: https://docs.oracle.com/javase/7/docs/api/java/net/URI.html#URI(java.lang.String,%20java.lang.String,%20java.lang.String,%20java.lang.String)
val defaultFSConf = sparkSession.sessionState.newHadoopConf().get("fs.defaultFS")
...
val newUri = new URI(scheme, authority, pathUri.getPath, pathUri.getFragment)
This was introduced by SPARK-23425.
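The difference between the two constructors can be reproduced outside Spark with plain java.net.URI. The sketch below is only an illustration: the class name UriAuthorityDemo and its two helper methods are hypothetical, and the 5-arg call is shown to contrast how an authority is handled, not as the fix that was applied in Spark.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical demo class; not part of Spark.
final class UriAuthorityDemo {

    // Mirrors the failing call: the second parameter of the 4-arg
    // constructor is a host, so "host:port" is treated as an IPv6 literal.
    static URI viaHostConstructor(String scheme, String authority, String path)
            throws URISyntaxException {
        return new URI(scheme, authority, path, null);
    }

    // The second parameter of the 5-arg constructor is an authority,
    // so "host:port" is accepted as-is.
    static URI viaAuthorityConstructor(String scheme, String authority, String path)
            throws URISyntaxException {
        return new URI(scheme, authority, path, null, null);
    }

    public static void main(String[] args) throws URISyntaxException {
        String authority = "fizz.buzz.com:8020";
        String path = "/some/path/on/hdfs";

        try {
            viaHostConstructor("hdfs", authority, path);
        } catch (URISyntaxException e) {
            System.out.println("4-arg constructor failed: " + e.getMessage());
        }

        URI ok = viaAuthorityConstructor("hdfs", authority, path);
        System.out.println("5-arg constructor gave: " + ok);
    }
}
```

With an authority that has no port (for example just fizz.buzz.com), both calls succeed, which matches the observation that the error only appears when the namenode port is configured.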
Workaround: specify a fully qualified path, e.g., instead of
LOAD DATA INPATH '/some/path/on/hdfs'
use
LOAD DATA INPATH 'hdfs://fizz.buzz.com:8020/some/path/on/hdfs'
Issue Links
- is caused by SPARK-23425 load data for hdfs file path with wild card usage is not working properly (Resolved)