SPARK-25738

LOAD DATA INPATH doesn't work if hdfs conf includes port

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: 2.4.0
    • Fix Version/s: 2.4.0
    • Component/s: SQL
    • Labels:
      None

      Description

      LOAD DATA INPATH throws java.net.URISyntaxException: Malformed IPv6 address at index 8 if your hdfs conf includes a port for the namenode.

      This happens because the code passes the value of the HDFS conf "fs.defaultFS" to the URI constructor as the host. A host value that contains a colon (as host:port does) is treated as an IPv6 literal, which then fails to parse. Note that the variable is called authority, but the 4-arg URI constructor actually expects a host: https://docs.oracle.com/javase/7/docs/api/java/net/URI.html#URI(java.lang.String,%20java.lang.String,%20java.lang.String,%20java.lang.String)

      val defaultFSConf = sparkSession.sessionState.newHadoopConf().get("fs.defaultFS")
      ...
      val newUri = new URI(scheme, authority, pathUri.getPath, pathUri.getFragment)
      

      https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala#L386

      This was introduced by SPARK-23425.
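
      The difference between the two constructors can be reproduced outside Spark. The following is a minimal sketch, not the code from tables.scala and not necessarily the fix that was merged: the object name UriAuthorityDemo and the hard-coded scheme, authority, and path are assumptions chosen to match the example in this report. It contrasts the 4-arg java.net.URI constructor (scheme, host, path, fragment) with the 5-arg one (scheme, authority, path, query, fragment), which accepts a host:port value.

```scala
import java.net.URI
import scala.util.Try

object UriAuthorityDemo {
  // Hypothetical values standing in for what would be derived from fs.defaultFS.
  val scheme = "hdfs"
  val authority = "fizz.buzz.com:8020" // host:port
  val path = "/some/path/on/hdfs"

  // 4-arg constructor: the second argument is a host, so a host:port value is rejected.
  def viaHost: Try[URI] = Try(new URI(scheme, authority, path, null))

  // 5-arg constructor: the second argument is an authority, so host:port is accepted.
  def viaAuthority: Try[URI] = Try(new URI(scheme, authority, path, null, null))

  def main(args: Array[String]): Unit = {
    println(viaHost)      // a Failure wrapping java.net.URISyntaxException
    println(viaAuthority) // a Success wrapping the parsed URI
  }
}
```

      In this sketch only the constructor arity changes; the inputs are identical, which is why a variable named authority passed in the host position is easy to overlook.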

      Workaround: specify a fully qualified path, e.g. instead of

      LOAD DATA INPATH '/some/path/on/hdfs'
      

      use

      LOAD DATA INPATH 'hdfs://fizz.buzz.com:8020/some/path/on/hdfs'
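
      If the path is built programmatically, the workaround can be applied with a small helper. This is only a sketch under stated assumptions: QualifiedLoadPath and qualify are hypothetical names, not part of Spark, and the fs.defaultFS value is assumed to be a plain scheme://host:port URI like the one in this report.

```scala
import java.net.URI

object QualifiedLoadPath {
  // Hypothetical helper: prefixes an absolute path with the scheme and
  // authority (host:port) taken from the fs.defaultFS value.
  def qualify(defaultFS: String, absolutePath: String): String = {
    require(absolutePath.startsWith("/"), "expects an absolute path")
    val base = new URI(defaultFS)
    s"${base.getScheme}://${base.getAuthority}$absolutePath"
  }

  def main(args: Array[String]): Unit = {
    // Example value only; a real job would read fs.defaultFS from its Hadoop conf.
    println(qualify("hdfs://fizz.buzz.com:8020", "/some/path/on/hdfs"))
  }
}
```

      The resulting string can then be placed inside the quotes of the LOAD DATA INPATH statement.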
      

              People

              • Assignee: Imran Rashid (irashid)
              • Reporter: Imran Rashid (irashid)
              • Votes: 0
              • Watchers: 5
