Spark / SPARK-25738

LOAD DATA INPATH doesn't work if hdfs conf includes port

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: 2.4.0
    • Fix Version/s: 2.4.0
    • Component/s: SQL
    • Labels: None

    Description

      LOAD DATA INPATH throws java.net.URISyntaxException: Malformed IPv6 address at index 8 if your hdfs conf includes a port for the namenode.

      This happens because the code constructs the URI by passing the value from the hdfs conf "fs.defaultFS" as the host argument. Note that the variable is called authority, but the 4-arg URI constructor actually expects a host, so a value that includes a port (host:port) is mistaken for an IPv6 literal: https://docs.oracle.com/javase/7/docs/api/java/net/URI.html#URI(java.lang.String,%20java.lang.String,%20java.lang.String,%20java.lang.String)

      val defaultFSConf = sparkSession.sessionState.newHadoopConf().get("fs.defaultFS")
      ...
      val newUri = new URI(scheme, authority, pathUri.getPath, pathUri.getFragment)
      

      https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala#L386

      This was introduced by SPARK-23425.
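
      The failure described above can be reproduced outside Spark with java.net.URI alone. The sketch below is illustrative, not the Spark code: the object name and the sample values are assumptions (the host and port are borrowed from the workaround example). It contrasts the 4-arg constructor, which takes a host, with the 5-arg constructor, which takes an authority and therefore accepts host:port.

```scala
import java.net.URI
import scala.util.Try

// Hypothetical demo object; names and values are for illustration only.
object UriAuthorityDemo {
  val scheme = "hdfs"
  // Host plus port, as the authority part of an fs.defaultFS value would be.
  val authority = "fizz.buzz.com:8020"
  val path = "/some/path/on/hdfs"

  // 4-arg constructor: (scheme, host, path, fragment).
  // The ':' makes java.net.URI treat the value as an IPv6 literal and wrap it
  // in brackets, after which parsing throws URISyntaxException.
  def withHostConstructor: Try[URI] =
    Try(new URI(scheme, authority, path, null))

  // 5-arg constructor: (scheme, authority, path, query, fragment).
  // An authority may carry a port, so the same value parses cleanly.
  def withAuthorityConstructor: Try[URI] =
    Try(new URI(scheme, authority, path, null, null))

  def main(args: Array[String]): Unit = {
    println(withHostConstructor)      // a Failure wrapping URISyntaxException
    println(withAuthorityConstructor) // a Success wrapping the parsed URI
  }
}
```

      Using the authority-based constructor is one possible way to avoid the exception; whether it matches the change actually made for this issue is not stated here.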

      Workaround: specify a fully qualified path, e.g. instead of

      LOAD DATA INPATH '/some/path/on/hdfs'
      

      use

      LOAD DATA INPATH 'hdfs://fizz.buzz.com:8020/some/path/on/hdfs'
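
      When the path is assembled programmatically, the fully qualified form used in the workaround can be built by resolving the path against the default filesystem URI. This is a minimal sketch under assumptions: the helper name qualify is hypothetical, the path is assumed to be absolute and free of characters that need URI escaping, and the default filesystem value is passed in as a plain string rather than read from the Hadoop configuration.

```scala
import java.net.URI

// Hypothetical helper for illustration; not part of Spark or Hadoop.
object QualifiedLoadPath {
  // Resolves an absolute, scheme-less path against the default filesystem URI
  // (the value of fs.defaultFS), keeping that URI's scheme, host and port.
  def qualify(defaultFS: String, path: String): String =
    new URI(defaultFS).resolve(path).toString

  def main(args: Array[String]): Unit = {
    val qualified = qualify("hdfs://fizz.buzz.com:8020", "/some/path/on/hdfs")
    // The qualified path can then be placed in the LOAD DATA INPATH statement.
    println(s"LOAD DATA INPATH '$qualified'")
  }
}
```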
      

            People

              Assignee: Imran Rashid (irashid)
              Reporter: Imran Rashid (irashid)