Spark / SPARK-25738

LOAD DATA INPATH doesn't work if hdfs conf includes port


    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: 2.4.0
    • Fix Version/s: 2.4.0
    • Component/s: SQL
    • Labels: None

      Description

      LOAD DATA INPATH throws java.net.URISyntaxException: Malformed IPv6 address at index 8 if the HDFS conf "fs.defaultFS" includes a port for the namenode.

      This happens because the new URI is built with the value from the HDFS conf "fs.defaultFS" passed in as the host. Because that value contains a colon (host:port), the URI constructor treats it as an IPv6 literal, which then fails to parse. Note that the variable is called authority, but the 4-arg URI constructor actually expects a host: https://docs.oracle.com/javase/7/docs/api/java/net/URI.html#URI(java.lang.String,%20java.lang.String,%20java.lang.String,%20java.lang.String)

      val defaultFSConf = sparkSession.sessionState.newHadoopConf().get("fs.defaultFS")
      ...
      val newUri = new URI(scheme, authority, pathUri.getPath, pathUri.getFragment)
      

      https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala#L386

      This regression was introduced by SPARK-23425.
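      The failure can be reproduced outside Spark with java.net.URI alone. The sketch below is illustrative only: the object name and the literal values are assumptions (the host and path are borrowed from the workaround example), and the 5-arg constructor is shown as one way to pass a host:port authority, not necessarily the change that was merged for this issue.

```scala
import java.net.URI
import scala.util.Try

object UriAuthorityDemo {
  // Hypothetical values standing in for what is derived from fs.defaultFS.
  val scheme = "hdfs"
  val authority = "fizz.buzz.com:8020"
  val path = "/some/path/on/hdfs"

  // 4-arg constructor: (scheme, host, path, fragment).
  // The second argument is treated as a host, so "host:port" is rejected
  // with a URISyntaxException, captured here as a Failure.
  val asHost: Try[URI] = Try(new URI(scheme, authority, path, null))

  // 5-arg constructor: (scheme, authority, path, query, fragment).
  // The second argument is a full authority, so "host:port" is accepted.
  val asAuthority: URI = new URI(scheme, authority, path, null, null)

  def main(args: Array[String]): Unit = {
    println(asHost.isFailure) // true
    println(asAuthority)      // hdfs://fizz.buzz.com:8020/some/path/on/hdfs
  }
}
```

      The only difference between the two calls is the constructor arity, which decides whether the second argument is parsed as a host or as an authority.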

      Workaround: specify a fully qualified path, e.g., instead of

      LOAD DATA INPATH '/some/path/on/hdfs'
      

      use

      LOAD DATA INPATH 'hdfs://fizz.buzz.com:8020/some/path/on/hdfs'
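
      If the statement is generated programmatically, the workaround can be applied by qualifying the path before building the SQL. This is a minimal sketch under stated assumptions: QualifiedLoadPath, qualify, loadStatement and the table name my_table are hypothetical names introduced for illustration, and the path is assumed to contain no characters that need URI escaping.

```scala
import java.net.URI

object QualifiedLoadPath {
  // Returns `path` unchanged if it already has a scheme; otherwise prefixes it
  // with the scheme and authority (host:port) taken from `defaultFS`.
  def qualify(defaultFS: String, path: String): String = {
    val pathUri = new URI(path)
    if (pathUri.getScheme != null) path
    else {
      val fs = new URI(defaultFS)
      // 5-arg constructor: the second argument is an authority, so host:port is accepted.
      new URI(fs.getScheme, fs.getAuthority, pathUri.getPath, null, null).toString
    }
  }

  // Builds the LOAD DATA statement with a fully qualified path.
  def loadStatement(defaultFS: String, path: String, table: String): String =
    s"LOAD DATA INPATH '${qualify(defaultFS, path)}' INTO TABLE $table"
}
```

      For example, QualifiedLoadPath.loadStatement("hdfs://fizz.buzz.com:8020", "/some/path/on/hdfs", "my_table") produces a statement with the fully qualified path shown above, which could then be passed to spark.sql.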
      

      People

      • Assignee: Imran Rashid (irashid)
      • Reporter: Imran Rashid (irashid)
