  1. Hadoop Common
  2. HADOOP-15217

FsUrlConnection does not handle paths with spaces

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 3.0.0
    • Fix Version/s: 3.2.0, 3.1.1, 3.0.4
    • Component/s: fs
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      When FsUrlStreamHandlerFactory is registered with java.net.URL (e.g. when Spark is initialized), it breaks URLs with spaces, even though they are properly URI-encoded. I traced the problem down to the FsUrlConnection.connect() method. It naively gets the path from the URL, which contains encoded spaces, and passes it to the org.apache.hadoop.fs.Path(String) constructor. This is not correct, because the docs clearly say that the string must NOT be encoded. Doing so causes double encoding within the Path class (i.e. %20 becomes %2520).

      See attached JUnit test. 

      This test case mimics an issue I ran into when trying to use Commons Configuration 1.9 AFTER initializing Spark. Commons Configuration uses the URL class to load configuration files, but Spark installs FsUrlStreamHandlerFactory, which hits this issue. For now, we are using an AspectJ aspect to "patch" the bytecode at load time to work around the issue.

      The real fix is quite simple. All you need to do is replace this line in org.apache.hadoop.fs.FsUrlConnection.connect():

           is = fs.open(new Path(url.getPath()));

      with this line:

           is = fs.open(new Path(url.toURI().getPath()));

      URI.getPath() will correctly decode the path, which is what is expected by the org.apache.hadoop.fs.Path(String) constructor. Note that URL.toURI() throws the checked URISyntaxException, so connect() has to handle it.
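
      The difference between the two calls can be reproduced without Hadoop on the classpath. The following is a minimal sketch, not the attached TestCase.java: the class name EncodedPathDemo and its helper methods are hypothetical, and the multi-argument java.net.URI constructor is used only as a stand-in for the encoding that Path is described as performing above (that constructor always quotes a literal '%').

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URISyntaxException;
import java.net.URL;

// Hypothetical demo class; not part of Hadoop or of the attached test case.
final class EncodedPathDemo {

    // Builds a URL from an already-encoded string.
    static URL toUrl(String encoded) {
        try {
            return URI.create(encoded).toURL();
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException(e);
        }
    }

    // What the original code reads: the path exactly as written, still encoded.
    static String rawPath(URL url) {
        return url.getPath();
    }

    // What the proposed fix reads: the path with percent-escapes decoded.
    static String decodedPath(URL url) {
        try {
            return url.toURI().getPath();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    // Assumption: stands in for the encoding applied to an unencoded path string.
    // The multi-argument URI constructor quotes illegal characters and every '%'.
    static String encodeOnce(String path) {
        try {
            return new URI("file", null, path, null, null).getRawPath();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        URL url = toUrl("file:/tmp/my%20dir/app.properties");

        // Encoded input gets encoded again: /tmp/my%2520dir/app.properties
        System.out.println(encodeOnce(rawPath(url)));

        // Decoded input is encoded exactly once: /tmp/my%20dir/app.properties
        System.out.println(encodeOnce(decodedPath(url)));
    }
}
```

      Feeding the still-encoded path into the encoding step yields %2520, while the decoded path round-trips to the original %20.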

       

        Attachments

        1. HADOOP-15217.01.patch
          2 kB
          Zsolt Venczel
        2. HADOOP-15217.02.patch
          3 kB
          Zsolt Venczel
        3. HADOOP-15217.03.patch
          3 kB
          Zsolt Venczel
        4. HADOOP-15217.04.patch
          3 kB
          Zsolt Venczel
        5. HADOOP-15217.05.patch
          3 kB
          Zsolt Venczel
        6. HADOOP-15217.06.patch
          3 kB
          Zsolt Venczel
        7. TestCase.java
          0.9 kB
          Joseph Fourny

            People

            • Assignee: Zsolt Venczel (zvenczel)
            • Reporter: Joseph Fourny (josephfourny)
            • Votes: 0
            • Watchers: 5