Hadoop Common
  1. Hadoop Common
  2. HADOOP-7531

Add servlet util methods for handling paths in requests

    Details

    • Type: Improvement Improvement
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: 0.23.0
    • Fix Version/s: 0.23.0
    • Component/s: util
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      Common side of HDFS-2235.

      1. hadoop-7531-1.patch
        3 kB
        Eli Collins
      2. hadoop-7531-2.patch
        3 kB
        Eli Collins
      3. hadoop-7531-1.patch
        3 kB
        Eli Collins
      4. hadoop-7531-3.patch
        3 kB
        Eli Collins
      5. hadoop-7531-4.patch
        3 kB
        Eli Collins
      6. hadoop-7531-5.patch
        3 kB
        Eli Collins

        Issue Links

          Activity

          Eli Collins created issue -
          Eli Collins made changes -
          Field Original Value New Value
          Link This issue blocks HDFS-2235 [ HDFS-2235 ]
          Hide
          Eli Collins added a comment -

          Patch attached. Adds methods to decode/encode paths in Http requests.

          Show
          Eli Collins added a comment - Patch attached. Adds methods to decode/encode paths in Http requests.
          Eli Collins made changes -
          Attachment hadoop-7531-1.patch [ 12489894 ]
          Hide
          Daryn Sharp added a comment -

          Should the encoding/decoding exceptions really be trapped and subbed with null? How will the caller know exactly how or why the operation failed? It seems prone to error as illustrated in the linked jira. The return value is being +'ed onto a string which will unexpectedly result in "null" being added to the string.

          To avoid copy/paste code, would it make sense for getDecodedPath to call URIUtil.decode(getRawPath(request, servletName))?

          Show
          Daryn Sharp added a comment - Should the encoding/decoding exceptions really be trapped and subbed with null? How will the caller know exactly how or why the operation failed? It seems prone to error as illustrated in the linked jira. The return value is being +'ed onto a string which will unexpectedly result in "null" being added to the string. To avoid copy/paste code, would it make sense for getDecodedPath to call URIUtil.decode(getRawPath(request, servletName)) ?
          Hide
          Eli Collins added a comment -

          Thanks for taking a look! The URI exception is not for encoding/decoding exceptions, it's thrown if the default proto char set (UTF-8) is not supported. The web UIs wouldn't even start up if this were the case. Could go either way on getDecodedPath, it's one line of duplication and saves a method call, happy to change if you feel strongly.

          Show
          Eli Collins added a comment - Thanks for taking a look! The URI exception is not for encoding/decoding exceptions, it's thrown if the default proto char set (UTF-8) is not supported. The web UIs wouldn't even start up if this were the case. Could go either way on getDecodedPath, it's one line of duplication and saves a method call, happy to change if you feel strongly.
          Hide
          Daryn Sharp added a comment -

          True, it shouldn't happen unless someone calls URI.setDefaultProtocolCharset("absurdValueHere!"). When Murphy visits, it's going to be a PITA to debug a url of "http://nn/listPathsnull" as opposed to an exception of "you set a bad charset!".

          If you don't mind, I'd really prefer that the redundancy is removed. To me, it's like fingers on a chalkboard... [cringe]

          Show
          Daryn Sharp added a comment - True, it shouldn't happen unless someone calls URI.setDefaultProtocolCharset("absurdValueHere!") . When Murphy visits, it's going to be a PITA to debug a url of "http://nn/listPathsnull" as opposed to an exception of "you set a bad charset!". If you don't mind, I'd really prefer that the redundancy is removed. To me, it's like fingers on a chalkboard... [cringe]
          Hide
          Eli Collins added a comment -

          True, it shouldn't happen unless someone calls URI.setDefaultProtocolCharset("absurdValueHere!").

          And we wouldn't let someone check that in.. =)

          If you don't mind, I'd really prefer that the redundancy is removed.

          Done. It's a little goofy in that it requires decode("/") == "/" but that will always be true here so is fine.

          Show
          Eli Collins added a comment - True, it shouldn't happen unless someone calls URI.setDefaultProtocolCharset("absurdValueHere!"). And we wouldn't let someone check that in.. =) If you don't mind, I'd really prefer that the redundancy is removed. Done. It's a little goofy in that it requires decode("/") == "/" but that will always be true here so is fine.
          Eli Collins made changes -
          Attachment hadoop-7531-2.patch [ 12489910 ]
          Hide
          Daryn Sharp added a comment -

          And we wouldn't let someone check that in.. =)

          I'm worried because it's a latent land-mine. Over 20 years I've debugged so many "it'll never happen" bugs that I strongly adhere to Murphy's Law. Someday the charset will be changed, either in error or with legit purpose, or an external jar will fiddle with the charset, or a deployment env will use an incompatible charset, etc. When that day arrives, some poor developer will spend too much time debugging why the web page acts goofy instead of immediately seeing an invalid charset exception.

          Perhaps the safest way is to not catch the exception and use URIUtil.encodeXXX(String, "UTF-8")?

          Show
          Daryn Sharp added a comment - And we wouldn't let someone check that in.. =) I'm worried because it's a latent land-mine. Over 20 years I've debugged so many "it'll never happen" bugs that I strongly adhere to Murphy's Law. Someday the charset will be changed, either in error or with legit purpose, or an external jar will fiddle with the charset, or a deployment env will use an incompatible charset, etc. When that day arrives, some poor developer will spend too much time debugging why the web page acts goofy instead of immediately seeing an invalid charset exception. Perhaps the safest way is to not catch the exception and use URIUtil.encodeXXX(String, "UTF-8") ?
          Hide
          Eli Collins added a comment -

          Agree specifying UTF-8 explicitly is better. Updated patch attached.

          Show
          Eli Collins added a comment - Agree specifying UTF-8 explicitly is better. Updated patch attached.
          Eli Collins made changes -
          Attachment hadoop-7531-1.patch [ 12490011 ]
          Hide
          Eli Collins added a comment -

          Btw that still requires catching exceptions, unlike Jetty's URIUtil all commons URIUtil encode/decode methods throw URIException, which is lame since most people want standard byte-by-byte encoding.

          Show
          Eli Collins added a comment - Btw that still requires catching exceptions, unlike Jetty's URIUtil all commons URIUtil encode/decode methods throw URIException, which is lame since most people want standard byte-by-byte encoding.
          Hide
          Todd Lipcon added a comment -

          hmm, sure you attached the right patch? I don't see the "UTF-8" arg in the latest upload.

          Show
          Todd Lipcon added a comment - hmm, sure you attached the right patch? I don't see the "UTF-8" arg in the latest upload.
          Hide
          Todd Lipcon added a comment -

          Another small thing: in getDecodedPath and getRawPath, I think you should do a Preconditions.checkArgument(request.getRequestURI().startsWith(servletName)). Otherwise the substring might chop something else off if the function is misused.

          Show
          Todd Lipcon added a comment - Another small thing: in getDecodedPath and getRawPath, I think you should do a Preconditions.checkArgument(request.getRequestURI().startsWith(servletName)). Otherwise the substring might chop something else off if the function is misused.
          Hide
          Todd Lipcon added a comment -

          Last thing... URIUtil.encodeQuery doesn't seem to do much:

          groovy:000> URIUtil.encodeQuery("ugi=a&b=c")        
          ===> ugi=a&b=c
          

          I think what we really want is to be appending "ugi=" + URIUtil.encodeWithinQuery(ugi)

          Show
          Todd Lipcon added a comment - Last thing... URIUtil.encodeQuery doesn't seem to do much: groovy:000> URIUtil.encodeQuery( "ugi=a&b=c" ) ===> ugi=a&b=c I think what we really want is to be appending "ugi=" + URIUtil.encodeWithinQuery(ugi)
          Hide
          Eli Collins added a comment -

          Yup, right patch this time.

          Show
          Eli Collins added a comment - Yup, right patch this time.
          Eli Collins made changes -
          Attachment hadoop-7531-3.patch [ 12490045 ]
          Hide
          Todd Lipcon added a comment -

          hmm:

          +    Preconditions.checkArgument(request.getRequestURI().startsWith(servletName));
          +    return null == request.getRequestURI() ? "/" :
          

          The second line implies that request.getRequestURI() may be null, in which case the first line would throw NPE. Is it actually possible to be null? If so, separate the ternary into an if/else with the precondition inside the if, I guess?

          Show
          Todd Lipcon added a comment - hmm: + Preconditions.checkArgument(request.getRequestURI().startsWith(servletName)); + return null == request.getRequestURI() ? "/" : The second line implies that request.getRequestURI() may be null, in which case the first line would throw NPE. Is it actually possible to be null? If so, separate the ternary into an if/else with the precondition inside the if, I guess?
          Hide
          Eli Collins added a comment - - edited

          Actually no, I think the only reason it checked for null was the mock in TestStreamFile and HDFS-2235 now mocks getRequestURI so that's no longer the case. Updated patch.

          Show
          Eli Collins added a comment - - edited Actually no, I think the only reason it checked for null was the mock in TestStreamFile and HDFS-2235 now mocks getRequestURI so that's no longer the case. Updated patch.
          Eli Collins made changes -
          Attachment hadoop-7531-4.patch [ 12490051 ]
          Hide
          Todd Lipcon added a comment -

          +1, lgtm assuming test-patch is clean.

          Show
          Todd Lipcon added a comment - +1, lgtm assuming test-patch is clean.
          Hide
          Daryn Sharp added a comment -

          Btw that still requires catching exceptions, unlike Jetty's URIUtil all commons URIUtil encode/decode methods throw URIException, which is lame since most people want standard byte-by-byte encoding.

          Please elaborate because I'm not sure I follow your reasoning... Why would byte-for-byte be desirable? Perhaps I'm misunderstanding, but if the client uses a different local charset to encode special chars, then server-side decoding will fail. Why should this preclude the exception?

          Assuming the exception is masked, are you going to change the callers to check for null and throw an exception? Putting a literal "null" in a URI when there's an encoding problem is not acceptable. That's messy to handle in the caller, which is why I think the exception shouldn't be masked.

          Preconditions.checkArgument(request.getRequestURI().startsWith(servletName)). Otherwise the substring might chop something else off if the function is misused.

          Should this be startsWith(servletName+"/") to avoid accidently matching a servlet prefix? Ie. something like "getItem" vs "getItem(s)".

          Show
          Daryn Sharp added a comment - Btw that still requires catching exceptions, unlike Jetty's URIUtil all commons URIUtil encode/decode methods throw URIException, which is lame since most people want standard byte-by-byte encoding. Please elaborate because I'm not sure I follow your reasoning... Why would byte-for-byte be desirable? Perhaps I'm misunderstanding, but if the client uses a different local charset to encode special chars, then server-side decoding will fail. Why should this preclude the exception? Assuming the exception is masked, are you going to change the callers to check for null and throw an exception? Putting a literal "null" in a URI when there's an encoding problem is not acceptable. That's messy to handle in the caller, which is why I think the exception shouldn't be masked. Preconditions.checkArgument(request.getRequestURI().startsWith(servletName)). Otherwise the substring might chop something else off if the function is misused. Should this be startsWith(servletName+"/") to avoid accidently matching a servlet prefix? Ie. something like "getItem" vs "getItem(s)".
          Hide
          Eli Collins added a comment -

          Please elaborate because I'm not sure I follow your reasoning

          These methods don't need to throw a checked exception. For comparison, the variants of jetty URIUtil#encodePath and java's URLEncoder#encode that don't take a charset do not throw UnsupportedEncodingException. We know "UTF-8" is a supported codec, ie the exception won't be thrown because these methods, like jetty/javas, don't throw URIException when explicitly given a supported codec. And if UTF-8 were not supported we'd never reach this code. Ie we could throw an assertion error here, the catch clause here is just to work around the weakness in this particular API. I'd rather use jetty's URIUtil than have every encode/decode throw an IOException.

          Matching servletName+"/" is reasonable since our pathspecs match servletName/*, will add the +"/".

          Show
          Eli Collins added a comment - Please elaborate because I'm not sure I follow your reasoning These methods don't need to throw a checked exception. For comparison, the variants of jetty URIUtil#encodePath and java's URLEncoder#encode that don't take a charset do not throw UnsupportedEncodingException. We know "UTF-8" is a supported codec, ie the exception won't be thrown because these methods, like jetty/javas, don't throw URIException when explicitly given a supported codec. And if UTF-8 were not supported we'd never reach this code. Ie we could throw an assertion error here, the catch clause here is just to work around the weakness in this particular API. I'd rather use jetty's URIUtil than have every encode/decode throw an IOException. Matching servletName+"/" is reasonable since our pathspecs match servletName/*, will add the +"/".
          Hide
          Eli Collins added a comment -

          test-patch results, tests are in HDFS-2235.

          -1 overall.  
          
              +1 @author.  The patch does not contain any @author tags.
          
              -1 tests included.  The patch doesn't appear to include any new or modified tests.
                                  Please justify why no new tests are needed for this patch.
                                  Also please list what manual steps were performed to verify this patch.
          
              +1 javadoc.  The javadoc tool did not generate any warning messages.
          
              +1 javac.  The applied patch does not increase the total number of javac compiler warnings.
          
              +1 findbugs.  The patch does not introduce any new Findbugs (version 1.3.9) warnings.
          
              +1 release audit.  The applied patch does not increase the total number of release audit warnings.
          
              +1 system test framework.  The patch passed system test framework compile.
          
          Show
          Eli Collins added a comment - test-patch results, tests are in HDFS-2235 . -1 overall. +1 @author. The patch does not contain any @author tags. -1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. +1 system test framework. The patch passed system test framework compile.
          Hide
          Daryn Sharp added a comment -

          Do you believe a reasonable compromise is to leave the catch in place, but rethrow as an IllegalArgumentException?

          Show
          Daryn Sharp added a comment - Do you believe a reasonable compromise is to leave the catch in place, but rethrow as an IllegalArgumentException ?
          Hide
          Todd Lipcon added a comment -

          maybe rethrow as AssertionError("JVM does not support UTF-8") // should never happen!

          Show
          Todd Lipcon added a comment - maybe rethrow as AssertionError("JVM does not support UTF-8") // should never happen!
          Hide
          Eli Collins added a comment -

          I've modified it to throw an assertion error.

          Show
          Eli Collins added a comment - I've modified it to throw an assertion error.
          Eli Collins made changes -
          Attachment hadoop-7531-5.patch [ 12490209 ]
          Eli Collins made changes -
          Status Open [ 1 ] In Progress [ 3 ]
          Eli Collins made changes -
          Status In Progress [ 3 ] Patch Available [ 10002 ]
          Hadoop Flags [Reviewed]
          Hide
          Eli Collins added a comment -

          I've committed this. Thanks Daryn and Todd!

          Show
          Eli Collins added a comment - I've committed this. Thanks Daryn and Todd!
          Eli Collins made changes -
          Status Patch Available [ 10002 ] Resolved [ 5 ]
          Resolution Fixed [ 1 ]
          Hide
          Hudson added a comment -

          Integrated in Hadoop-Common-trunk-Commit #735 (See https://builds.apache.org/job/Hadoop-Common-trunk-Commit/735/)
          HADOOP-7531. Add servlet util methods for handling paths in requests. Contributed by Eli Collins

          eli : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1156947
          Files :

          • /hadoop/common/trunk/hadoop-common/src/main/java/org/apache/hadoop/util/ServletUtil.java
          • /hadoop/common/trunk/hadoop-common/CHANGES.txt
          Show
          Hudson added a comment - Integrated in Hadoop-Common-trunk-Commit #735 (See https://builds.apache.org/job/Hadoop-Common-trunk-Commit/735/ ) HADOOP-7531 . Add servlet util methods for handling paths in requests. Contributed by Eli Collins eli : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1156947 Files : /hadoop/common/trunk/hadoop-common/src/main/java/org/apache/hadoop/util/ServletUtil.java /hadoop/common/trunk/hadoop-common/CHANGES.txt
          Arun C Murthy made changes -
          Status Resolved [ 5 ] Closed [ 6 ]
          Gavin made changes -
          Link This issue blocks HDFS-2235 [ HDFS-2235 ]
          Gavin made changes -
          Link This issue is depended upon by HDFS-2235 [ HDFS-2235 ]
          Transition Time In Source Status Execution Times Last Executer Last Execution Date
          Open Open In Progress In Progress
          2d 7h 12m 1 Eli Collins 12/Aug/11 04:17
          In Progress In Progress Patch Available Patch Available
          6s 1 Eli Collins 12/Aug/11 04:17
          Patch Available Patch Available Resolved Resolved
          11s 1 Eli Collins 12/Aug/11 04:18
          Resolved Resolved Closed Closed
          94d 21h 32m 1 Arun C Murthy 15/Nov/11 00:50

            People

            • Assignee:
              Eli Collins
              Reporter:
              Eli Collins
            • Votes:
              0 Vote for this issue
              Watchers:
              5 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development