We have a bug with the decoding of %2F in PATH_INFO. We set "AllowEncodedSlashes On" and request the URL "http://172.16.0.91/VAR1=XXXX%2fVAR2=YYYY". Apache 2.0.47 handles this correctly and sets PATH_INFO to "VAR1=XXXX%2fVAR2=YYYY". Newer versions (tested: 2.0.52 and 2.0.54) set PATH_INFO to "VAR1=XXXX/VAR2=YYYY".
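To illustrate why the newer behaviour loses information, here is a small Python sketch. It is not httpd code: `urllib.parse.unquote` merely stands in for "decode every escape", and the variable names are made up for the example. Once the escape is decoded, the request with an encoded slash can no longer be told apart from one with a real path separator.

```python
from urllib.parse import unquote

# Path part of the URL from the report, and a variant with a real separator.
encoded_slash = "/VAR1=XXXX%2fVAR2=YYYY"
literal_slash = "/VAR1=XXXX/VAR2=YYYY"

# Full decoding (what the newer versions effectively hand to the script).
decoded_encoded = unquote(encoded_slash)
decoded_literal = unquote(literal_slash)

print(decoded_encoded)
print(decoded_encoded == decoded_literal)  # the two requests are now indistinguishable
```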
Hello. Exactly the same problem here. Any progress on this issue? Regards, Thomas
Created attachment 15682: AllowEncodedSlashes without decoding
Hello. After some investigation I was able to solve this issue. To be a little more specific, the problem only occurred with a ProxyPass directive in httpd.conf. After applying the attached patch the problem disappeared. Best regards, Thomas
I also had this problem. However, the new line of code in Thomas' patch did not fix the problem for me. What I ended up doing was moving that line a few lines higher, so that it is the first line inside the for loop of proxy_trans(), which is line 149. That did the trick for me.
I am also experiencing this problem but without any proxy stuff.
Reproducible on Trunk. (2.3-HEAD) ProxyPass does not have any effect on this issue. It can be reproduced on default apache install, and having "AllowEncodedSlashes On"
1) Using '\' -> %5c
    |(echo "GET /cgi-bin/printenv/my%5cparam HTTP/1.0\n\n" ;sleep 1) | telnet agneyam 8080
    Trying 129.158.224.203...
    Connected to agneyam.india.sun.com.
    Escape character is '^]'.
    HTTP/1.1 200 OK
    Date: Wed, 12 Sep 2007 10:42:09 GMT
    Server: Apache/2.3.0-dev (Unix)
    Connection: close
    Content-Type: text/plain; charset=iso-8859-1
    DOCUMENT_ROOT="/space/store/httpd/htdocs"
    GATEWAY_INTERFACE="CGI/1.1"
    PATH="/bin:/usr/bin"
    PATH_INFO="/my\param"
    PATH_TRANSLATED="/space/store/httpd/htdocs/my\param"
    QUERY_STRING=""
    REMOTE_ADDR="129.158.224.78"
    REMOTE_PORT="50617"
    REQUEST_METHOD="GET"
    REQUEST_URI="/cgi-bin/printenv/my%5cparam"
    SCRIPT_FILENAME="/space/store/httpd/cgi-bin/printenv"
    SCRIPT_NAME="/cgi-bin/printenv"
    SERVER_ADDR="129.158.224.203"
    SERVER_ADMIN="you@example.com"
    SERVER_NAME="agneyam"
    SERVER_PORT="80"
    SERVER_PROTOCOL="HTTP/1.0"
    SERVER_SIGNATURE=""
    SERVER_SOFTWARE="Apache/2.3.0-dev (Unix)"
    TZ="Asia/Calcutta"
2) Using '/' -> %2f
    |(echo "GET /cgi-bin/printenv/my%2fparam HTTP/1.0\n\n" ;sleep 1) | telnet agneyam 8080
    Trying 129.158.224.203...
    Connected to agneyam.india.sun.com.
    Escape character is '^]'.
    HTTP/1.1 200 OK
    Date: Wed, 12 Sep 2007 10:43:38 GMT
    Server: Apache/2.3.0-dev (Unix)
    Connection: close
    Content-Type: text/plain; charset=iso-8859-1
    DOCUMENT_ROOT="/space/store/httpd/htdocs"
    GATEWAY_INTERFACE="CGI/1.1"
    PATH="/bin:/usr/bin"
    PATH_INFO="/my/param"
    PATH_TRANSLATED="/space/store/httpd/htdocs/my/param"
    QUERY_STRING=""
    REMOTE_ADDR="129.158.224.78"
    REMOTE_PORT="59458"
    REQUEST_METHOD="GET"
    REQUEST_URI="/cgi-bin/printenv/my%2fparam"
    SCRIPT_FILENAME="/space/store/httpd/cgi-bin/printenv"
    SCRIPT_NAME="/cgi-bin/printenv"
    SERVER_ADDR="129.158.224.203"
    SERVER_ADMIN="you@example.com"
    SERVER_NAME="agneyam"
    SERVER_PORT="80"
    SERVER_PROTOCOL="HTTP/1.0"
    SERVER_SIGNATURE=""
    SERVER_SOFTWARE="Apache/2.3.0-dev (Unix)"
    TZ="Asia/Calcutta"
Created attachment 20796: Patch allows letting %2f and %2c to pass unmolested in urldecode.
The docs state that:
    The AllowEncodedSlashes directive allows URLs which contain encoded path separators (%2F for / and additionally %5C for \ on according systems) to be used. Normally such URLs are refused with a 404 (Not found) error.
    Turning AllowEncodedSlashes On is mostly useful when used in conjunction with PATH_INFO.
    Allowing encoded slashes does not imply decoding. Occurrences of %2F or %5C (only on according systems) will be left as such in the otherwise decoded URL string.
But the unpatched ap_unescape_url_keep2f does not behave that way. It goes ahead and decodes all the encoded chars found. The attached patch checks for both %2f and %5c and, if it finds either, lets it pass unchanged. Note that I did not use IS_SLASH for the check, as I do not understand why this needs to be system dependent, especially since Apache may be acting as a reverse proxy whose origin server might be on a system with a different separator.
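The documented behaviour quoted above ("left as such in the otherwise decoded URL string") can be sketched in a few lines of Python. This is only an illustration of the idea, not the attached C patch: the function name `unescape_keep_slashes` is hypothetical, and for simplicity each escape is treated as a single Latin-1 character.

```python
import re

_PCT = re.compile(r"%([0-9A-Fa-f]{2})")
KEEP_ENCODED = {0x2F, 0x5C}  # '/' and '\'

def unescape_keep_slashes(url: str) -> str:
    """Decode %XX escapes, but leave %2F and %5C exactly as they were sent."""
    def repl(match):
        code = int(match.group(1), 16)
        if code in KEEP_ENCODED:
            return match.group(0)   # pass through unchanged
        return chr(code)            # assumption: one escape = one Latin-1 char
    return _PCT.sub(repl, url)

print(unescape_keep_slashes("/cgi-bin/printenv/my%2fparam%20x"))
print(unescape_keep_slashes("/cgi-bin/printenv/my%5cparam"))
```

Note that the check is not platform dependent here, mirroring the choice not to use IS_SLASH.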
*** Bug 43192 has been marked as a duplicate of this bug. ***
Created attachment 20856: Patch to allow AllowEncodedChars
Patch to add AllowEncodedChars. AllowEncodedChars accepts multiple chars that are allowed to pass through httpd without being decoded. If the specified chars do not contain '/' and the URL contains '/' encoded as %2f, then NOT_FOUND is returned. The same behavior is true for '\'. (First cut. The patch is largish and somewhat ugly; do let me know how this can be improved.)
Voting early/voting often since I will not have a chance to look at this, this week.
Strong veto (-1)... I've watched jk try to do the same thing and I'm getting slightly tweaked about patches like this. Each hole you "close" is another hole you open up. Each encoded character has a meaning; skip an encode/decode ***or just as bad*** double an encode/decode and you pass things through to another application, which can be equally harmful.
I believe there are two sane ways to handle this, and it's never in urldecode.
One is to represent, in an extended character set, the symbolic '/' and '\' as special characters, while the textual (encoded) '/' and '\' become their text representations, exactly those symbols. Alternately, a series of '/' elements can be represented as path_elts segment by segment, with the textual '/' members in those patterns.
The other is to provide remappings but NEVER on a global scale; it must be dealt with on an application-by-application basis. So I would consider, for example, a patch to handle this cleverly for PATH_INFO variables, but not a patch which affected all modules without thought.
Rather than 'Don't decode "/"' I'd much rather see a patch 'map "/" as "%2F"', where the escape could be to "\x2F" instead, reducing the likelihood of opening up new security holes where none existed before. That map could either address the issue of a "/" symbol or the encoded "/" text. Only ambiguous symbols could be allowed processing this way.
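The "segment by segment" idea from the comment above can be sketched as follows. This is a Python illustration under assumptions, not a proposal for httpd's internals: `split_path_elts` and `join_path_elts` are hypothetical helper names. The path is split on literal '/' first and each segment is decoded on its own, so a textual '/' stays inside its segment and can be re-encoded on the way out.

```python
from urllib.parse import quote, unquote

def split_path_elts(raw_path: str) -> list[str]:
    """Split on literal '/' first, then decode each segment separately.

    Empty segments (leading, trailing or doubled slashes) are dropped
    to keep the sketch short.
    """
    return [unquote(seg) for seg in raw_path.split("/") if seg]

def join_path_elts(elts: list[str]) -> str:
    """Re-encode each segment so a textual '/' goes back out as %2F."""
    return "/" + "/".join(quote(seg, safe="") for seg in elts)

elts = split_path_elts("/something/foo%2Fbar")
print(elts)                  # two elements, the second containing a textual '/'
print(join_path_elts(elts))  # round-trips to the original path
```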
this bug exists in 2.2.6 as well
I'm hoping I can resurrect this bug. It is still an issue in 2.2.8 (and it appears beyond) and is a real problem when creating APIs that use PATH_INFO for identifying resources. It's basically nigh on impossible to have a resource with a name of 'foo/bar' even when you're good and escape with 'foo%2Fbar'. If you try a PUT to /something/foo%2Fbar the handler for that PUT (say mod_wsgi or CGI) gets /something/foo/bar which is _not_ the same thing.
*If* this patch is to be considered... Unadorned '%' symbols would not be permitted. It would be necessary to retain the %25 translation of all occurrences. This would prevent patterns such as %25%32%4F from tripping past the parsers and being rendered valid, in spite of URL rules prohibiting them.
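A small Python sketch of the concern, with a hypothetical helper `decode_keeping` that decodes every escape except a chosen set. If only the slashes are kept encoded, a literal "%2F" sent as data (escaped as %252F) ends up looking exactly like an encoded slash; keeping %25 encoded as well preserves the difference.

```python
import re

_PCT = re.compile(r"%([0-9A-Fa-f]{2})")

def decode_keeping(url: str, keep: set) -> str:
    """Decode %XX escapes except those whose byte value is in `keep`."""
    def repl(match):
        code = int(match.group(1), 16)
        return match.group(0) if code in keep else chr(code)
    return _PCT.sub(repl, url)

slash_only = {0x2F, 0x5C}
with_percent = slash_only | {0x25}

literal_text = "/foo%252Fbar"   # the data "%2F", with its '%' escaped
encoded_slash = "/foo%2Fbar"    # an encoded '/'

# Keeping only slashes: both collapse to the same string.
print(decode_keeping(literal_text, slash_only), decode_keeping(encoded_slash, slash_only))
# Keeping %25 as well: the two stay distinct.
print(decode_keeping(literal_text, with_percent), decode_keeping(encoded_slash, with_percent))
```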
My question is: what is adding the string %2f to the token?
If the string needs to be the literal text, e.g. a file named foo%2fbar, that URL is only valid if the '%' is escaped by the client. E.g. to retrieve /foo%2fbar, the string /foo%252fbar must be passed as the request URI. It isn't a question of accepting '%2F' but a question of passing the percent as an encoded literal; refer to http://tools.ietf.org/html/rfc2396 section 2.4.2:
    Because the percent "%" character always has the reserved purpose of being the escape indicator, it must be escaped as "%25" in order to be used as data within a URI. Implementers should be careful not to escape or unescape the same string more than once, since unescaping an already unescaped string might lead to misinterpreting a percent data character as another escaped character, or vice versa in the case of escaping an already escaped string.
The reason %2f or %5C are decoded goes to this statement:
    In some cases, data that could be represented by an unreserved character may appear escaped; for example, some of the unreserved "mark" characters are automatically escaped by some systems. If the given URI scheme defines a canonicalization algorithm, then unreserved characters may be unescaped according to that algorithm. For example, "%7e" is sometimes used instead of "~" in an http URL path, but the two are equivalent for an http URL.
The keyword here is 'equivalent'. httpd cannot preserve the %2F text while allowing safe reencoding/redecoding.
If the client is failing to escape '%' that is a client flaw; please mention what the origin of this filename pattern is. A form submission?
We concur the documentation is entirely broken and needs to be revisited.
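The RFC's warning about unescaping the same string more than once is easy to reproduce. The Python sketch below uses `urllib.parse.unquote` as a stand-in decoder (an assumption for illustration; httpd's own routines are C code) and the /foo%252fbar example from this comment.

```python
from urllib.parse import unquote

request_uri = "/foo%252fbar"   # client correctly escaped the '%' of "foo%2fbar"

once = unquote(request_uri)    # intended result: the literal name foo%2fbar
twice = unquote(once)          # a second pass misreads the data '%' as an escape

print(once)
print(twice)
```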
(In reply to comment #13)
I understand your pointer to RFC 2396, but with the AllowEncodedSlashes directive I can change that behaviour: "Allowing encoded slashes does not imply decoding. Occurrences of %2F or %5C (only on according systems) will be left as such in the otherwise decoded URL string" (http://httpd.apache.org/docs/2.2/en/mod/core.html#allowencodedslashes)
For example, take www.myurl.de/test/test.html. Now I want to add a path variable:
    www.myurl.de/test/var=variable_content/test.html -> url www.myurl.de/test/test.html is called
The variable_content is encoded by the system. If it contains a path, e.g. "foo/bar", it is encoded to "foo%2fbar" and added to the URL:
    www.myurl.de/test/var=foo%2fbar/test.html -> url www.myurl.de/test/bar/test.html is called !!!!
I read the AllowEncodedSlashes directive as enforcing the behaviour I want: the %2f should not be decoded (as the documentation says) and the called URL should be www.myurl.de/test/test.html.
My company has also run into several issues with AllowEncodedSlashes already. These issues mostly come up in cases where PATH_INFO is being used either in a resource name for a REST API or for an asset name for a video, document, news article, etc. that contains a slash in its name. This makes us very invested in this issue.
Quite honestly, the current implementation is wrong and violates the RFC. Check out Example 2 from the REDUCED OR INCREASED SAFE CHARACTER SETS section of RFC 1630:
    Example 2
    The URIs http://info.cern.ch/albert/bertram/marie-claude and http://info.cern.ch/albert/bertram%2Fmarie-claude are NOT identical, as in the second case the encoded slash does not have hierarchical significance.
Tim specifically called out this example in RFC 1630, and it is of great importance to us for two reasons:
1. It shows concretely that having a %2F in the URL is valid. Rejecting this request with a 404 error by default makes httpd non-compliant with RFC 1630 out of the box.
2. Even if we turn on AllowEncodedSlashes, httpd interprets the %2F as a path separator, violating RFC 1630 because it makes the two URLs in Example 2 above equivalent. E.g. if "albert" is the name of the script or handler, then the PATH_INFO for both URLs will be "/bertram/marie-claude", so the two are indistinguishable from one another and therefore identical.
Of note is that RFC 1630 has not been updated by or obsoleted by any other RFC and is still the basis for URLs in WWW, something core to httpd.
While Section 2.4.2 of RFC 2396 (section 2.4 in RFC 3986, which obsoletes RFC 2396) mentions that a tilde (~) and a %7E can be used interchangeably in a URL, it is not pertinent to this issue, since a tilde is not a "reserved character" (it is specifically called out as an "unreserved character"), yet a slash (/) is reserved. From Section 2.2 of RFC 3986:
    reserved = gen-delims / sub-delims
    gen-delims = ":" / "/" / "?" / "#" / "[" / "]" / "@"
    sub-delims = "!" / "$" / "&" / "'" / "(" / ")" / "*" / "+" / "," / ";" / "="
    The purpose of reserved characters is to provide a set of delimiting characters that are distinguishable from other data within a URI. URIs that differ in the replacement of a reserved character with its corresponding percent-encoded octet are not equivalent. PERCENT-ENCODING A RESERVED CHARACTER, OR DECODING A PERCENT-ENCODED OCTET THAT CORRESPONDS TO A RESERVED CHARACTER, WILL CHANGE HOW THE URI IS INTERPRETED BY MOST APPLICATIONS. THUS, CHARACTERS IN THE RESERVED SET ARE PROTECTED FROM NORMALIZATION AND ARE THEREFORE SAFE TO BE USED BY SCHEME-SPECIFIC AND PRODUCER-SPECIFIC ALGORITHMS FOR DELIMITING DATA SUBCOMPONENTS WITHIN A URI.
I realize that it does say "most applications"; however, it goes on in the next statement to say that "characters in the reserved set are protected from normalization". Therefore the correct solution here is to change httpd to NEVER decode any of the reserved characters from the ABNF. This would follow RFC 1630 and RFC 3986 and would also make the note in the documentation for the AllowEncodedSlashes directive (http://httpd.apache.org/docs/2.2/en/mod/core.html#allowencodedslashes) correct once again, in that slashes will not be decoded.
Two additional notes:
1. AllowEncodedSlashes should really be "on" by default and probably even deprecated. From what I can tell, the only thing it protects against is poor application writers, and it does so in a less-than-graceful way by slapping up a 404. It also seems that only a very small percentage of people even know about AllowEncodedSlashes, and those who do end up turning it on because they found out about it after spending a few hours scratching their heads, modifying configurations and rewrite rules, trying to figure out why a valid URL was being rejected.
2. Nowhere in the RFCs is a backslash (\) listed as a reserved character. Therefore a %5C *should* always be decoded, the same as %7E is converted to a tilde (~).
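As a rough illustration of "protected from normalization", the Python sketch below decodes an escape only when it stands for an RFC 3986 unreserved character (ALPHA, DIGIT, "-", ".", "_", "~") and leaves everything else encoded, uppercasing the hex digits. The function name `normalize_escapes` is hypothetical, and the rule is deliberately narrower than the proposal above: it also leaves %5C encoded, because a backslash is neither reserved nor unreserved.

```python
import re
import string

# RFC 3986 section 2.3: unreserved = ALPHA / DIGIT / "-" / "." / "_" / "~"
UNRESERVED = set(string.ascii_letters + string.digits + "-._~")
_PCT = re.compile(r"%([0-9A-Fa-f]{2})")

def normalize_escapes(url: str) -> str:
    """Decode escapes of unreserved characters; keep all others encoded."""
    def repl(match):
        ch = chr(int(match.group(1), 16))
        return ch if ch in UNRESERVED else match.group(0).upper()
    return _PCT.sub(repl, url)

print(normalize_escapes("/albert/bertram%2fmarie-claude"))  # slash stays encoded
print(normalize_escapes("/%7euser"))                         # tilde is decoded
```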
Changed in trunk and next 2.2 release, hopefully in a way that will satisfy most users. AllowEncodedSlashes On still decodes slashes, but new option AllowEncodedSlashes NoDecode will allow the slashes and not decode them. Doc has been updated. trunk r1082196 2.2.x r1082630
This is a good solution that maintains backwards compatibility. Thank you.
Thank you
I just tried "AllowEncodedSlashes NoDecode", and found that both 2.2.19 and 2.3.12-beta seem to *doubly* encode the slashes with this option enabled. So if my URI is entered as "/search/-/%2Fcats/all/1-10", my httpd-fronted appserver (Glassfish 3.1) is seeing this as "/search/-/%252Fcat/all/1-10". And BTW Glassfish has a similar issue- it needs to be told to allow encoded slashes: http://www.java.net/node/695173. I have this enabled, and it works fine when I use Glassfish directly without Apache in front of it. Thanks for your attention.
(In reply to comment #19) > I just tried "AllowEncodedSlashes NoDecode", and found that both 2.2.19 and > 2.3.12-beta seem to *doubly* encode the slashes with this option enabled. This is likely mod_proxy canonicalizing the request before sending it off -- can you confirm with nocanon?
Ah, it does indeed work with "nocanon". Still a bug? I'm out of my depth now, and unqualified to say whether that's correct or not. Thanks.
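One plausible way to picture the double encoding, sketched in Python: if an already-escaped path is escaped again, the '%' of "%2F" itself becomes "%25". Here `urllib.parse.quote` is only a stand-in for "re-escape the path"; mod_proxy's actual canonicalization is C code and is not claimed to be identical.

```python
from urllib.parse import quote

incoming = "/search/-/%2Fcats/all/1-10"   # path as received with NoDecode

# Re-escaping the still-encoded path turns "%2F" into "%252F".
canonicalized = quote(incoming)

# With "nocanon" the path is assumed to be forwarded untouched.
passthrough = incoming

print(canonicalized)
print(passthrough)
```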
This directive doesn't influence what other modules do with URLs; re-closing.