Bug 35256

Summary: %2F will be decoded in PATH_INFO (Documentation to AllowEncodedSlashes says no decoding will be done)
Product: Apache httpd-2 Reporter: Daniel Koke <daniel.koke>
Component: Core Assignee: Apache HTTPD Bugs Mailing List <bugs>
Status: RESOLVED FIXED    
Severity: normal CC: apache.org-2, armhold, barry.kaplan, chris.dent, covener, dieter.vandewalle, issues.apache.org, poirier, quigley, tanaka, thomas.stein
Priority: P2 Keywords: PatchAvailable
Version: 2.2.8   
Target Milestone: ---   
Hardware: PC   
OS: All   
URL: Behind Firewall - see description
Attachments: AllowEncodedSlashes without decoding
Patch allows letting %2f and %2c to pass unmolested in urldecode.
Patch to allow AllowEncodedChars

Description Daniel Koke 2005-06-07 17:33:49 UTC
We have a bug with the decoding of %2F in PATH_INFO.
We set "AllowEncodedSlashes On". 
We have the url: "http://172.16.0.91/VAR1=XXXX%2fVAR2=YYYY"
Apache 2.0.47 handles this correctly, and the PATH_INFO is "VAR1=XXXX%2fVAR2=YYYY".
The newer versions (tested 2.0.52 and 2.0.54) produce the
PATH_INFO "VAR1=XXXX/VAR2=YYYY".
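To make the difference concrete, here is a minimal Python sketch of what a consumer of PATH_INFO sees in each case. It uses Python's urllib.parse purely as a stand-in decoder; it is not httpd's code, and the variable names are illustrative.

```python
from urllib.parse import unquote

raw_path_info = "VAR1=XXXX%2fVAR2=YYYY"

# Behaviour reported for 2.0.47: the encoded slash is left as it was sent.
preserved = raw_path_info

# Behaviour reported for 2.0.52 and 2.0.54: the encoded slash is decoded.
decoded = unquote(raw_path_info)

# An application that splits PATH_INFO on "/" now sees a different structure.
print(preserved.split("/"))  # one segment
print(decoded.split("/"))    # two segments
```

Once the slash has been decoded, the application can no longer tell whether the client sent "/" or "%2f".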
Comment 1 Thomas Stein 2005-07-14 12:59:15 UTC
Hello. 
 
Exactly the same problem here. Any progress on this issue? 
 
regards 
thomas 
 
 
Comment 2 Thomas Stein 2005-07-15 16:30:28 UTC
Created attachment 15682 [details]
AllowEncodedSlashes without decoding

Hello. 
 
After some investigation I was able to solve this issue. To be a little more 
specific, this problem only occurred with a ProxyPass directive in httpd.conf. 
After applying the attached patch the problem disappeared. 
 
best regards 
Thomas
Comment 3 Hebron Mak 2006-06-09 21:45:24 UTC
I also had this problem.  However, the new line of code specified in Thomas'
patch did not fix the problem for me.  What I ended up doing was taking that line
of code and moving it a few lines higher, so that it is the first line inside the
for loop of proxy_trans(), which is line 149.  That did the trick for me.
Comment 4 Mario Lipinski 2006-07-12 14:03:49 UTC
I am also experiencing this problem but without any proxy stuff.
Comment 5 rahul 2007-09-12 03:53:42 UTC
Reproducible on Trunk. (2.3-HEAD)
ProxyPass does not have any effect on this issue.
It can be reproduced on a default Apache install with
 "AllowEncodedSlashes On"

1) Using '\' -> %5c
|(echo "GET /cgi-bin/printenv/my%5cparam HTTP/1.0\n\n" ;sleep 1) | telnet agneyam 8080
Trying 129.158.224.203...
Connected to agneyam.india.sun.com.
Escape character is '^]'.
HTTP/1.1 200 OK
Date: Wed, 12 Sep 2007 10:42:09 GMT
Server: Apache/2.3.0-dev (Unix)
Connection: close
Content-Type: text/plain; charset=iso-8859-1

DOCUMENT_ROOT="/space/store/httpd/htdocs"
GATEWAY_INTERFACE="CGI/1.1"
PATH="/bin:/usr/bin"
PATH_INFO="/my\param"
PATH_TRANSLATED="/space/store/httpd/htdocs/my\param"
QUERY_STRING=""
REMOTE_ADDR="129.158.224.78"
REMOTE_PORT="50617"
REQUEST_METHOD="GET"
REQUEST_URI="/cgi-bin/printenv/my%5cparam"
SCRIPT_FILENAME="/space/store/httpd/cgi-bin/printenv"
SCRIPT_NAME="/cgi-bin/printenv"
SERVER_ADDR="129.158.224.203"
SERVER_ADMIN="you@example.com"
SERVER_NAME="agneyam"
SERVER_PORT="80"
SERVER_PROTOCOL="HTTP/1.0"
SERVER_SIGNATURE=""
SERVER_SOFTWARE="Apache/2.3.0-dev (Unix)"
TZ="Asia/Calcutta"

2) Using '/' -> %2f
|(echo "GET /cgi-bin/printenv/my%2fparam HTTP/1.0\n\n" ;sleep 1) | telnet agneyam 8080
Trying 129.158.224.203...
Connected to agneyam.india.sun.com.
Escape character is '^]'.
HTTP/1.1 200 OK
Date: Wed, 12 Sep 2007 10:43:38 GMT
Server: Apache/2.3.0-dev (Unix)
Connection: close
Content-Type: text/plain; charset=iso-8859-1

DOCUMENT_ROOT="/space/store/httpd/htdocs"
GATEWAY_INTERFACE="CGI/1.1"
PATH="/bin:/usr/bin"
PATH_INFO="/my/param"
PATH_TRANSLATED="/space/store/httpd/htdocs/my/param"
QUERY_STRING=""
REMOTE_ADDR="129.158.224.78"
REMOTE_PORT="59458"
REQUEST_METHOD="GET"
REQUEST_URI="/cgi-bin/printenv/my%2fparam"
SCRIPT_FILENAME="/space/store/httpd/cgi-bin/printenv"
SCRIPT_NAME="/cgi-bin/printenv"
SERVER_ADDR="129.158.224.203"
SERVER_ADMIN="you@example.com"
SERVER_NAME="agneyam"
SERVER_PORT="80"
SERVER_PROTOCOL="HTTP/1.0"
SERVER_SIGNATURE=""
SERVER_SOFTWARE="Apache/2.3.0-dev (Unix)"
TZ="Asia/Calcutta"
Comment 6 rahul 2007-09-12 06:12:47 UTC
Created attachment 20796 [details]
Patch allows letting %2f and %2c to pass unmolested in urldecode.

The docs state that:
The AllowEncodedSlashes directive allows URLs which contain encoded path
separators (%2F for /  and additionally %5C for \ on according systems) to be
used. Normally such URLs are refused with a 404 (Not found) error.

Turning AllowEncodedSlashes On is mostly useful when used in conjunction with
PATH_INFO.

Allowing encoded slashes does not imply decoding. Occurrences of %2F or %5C
(only on according systems) will be left as such in the otherwise decoded URL
string.

But the unpatched ap_unescape_url_keep2f does not behave that way. It goes
ahead and decodes all the encoded chars found.

The attached patch checks for both %2f and %2c and, if it finds either, lets
them pass unchanged. Note that I did not use IS_SLASH for the check, as I do not
understand why this needs to be system dependent, especially since Apache
may be acting as a reverse proxy whose origin server might be on a system with
a different separator.
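The behaviour the documentation describes (decode everything except the encoded path separators) can be sketched in Python as follows. This is an illustration of the idea only, not the attached patch and not httpd's ap_unescape_url_keep2f; the function name and the choice to keep both %2F and %5C on every platform are assumptions taken from the quoted documentation text.

```python
import re

# Hex pairs that stay encoded: "/" (2f) and "\" (5c).
# Assumption: both are kept regardless of platform.
_KEEP = {"2f", "5c"}
_ESCAPE = re.compile(r"%([0-9A-Fa-f]{2})")

def unescape_keep_slashes(path: str) -> str:
    """Decode %XX escapes, leaving %2F and %5C exactly as received."""
    def repl(match):
        hexpair = match.group(1)
        if hexpair.lower() in _KEEP:
            return match.group(0)        # pass through unchanged
        return chr(int(hexpair, 16))     # single-byte decode; enough for ASCII
    return _ESCAPE.sub(repl, path)

print(unescape_keep_slashes("/my%2fparam%20one"))
print(unescape_keep_slashes("/my%5Cparam%7e"))
```

The substitution is a single pass, so text produced by decoding one escape is never rescanned for further escapes.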
Comment 7 Nick Kew 2007-09-13 06:54:11 UTC
*** Bug 43192 has been marked as a duplicate of this bug. ***
Comment 8 rahul 2007-09-20 01:00:02 UTC
Created attachment 20856 [details]
Patch to allow AllowEncodedChars

Patch to add AllowEncodedChars.
AllowEncodedChars accepts multiple chars that are allowed to pass through
httpd without being decoded. If the specified chars do not contain '/' and the
URL contains '/' encoded as %2f, then NOT_FOUND is returned. The same behavior
is true for '\'.

(First cut. The patch is largish and somewhat ugly;
do let me know how this can be improved.)
Comment 9 William A. Rowe Jr. 2007-09-20 01:31:47 UTC
Voting early/voting often since I will not have a chance to look at this, 
this week.  strong veto (-1)...

I've watched jk try to do the same thing and I'm getting slightly tweaked
about patches like this.  Each hole you "close" is another hole you open up.
Each encoded character has a meaning, skip an encode/decode ***or just as bad***
double an encode/decode and you pass things through to another application
which can be equally harmful.

I believe there are two sane ways to handle this, and it's never in urldecode.

One is to represent, in an extended character set, the symbolic '/' and '\'
as special-characters, while the textual (encoded) '/' and '\' become their
text representations, exactly those symbols.  Alternately, a series of '/'
elements can be represented as path_elts segment by segment, with the textual
'/' members in those patterns.

The other is to provide remappings but NEVER on a global scale; it must be
dealt with on an application-by-application basis.  So I would consider, for
example, a patch to handle this cleverly for PATH_INFO variables, but not a
patch which affected all modules without thought.

Rather than 'Don't decode "/"' I'd much rather see a patch 'map "/" as "%2F"',
where the escape could be to "\x2F" instead, reducing the likelihood of opening
up new security holes where none existed before.  That map could either address
the issue of a "/" symbol or the encoded "/" text.  Only ambiguous symbols could
be allowed processing this way.
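One possible reading of the "segment by segment" idea above, sketched in Python: split the raw path on the literal '/' first, and only then decode each segment, so a textual (encoded) '/' ends up inside a segment instead of creating a new one. The function name is hypothetical and this is not a proposal for httpd's internal representation.

```python
from urllib.parse import unquote

def path_segments(raw_path: str) -> list:
    """Split on symbolic '/' first, then decode each segment on its own."""
    return [unquote(segment) for segment in raw_path.split("/")]

# The encoded slash stays inside one segment as text.
print(path_segments("/something/foo%2Fbar"))
# The literal slash separates two segments.
print(path_segments("/something/foo/bar"))
```

With this representation the two requests remain distinguishable without keeping any escape sequence in the decoded data.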
Comment 10 Barry Kaplan 2007-12-12 22:11:42 UTC
this bug exists in 2.2.6 as well
Comment 11 Chris Dent 2009-04-24 11:15:31 UTC
I'm hoping I can resurrect this bug. It is still an issue in 2.2.8 (and it appears beyond) and is a real problem when creating APIs that use PATH_INFO for identifying resources. It's basically nigh on impossible to have a resource with a name of 'foo/bar' even when you're good and escape with 'foo%2Fbar'.

If you try a PUT to /something/foo%2Fbar the handler for that PUT (say mod_wsgi or CGI) gets /something/foo/bar which is _not_ the same thing.
Comment 12 William A. Rowe Jr. 2009-09-26 09:21:21 UTC
*If* this patch is to be considered...

Unadorned '%' symbols would not be permitted.  It would be necessary to
retain the %25 translation of all occurrences.  This would prevent users
from patterns such as %25%32%4F from tripping past the parsers and being 
rendered valid, in spite of URL rules prohibiting them.
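A small Python sketch of one way to read this concern: if %2F is left encoded but %25 is still decoded, a request for the literal text "%2F" and a request containing an encoded slash collapse into the same string. The helper below is hypothetical and only models the decoding rule, not the patch under discussion.

```python
import re

def decode_keeping_slash(path: str, keep_percent_encoded: bool) -> str:
    """Decode %XX escapes, leaving %2F as is; optionally leave %25 as well."""
    keep = {"2f"} | ({"25"} if keep_percent_encoded else set())
    def repl(match):
        if match.group(1).lower() in keep:
            return match.group(0)
        return chr(int(match.group(1), 16))
    return re.sub(r"%([0-9A-Fa-f]{2})", repl, path)

encoded_slash = "/foo%2Fbar"    # client means the name "foo/bar"
literal_text = "/foo%252Fbar"   # client means the literal text "foo%2Fbar"

# Decoding %25 makes the two requests indistinguishable.
print(decode_keeping_slash(encoded_slash, False), decode_keeping_slash(literal_text, False))
# Retaining %25 keeps them apart.
print(decode_keeping_slash(encoded_slash, True), decode_keeping_slash(literal_text, True))
```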
Comment 13 William A. Rowe Jr. 2010-03-09 20:37:30 UTC
My question is; what is adding the string %2f to the token?

If the string needs to be the Literal Text, e.g. a file named foo%2fbar, that
URL is only valid if the '%' is escaped by the client.

E.g. to retrieve /foo%2fbar - the string /foo%252fbar must be passed as the
request URI.  It isn't a question of accepting '%2F' but a question of passing
the percent as an encoded literal; refer to http://tools.ietf.org/html/rfc2396
section 2.4.2;

   Because the percent "%" character always has the reserved purpose of
   being the escape indicator, it must be escaped as "%25" in order to
   be used as data within a URI.  Implementers should be careful not to
   escape or unescape the same string more than once, since unescaping
   an already unescaped string might lead to misinterpreting a percent
   data character as another escaped character, or vice versa in the
   case of escaping an already escaped string.

The reason %2f or %5C are decoded comes down to this statement;

   In some cases, data that could be represented by an unreserved
   character may appear escaped; for example, some of the unreserved
   "mark" characters are automatically escaped by some systems.  If the
   given URI scheme defines a canonicalization algorithm, then
   unreserved characters may be unescaped according to that algorithm.
   For example, "%7e" is sometimes used instead of "~" in an http URL
   path, but the two are equivalent for an http URL.

The keyword here is 'equivalent'.  httpd cannot preserve the %2F text while
allowing safe reencoding/redecoding.

If the client is failing to escape '%' that is a client flaw; please mention
what the origin of this filename pattern is.  A form submission?

We concur the documentation is entirely broken and needs to be revisited.
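The double-unescaping hazard from the RFC 2396 passage quoted above can be shown in a few lines of Python. urllib.parse.unquote is used only as a generic percent-decoder; the request URI is the /foo%252fbar example from this comment.

```python
from urllib.parse import unquote

request_uri = "/foo%252fbar"   # client escaped the '%' as %25

once = unquote(request_uri)    # the intended literal text
twice = unquote(once)          # a second pass turns the text into a slash

print(once)
print(twice)
```

A single decode yields the literal text the client asked for; a second decode anywhere in the chain silently changes the path structure.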
Comment 14 Daniel Koke 2010-03-10 08:42:20 UTC
I understand your hint to the rfc2396 but with the AllowEncodedSlashes directive I can change that behaviour:
"Allowing encoded slashes does not imply decoding. Occurrences of %2F or %5C (only on according systems) will be left as such in the otherwise decoded URL string"
(http://httpd.apache.org/docs/2.2/en/mod/core.html#allowencodedslashes) 

e.g.
www.myurl.de/test/test.html

Now I want to add a path variable:
www.myurl.de/test/var=variable_content/test.html
-> url www.myurl.de/test/test.html is called

The variable_content will be encoded by the system. If the variable_content contains a path e.g. "foo/bar" it will be encoded to "foo%2fbar" and added to the url:
www.myurl.de/test/var=foo%2fbar/test.html
-> url www.myurl.de/test/bar/test.html is called !!!!

I interpret the directive AllowEncodedSlashes as enforcing the behaviour I want. The %2f should not be decoded (as the documentation says) and the called url should be www.myurl.de/test/test.html


(In reply to comment #13)
Comment 15 Timothy Ace 2010-12-28 15:33:41 UTC
My company has also run into several issues with AllowEncodedSlashes already. These issues mostly come up in cases where PATH_INFO is being used either in a resource name for a REST API or for an asset name for a video, document, news article, etc. that contains a slash in its name. This makes us very invested in this issue. Quite honestly the current implementation is wrong and violates the RFCs.

Check out Example 2 from the REDUCED OR INCREASED SAFE CHARACTER SETS section of RFC 1630:

   Example 2

   The URIs

                http://info.cern.ch/albert/bertram/marie-claude

   and

                http://info.cern.ch/albert/bertram%2Fmarie-claude

   are NOT identical, as in the second case the encoded slash does not
   have hierarchical significance.


Tim specifically called out this example in RFC 1630 and it is of great importance to us for two reasons:

1. It shows concretely that having a %2F in the URL is valid. Having httpd reject this request with a 404 error by default makes it non-compliant with RFC 1630 out of the box.

2. Even if we turn on AllowEncodedSlashes, httpd interprets the %2F as a path separator, violating RFC 1630 because it makes the two URLs in Example 2 above equivalent. For example, if "albert" is the name of the script or handler, then the PATH_INFO for both URLs will be "/bertram/marie-claude". The two are indistinguishable from one another, which makes them identical.

Of note is that RFC 1630 has not been updated by or obsoleted by any other RFC and is still the basis for URLs in WWW -- something core to httpd. 

While Section 2.4.2 of RFC 2396 (section 2.4 in RFC 3986 that obsoletes RFC 2396) mentions that a tilde (~) and a %7E can be used interchangeably in a URL, it is not pertinent to this issue since a tilde is not a "reserved character" (it is specifically called out as an "unreserved character"), yet a slash (/) is reserved.

From Section 2.2 of RFC 3986:

     reserved    = gen-delims / sub-delims

      gen-delims  = ":" / "/" / "?" / "#" / "[" / "]" / "@"

      sub-delims  = "!" / "$" / "&" / "'" / "(" / ")"
                  / "*" / "+" / "," / ";" / "="

   The purpose of reserved characters is to provide a set of delimiting
   characters that are distinguishable from other data within a URI.
   URIs that differ in the replacement of a reserved character with its
   corresponding percent-encoded octet are not equivalent.  PERCENT-
   ENCODING A RESERVED CHARACTER, OR DECODING A PERCENT-ENCODED OCTET
   THAT CORRESPONDS TO A RESERVED CHARACTER, WILL CHANGE HOW THE URI IS
   INTERPRETED BY MOST APPLICATIONS.  THUS, CHARACTERS IN THE RESERVED
   SET ARE PROTECTED FROM NORMALIZATION AND ARE THEREFORE SAFE TO BE
   USED BY SCHEME-SPECIFIC AND PRODUCER-SPECIFIC ALGORITHMS FOR
   DELIMITING DATA SUBCOMPONENTS WITHIN A URI.

I realize that it does say "most applications", however, it does go on in the next statement to say that "characters in the reserved set are protected from normalization".

Therefore the correct solution here is to change httpd to NEVER decode any of the reserved characters from the ABNF. This would follow RFC 1630 & RFC 3986 and would also make the note in the documentation for the AllowEncodedSlashes directive (http://httpd.apache.org/docs/2.2/en/mod/core.html#allowencodedslashes) correct once again in that slashes will not be decoded.

Two additional notes:

1. AllowEncodedSlashes should really be "on" by default and probably even deprecated. From what I can tell the only thing it protects against is poor application writers, and it does so in a less-than-graceful way by slapping up a 404. It also seems that only a very small percentage of people even know about AllowEncodedSlashes, and those who do turn it on only after spending a few hours scratching their heads, modifying configurations and rewrite rules, trying to figure out why a valid URL was being rejected.

2. Nowhere in the RFCs is a backslash (\) listed as a reserved character. Therefore a %5C *should* always be decoded the same as %7E is converted to a tilde (~).
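The rule proposed in this comment (never decode a percent-encoded reserved character, decode the rest) can be sketched in Python as below. The reserved set is copied from the RFC 3986 ABNF quoted above; the function name is hypothetical, and the sketch deliberately says nothing about how %25 should be treated, which comment 12 raises as a separate concern.

```python
import re

# gen-delims and sub-delims from RFC 3986 section 2.2, as quoted above.
RESERVED = set(":/?#[]@" + "!$&'()*+,;=")

def decode_except_reserved(path: str) -> str:
    """Decode %XX escapes unless they stand for a reserved character."""
    def repl(match):
        char = chr(int(match.group(1), 16))
        return match.group(0) if char in RESERVED else char
    return re.sub(r"%([0-9A-Fa-f]{2})", repl, path)

print(decode_except_reserved("/albert/bertram%2Fmarie-claude"))  # %2F kept
print(decode_except_reserved("/home/%7Euser/a%5Cb"))             # %7E and %5C decoded
```

Under this rule the two URIs from RFC 1630 Example 2 stay distinct after decoding.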
Comment 16 Dan Poirier 2011-03-17 14:47:39 UTC
Changed in trunk and next 2.2 release, hopefully in a way that will satisfy most users.  

AllowEncodedSlashes On still decodes slashes, but new option AllowEncodedSlashes NoDecode will allow the slashes and not decode them.  Doc has been updated.

trunk r1082196
2.2.x r1082630
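As a rough model of the three settings now available (Off, On, NoDecode), the Python sketch below mimics the behaviour described in this thread for a request path. It is an illustration of the semantics only, not the code committed in the revisions above; the function name is hypothetical and the %5C handling "on according systems" is left out.

```python
import re
from urllib.parse import unquote

def handle_path(raw_path: str, allow_encoded_slashes: str = "Off"):
    """Return (status, decoded_path) for the Off / On / NoDecode settings."""
    has_encoded_slash = re.search(r"%2f", raw_path, re.IGNORECASE) is not None
    if has_encoded_slash and allow_encoded_slashes == "Off":
        return 404, None                       # refused, as by default
    if allow_encoded_slashes == "NoDecode":
        # Decode everything except the %2F sequences themselves.
        parts = re.split(r"(%2[fF])", raw_path)
        kept = [p if re.fullmatch(r"%2[fF]", p) else unquote(p) for p in parts]
        return 200, "".join(kept)
    return 200, unquote(raw_path)              # On: allowed and decoded

for mode in ("Off", "On", "NoDecode"):
    print(mode, handle_path("/cgi-bin/printenv/my%2fparam", mode))
```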
Comment 17 Timothy Ace 2011-03-17 14:55:20 UTC
This is a good solution that maintains backwards compatibility. Thank you.
Comment 18 Daniel Koke 2011-03-18 03:10:27 UTC
Thank you
Comment 19 George Armhold 2011-06-17 18:10:38 UTC
I just tried "AllowEncodedSlashes NoDecode", and found that both 2.2.19 and 2.3.12-beta seem to *doubly* encode the slashes with this option enabled.

So if my URI is entered as "/search/-/%2Fcats/all/1-10", my httpd-fronted appserver (Glassfish 3.1) is seeing this as "/search/-/%252Fcat/all/1-10".

And BTW Glassfish has a similar issue- it needs to be told to allow encoded slashes: http://www.java.net/node/695173.  I have this enabled, and it works fine when I use Glassfish directly without Apache in front of it.

Thanks for your attention.
Comment 20 Eric Covener 2011-06-17 18:15:53 UTC
(In reply to comment #19)
> I just tried "AllowEncodedSlashes NoDecode", and found that both 2.2.19 and
> 2.3.12-beta seem to *doubly* encode the slashes with this option enabled.

This is likely mod_proxy canonicalizing the request before sending it off -- can you confirm with nocanon?
Comment 21 George Armhold 2011-06-17 18:52:11 UTC
Ah, it does indeed work with "nocanon".  Still a bug? I'm out of my depth now, and unqualified to say whether that's correct or not.  Thanks.
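The double encoding reported in comment 19 is what one would expect if a path that still contains %2F is percent-encoded a second time. The Python sketch below uses urllib.parse.quote as a stand-in for that re-canonicalization step; it is not mod_proxy's code, and the path is the one from comment 19.

```python
from urllib.parse import quote

# Path as left by NoDecode: the encoded slash is still present.
received = "/search/-/%2Fcats/all/1-10"

# Re-encoding treats the '%' itself as data and escapes it as %25.
recanonicalized = quote(received)

print(recanonicalized)
```

Skipping the second encoding step, which is what "nocanon" was reported to do here, leaves the original %2F intact for the backend.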
Comment 22 Eric Covener 2011-08-06 23:28:46 UTC
This directive doesn't influence what other modules do with URLs; re-closing.
Comment 23 Rainer Jung 2018-02-25 21:00:35 UTC
Undo spam change
Comment 24 Rainer Jung 2018-02-25 21:08:08 UTC
*** Bug 43192 has been marked as a duplicate of this bug. ***