Bug 32730 - Error 500 on Non-UTF-8 Encoded PATH_INFO on Windows
Status: RESOLVED DUPLICATE of bug 13029
Alias: None
Product: Apache httpd-2
Classification: Unclassified
Component: Core
Version: 2.0.54
Hardware: PC Windows XP
Importance: P3 normal with 12 votes
Target Milestone: ---
Assignee: Apache HTTPD Bugs Mailing List
URL:
Keywords:
Duplicates: 33055
Depends on:
Blocks:
 
Reported: 2004-12-16 12:57 UTC by Richard D
Modified: 2007-12-22 13:07 UTC (History)

Attachments
Simple patch to prevent converting PATH_INFO to UCS-2 (898 bytes, patch)
2004-12-21 09:51 UTC, Richard D
libhttpd.dll 2.0.53-dev svn rev 124556 (248.05 KB, application/x-apache-module)
2005-01-07 19:48 UTC, William A. Rowe Jr.
Revised patch to avoid PATH_INFO and PATH_TRANSLATED conversions (926 bytes, patch)
2005-01-09 12:30 UTC, Richard D

Description Richard D 2004-12-16 12:57:14 UTC
Any PATH_INFO string that contains URL-encoded bytes that are not part of a
valid UTF-8 sequence causes Apache 2.0.52 on Windows to give an Internal Server
Error 500, and put the following message in the error.log:

   (22)Invalid argument: utf8 to ucs2 conversion failed on this string:
PATH_INFO=/Main/FromageD\xe9rap\xe9

The URL that generated this was as follows ('view' being the CGI script, no
mod_perl):

   http://localhost:8080/cgi-bin/view/Main/FromageD%E9rap%E9
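
The failure can be reproduced outside Apache. The following is a minimal
Python sketch, not Apache code: it percent-decodes the path from the URL above
and shows that the resulting bytes are not valid UTF-8, although they decode
cleanly as ISO-8859-1. The variable names are illustrative only.

```python
from urllib.parse import unquote_to_bytes

# PATH_INFO portion of the URL from this report
encoded_path = "/Main/FromageD%E9rap%E9"

# Percent-decoding yields raw bytes; 0xE9 is "e acute" in ISO-8859-1
raw = unquote_to_bytes(encoded_path)

# A strict UTF-8 decode rejects these bytes: 0xE9 starts a three-byte
# sequence, but the byte that follows is not a continuation byte
try:
    raw.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False

# The same bytes are perfectly valid ISO-8859-1
latin1_text = raw.decode("iso-8859-1")
```

Any step that insists on strict UTF-8 input, such as a UTF-8 to UCS-2
conversion, will therefore fail on this PATH_INFO.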

Bug 9223 is similar to this bug, but not a dupe - it covered the QUERY_STRING
which is mostly not used by the web application (TWiki, http://twiki.org).  As
you'd expect, the following URL works fine:

   http://localhost:8080/cgi-bin/view?topic=Main.FromageD%E9rap%E9

Most Mozilla-derived browsers including Firefox 1.0 generate URLs in the native
character encoding (e.g. ISO-8859-1) by default. In any case, Apache should not
be generating an internal server error, but a less serious error (e.g. file not
found), allowing mod_fileiri or the web application to interpret the encoding
correctly (which TWiki can do as long as it sees the PATH_INFO).

This appears to be Windows-specific, since TWiki has users running
internationalised sites on Apache 2 and Linux without this problem - no doubt
it is due to the Unicode support on Windows.

I realise that such non-UTF-8 URLs are not standards conformant, but if the web
application is willing to handle them specially, I think that Apache should at
least pass them on without trying to convert them (a configuration option to
turn off this conversion would be very useful.)

This bug also prevents use of mod_fileiri, which enables such undesirable URLs
to be redirected to conformant UTF-8 encoded URLs.  As Martin Duerst has
confirmed, this runs in the Apache 'fixup' phase.
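
As a rough illustration of the kind of redirect described here, the sketch
below re-encodes a legacy-encoded path as percent-encoded UTF-8. It is not
mod_fileiri's code; the function name, the ISO-8859-1 default and the "return
None when no redirect is needed" convention are all assumptions made for the
example.

```python
from urllib.parse import quote, unquote_to_bytes


def utf8_redirect_target(path, legacy_encoding="iso-8859-1"):
    """Return a UTF-8 percent-encoded form of path, or None if it is already UTF-8.

    Hypothetical helper: the legacy encoding is assumed, not detected.
    """
    raw = unquote_to_bytes(path)
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError:
        # Interpret the bytes in the assumed legacy encoding, then
        # percent-encode the text again as UTF-8 (quote's default)
        text = raw.decode(legacy_encoding)
        return quote(text, safe="/")
    return None  # already conformant, no redirect needed
```

With the path from this report, utf8_redirect_target("/Main/FromageD%E9rap%E9")
gives "/Main/FromageD%C3%A9rap%C3%A9". A redirect of this kind can only happen
if the request gets far enough without the server returning a 500.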

For more information and workarounds from a TWiki perspective, see
http://twiki.org/cgi-bin/view/Codev/ApacheTwoBreaksNonUTF8EncodedURLsOnWindows
Comment 1 Richard D 2004-12-16 13:08:44 UTC
Some extra information:

- this is completely reproducible, any URL using ISO-8859-1 encoded characters
in PATH_INFO will do

- the server build is from XAMPP for Windows 1.4.9 - Apache 2.0.52 -
http://www.apachefriends.org/en/xampp-windows.html

- server error page is as follows (mod_perl not active on CGI directory):

Internal Server Error

The server encountered an internal error or misconfiguration and was unable to
complete your request.

Please contact the server administrator, admin@localhost and inform them of the
time the error occurred, and anything you might have done that may have caused
the error.

More information about this error may be available in the server error log.
Apache/2.0.52 (Win32) mod_perl/1.99_16 Perl/v5.8.4 PHP/5.0.2 Server at localhost
Port 8080
Comment 2 Martin Dürst 2004-12-17 00:49:41 UTC
Apache on Windows definitely needs to give back at least a
404 rather than a 500 for something like FromageD%E9rap%E9.
This is a perfectly acceptable URI (although not one that
the server should resolve, it should continue to only resolve
with UTF-8), and so having the server blow up with a 500
is totally inappropriate. Just check for UTF-8 before
doing the conversion or catch the error, and things
should be fine.
(if anybody tells me where in the code the conversion
function is called, I'll try to help come up with a patch)
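
The "check for UTF-8 before doing the conversion or catch the error" idea can
be sketched as follows. This is a Python illustration of the control flow
only, not httpd code; status_for_path and known_paths are hypothetical names,
and a real server would resolve the path against the filesystem.

```python
from urllib.parse import unquote_to_bytes


def status_for_path(encoded_path, known_paths):
    """Return an HTTP status for a percent-encoded path (illustrative only)."""
    raw = unquote_to_bytes(encoded_path)
    try:
        # Validate before any further conversion is attempted
        text = raw.decode("utf-8")
    except UnicodeDecodeError:
        # Not UTF-8: the resource cannot be resolved, so report
        # "not found" instead of failing with an internal error
        return 404
    return 200 if text in known_paths else 404
```

Under these assumptions a non-UTF-8 path yields 404, while the UTF-8 encoded
form of the same name resolves normally.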
Comment 3 Richard D 2004-12-20 11:08:28 UTC
The Firefox 1.0 bug at http://bugzilla.mozilla.org/show_bug.cgi?id=261934 means
that users can't persistently set Firefox to do UTF-8 encoding of URLs - there
is a fix in place for future releases but no workaround for 1.0.  So the
'configure Firefox' workaround is not very convenient for users. 
Comment 4 Richard D 2004-12-20 12:04:49 UTC
I've had a look at the 2.0.52 code and the immediate issue appears to be in
srclib/apr/threadproc/win32/proc.c at lines 473 to 502, or line 480 onwards in
CVS (see
http://lxr.webperf.org/source.cgi/srclib/apr/threadproc/win32/proc.c#480 ) -
this is getting ready to create a Unicode environment block via the
CreateProcess API in Win32.  It is based on a compile time option,
APR_HAS_UNICODE_FS.

An immediate patch may be quite easy, using the APR_HAS_ANSI_FS code to build an
ANSI environment block - or perhaps just recompiling with APR_HAS_ANSI_FS.

A better fix might be to allow APR_HAS_ANSI_FS vs APR_HAS_UNICODE_FS to be
selected through a run-time configuration directive.  This would enable
sites/applications that want environment variables to be completely untouched by
the UTF8 to UCS2 conversion to run in 'no environment conversion' mode - the web
application can then do its own conversion without having to catch 404 errors. 

Since there's no guarantee the environment passed to Apache will really be
UTF-8, it seems that this 'no conversion' mode is important for some
applications at least.  Many web applications are portable between *nix and
Windows, and have their own UTF8 URI handling code that works fine on both, so
it's a pain if they have special code to handle the way Apache does things
differently on Windows.

If the 'no environment conversion' option is too sweeping, all that's needed is
to add PATH_INFO to the list of environment variables not converted (as in Bug
9223) (Not sure where that code is, can someone point me to it?).  However, I
think that will cause future issues with other environment variables that may
even be application-specific. 
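
To make the "list of environment variables not converted" idea concrete, here
is a small Python sketch. It is not the mod_win32.c or APR code: the function
name, the contents of the exemption set and the choice to widen exempt values
byte for byte (each byte becomes the code point with the same value) are
assumptions made for illustration.

```python
# Hypothetical exemption list; the real list lives in the httpd source
PASS_THROUGH = frozenset({"QUERY_STRING", "PATH_INFO"})


def build_wide_environment(env, pass_through=PASS_THROUGH):
    """Convert a {name: bytes} environment to {name: str} (illustrative only)."""
    wide = {}
    for name, value in env.items():
        if name in pass_through:
            # Byte-for-byte widening: never fails, leaves interpretation
            # of the bytes to the CGI application
            wide[name] = value.decode("latin-1")
        else:
            # Strict conversion: raises UnicodeDecodeError on non-UTF-8 input
            wide[name] = value.decode("utf-8")
    return wide
```

With PATH_INFO in the exemption set, b"/Main/FromageD\xe9rap\xe9" passes
through; with an empty set, the same input raises UnicodeDecodeError, which is
the analogue of the conversion failure in the error log. The sketch also shows
the drawback noted above: every variable that may carry non-UTF-8 data has to
be listed explicitly.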

Re-assigned to Will Rowe based on similarity to Bug 9223, hope that's OK.
Comment 5 André Malo 2004-12-20 12:19:00 UTC
No, that's not OK. Then he's the only one getting mail on updates. I'm adding
him to CC instead.

Anyway, I'm wondering what the FS type has to do with environment conversion.
(But I don't have much win32 fu.)
Comment 6 Richard D 2004-12-21 09:51:56 UTC
Created attachment 13812
Simple patch to prevent converting PATH_INFO to UCS-2

This is a very simple patch to modules/arch/win32/mod_win32.c that should
prevent the PATH_INFO environment variable being converted from UTF-8 to UCS-2,
as with QUERY_STRING (Bug 9223).  I have not been able to test this yet - I
downloaded various free Microsoft compilers and am almost ready, but missing
MSDEV (any pointers to free versions, please email me).

I'm on vacation until near end of the year, but if someone with a Windows build
environment can test it with the above URL (omit the bin/view part if
necessary) that would be great.

I still think a wider patch would be better but this may fix the immediate
issue.
Comment 7 Francis Lee 2005-01-06 13:21:34 UTC
Hi,
I'm not using TWiki, but I've run into the same problem:

When a URL produced by PHP's URL encoding function is used to access files
with international characters in their names, a 404 file not found is
returned. When the name is first UTF-8 encoded and then URL encoded, it works
fine. Note the encoding differences in the URLs: R%EAve and R%C3%AAve.

Examples from log file:

xxx.xxx.xxx.xxx - - [01/Jan/2005:18:23:03 +0100] "GET
/Idir/Deux%20Rives%2C%20un%20R%EAve/01%20-%20Pourquoi%20cette%20pluie%20%20.mp3
HTTP/1.0" 404 260 "-" "WinampMPEG/2.9" "-"

xxx.xxx.xxx.xxx - - [01/Jan/2005:21:12:16 +0100] "GET
/Idir/Deux%20Rives%2C%20un%20R%C3%AAve/01%20-%20Pourquoi%20cette%20pluie%20%20.mp3
HTTP/1.0" 200 8164000 "-" "WinampMPEG/5.0" "-" 
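
The two forms in these log lines are the same name percent-encoded from two
different byte encodings. A short Python sketch (illustrative; the PHP code
involved is not shown in this report) reproduces both:

```python
from urllib.parse import quote

name = "R\u00eave"  # "Reve" with e circumflex (U+00EA)

# Percent-encoding the ISO-8859-1 bytes gives the form that returned 404
latin1_form = quote(name, encoding="iso-8859-1")

# Percent-encoding the UTF-8 bytes gives the form that returned 200
utf8_form = quote(name, encoding="utf-8")
```

Here latin1_form is "R%EAve" and utf8_form is "R%C3%AAve", matching the two
requests above.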
Comment 8 Richard D 2005-01-07 14:49:50 UTC
Can someone with an Apache for Windows development environment please consider
testing my one-line patch (see attachment)?   The test case is very simple so
this should not take too long...

I can't justify paying for a copy of Visual Studio just to get this tested, and
without MSDEV the free version of Visual C++ doesn't work for Apache builds. 

Alternatively, if anyone has pointers to using Cygwin + mingw to do a Win32
build of Apache, that would be great, as I'm quite familiar with Cygwin already
and it won't cost anything to test this.
Comment 9 William A. Rowe Jr. 2005-01-07 17:46:06 UTC
This is a sane proposal.  I'll attach a replacement libhttpd.dll (2.0.53-dev)
to the incident once I have a chance to commit the fix to 2.1-dev.
Comment 10 William A. Rowe Jr. 2005-01-07 17:48:53 UTC
Francis, "When using PHP's URL encoding function to access files"
is unrelated to this incident (presuming you are using mod_php4
a la php4apache2handler module.)
Comment 11 William A. Rowe Jr. 2005-01-07 19:48:59 UTC
Created attachment 13929
libhttpd.dll 2.0.53-dev svn rev 124556

To test this patch, replace libhttpd.dll in the Apache/bin directory of an
httpd-2.0.44 or later httpd-2.0 installation.
Comment 12 Francis Lee 2005-01-08 11:01:12 UTC
Hi,
Maybe this comment does not belong here, but I'll try to explain more clearly.
I used PHP to read the contents of a directory and to output the URLs. When
the file name was only URL encoded, the request resulted in a 404 not found.
When it was UTF-8 encoded and then URL encoded, the file was found.

It might not be related - but then again - it might be.
Comment 13 Richard D 2005-01-09 09:41:41 UTC
I tried the new libhttpd.dll under my Apache setup and I got a somewhat
different conversion error:

(22)I(22)Invalid argument: utf8 to ucs2 conversion failed on this string:
PATH_TRANSLATED=C:\\apachefriends\\xampp\\htdocs\\Main\\FromageD\xe9rap\xe9

This is happening later in the process as you can see, on the file not the URL
as in the original bug.   The URL used was
http://localhost:8080/cgi-bin/view/Main/FromageD%E9rap%E9 as in the bug report.

I know this is not a server or application config issue because
http://localhost:8080/cgi-bin/view/Main/WebHome works fine even though
C:\\apachefriends\\xampp\\htdocs\\Main\\WebHome does not exist as a file either.
 So it seems there is a file-existence check somewhere that is also requiring a
translation of PATH_INFO even though the file doesn't have to exist. 
Comment 14 Richard D 2005-01-09 12:30:03 UTC
Created attachment 13952
Revised patch to avoid PATH_INFO and PATH_TRANSLATED conversions

New patch that should work better...  The analysis I submitted earlier today
was wrong - it is the same issue as before with a slightly different error
message.  This revised patch adds PATH_TRANSLATED to the list of environment
variables not converted to UCS2 (note that REQUEST_URI also includes PATH_INFO
data but is already covered by current code).
Comment 15 William A. Rowe Jr. 2005-02-05 02:46:03 UTC
Thanks Richard, the patch is now in the httpd-2.1 tree and should soon be
backported (provided we get the votes) to 2.0.
Comment 16 Joe Orton 2005-03-29 17:46:50 UTC
*** Bug 33055 has been marked as a duplicate of this bug. ***
Comment 17 Roger H 2005-05-16 04:09:08 UTC
This bug also occurs with REDIRECT_URL, which seems to have gone unnoticed in
the 2.0.54 release!
Comment 18 Richard D 2006-11-04 02:27:21 UTC
Bug 34985 is another variant of this; I have commented there.
Comment 19 William A. Rowe Jr. 2007-12-22 13:07:29 UTC

*** This bug has been marked as a duplicate of 13029 ***