ASF Bugzilla – Full Text Bug Listing
|Summary:|utf8 to ucs2 conversion failed on Windows|
|Product:|Apache httpd-2|Reporter:|ernesto <ernestoname>|
|Component:|mod_cgi|Assignee:|Apache HTTPD Bugs Mailing List <bugs>|
Description ernesto 2005-05-20 15:12:07 UTC
[Thu May 19 13:38:00 2005] [error] [client 10.1.5.91] (22)Invalid argument: couldn't spawn child process: C:/php/php.exe, referer: http://digei06/Seige/InformacionRelevante/Cuadros.phtml?Dependencia=Direccion%20del%20SEIGE&url=http://seplader2.qroo.gob.mx/seige
(22)Invalid argument: utf8 to ucs2 conversion failed on this string: REDIRECT_QUERY_STRING=Sector=Seguridad%20y%20Orden%20P\xfablico
Comment 1 William A. Rowe Jr. 2005-08-29 22:16:57 UTC
Dude - if you are running mod_cgid on Win32 then all bets are off :) Reclassifying. And I'm totally clueless, but I guess my first question is why use php.exe as a CGI when you can plug it in as a module, and actually serve pages without warming up your cpu? CGI is a very disk/cpu/kernel intensive way to serve any content whatsoever.
Comment 2 Richard D 2006-11-04 02:15:56 UTC
This looks like a variant of Bug 32730, which had the same issue on Windows with some different environment variables. The problem is that Apache tries to translate every environment variable from UTF-8 into UCS-2, even though the environment variable may be in another character encoding (e.g. ISO-8859-1, aka Latin-1). An extension of the fix for Bug 32730 should work, although the real solution... This is not specific to mod_cgi and PHP, as it also happens with non-PHP CGI programs. CGI is still a reasonable option in some cases, e.g. for development of CGI scripts on Windows for installation on Linux+CGI (or a production mod_perl server on any OS).
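The failure mode described in this comment can be reproduced outside Apache. The following is a minimal Python sketch, not Apache's actual code: it assumes the logged value holds the raw byte 0xFA (Latin-1 "ú"), and it approximates UCS-2 with Python's `utf-16-le` codec, which is equivalent for characters in the Basic Multilingual Plane. The helper name `to_utf16_strict` is hypothetical.

```python
# Bytes as they appear in the logged variable; 0xFA is Latin-1 for "ú".
raw = b"REDIRECT_QUERY_STRING=Sector=Seguridad%20y%20Orden%20P\xfablico"


def to_utf16_strict(value: bytes) -> bytes:
    """Treat the input as UTF-8 and re-encode it as UTF-16-LE.

    Raises UnicodeDecodeError when the input is not well-formed UTF-8,
    which is the analogue of the "conversion failed" error in the log.
    """
    return value.decode("utf-8").encode("utf-16-le")


try:
    to_utf16_strict(raw)
    strict_ok = True
except UnicodeDecodeError:
    # A lone 0xFA byte is not a valid UTF-8 sequence, so we end up here.
    strict_ok = False

# The same bytes are perfectly meaningful when read as ISO-8859-1.
latin1_text = raw.decode("iso-8859-1")
```

The point of the sketch is only that the bytes are valid in one encoding and invalid in the other; which encoding a given request actually used is not something the server can know from the bytes alone.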
Comment 3 Richard D 2006-11-04 02:22:23 UTC
Got interrupted when writing the last comment, sorry... To finish the incomplete sentence in that comment: the real solution, in my view, is to go through all environment variables that could be non-UTF-8 (virtually anything that is a string) and avoid converting those - or, better, only convert those guaranteed not to be strings, or guaranteed to be ASCII only. Another environment variable with this problem is REDIRECT_URL, logged in a comment on Bug 32730 after the fix was committed. This is a fairly simple extension of the patch I submitted for that bug. A configuration directive to turn off this conversion might also be useful.
Comment 4 Richard D 2006-11-04 02:58:21 UTC
Some more variants of this bug... Bug 13029 is another variant for the environment variable SSL_SERVER_S_DN_L. I think the fundamental issue is that there's no way to turn off this UTF-8 to UCS-2 conversion, and it only happens on Windows, well before any CGI script or other code has a chance to do its own non-UTF-8 based conversion. The REDIRECT_QUERY_STRING variant was also reported at http://mail-archives.apache.org/mod_mbox/httpd-users/200504.mbox/%3c006901c536e0$3dd72010$5d01250a@vdm%3e
Comment 5 William A. Rowe Jr. 2006-11-04 09:46:25 UTC
Yes - it looks like this needs to be more tolerant, overall, of non-UTF-8 data, and I'll look at rolling in a solution that doesn't impact security. Thanks for your observations, they appear spot-on.
Comment 6 Richard D 2006-11-04 12:14:04 UTC
Not sure what you mean by security implications, but I don't think that falling back to another encoding such as ISO-8859-1 is necessary.

Taking TWiki as an example, which uses paths like /bin/view/Main/WebHome, where view is the CGI script, and /Main/WebHome is the PATH_INFO (see http://twiki.org/cgi-bin/viewfile/Support/ApacheErrorsDuringEdit?rev=1.1;filename=testenv.htm for example of CGI environment variables), it would be useful to specify the following to handle non-UTF-8 encodings such as ISO-8859-1 (which are used by POST from Firefox currently):

AUTH_TYPE             Raw
DOCUMENT_ROOT         Convert
GATEWAY_INTERFACE     Raw
HTTP_ACCEPT           Raw
HTTP_ACCEPT_CHARSET   Raw
HTTP_ACCEPT_ENCODING  Raw
HTTP_ACCEPT_LANGUAGE  Raw
HTTP_CONNECTION       Raw
HTTP_HOST             Raw
HTTP_KEEP_ALIVE       Raw
HTTP_USER_AGENT       Raw
PATH                  Convert (since it has pathnames)
QUERY_STRING          Raw (not a filename, should be interpreted by application)
REMOTE_ADDR           Raw
REMOTE_PORT           Raw
REMOTE_USER           Raw
REQUEST_METHOD        Raw
REQUEST_URI           Convert if valid UTF-8 (and not overlong encoding)
SCRIPT_FILENAME       Convert if valid UTF-8 (and not overlong encoding)
SCRIPT_NAME           Convert if valid UTF-8 (and not overlong encoding)
SERVER_ADDR           Raw
SERVER_ADMIN          Raw
.... (rest are all raw)

Basically, only those variables that correspond to filenames should be converted, and then only if they are valid UTF-8 without overlong encoding. Any variables not used by Apache should not be converted, but left to the application, or a suitable add-on Apache module for conversion.

TWiki has done its own interpretation of UTF-8 URLs, independent of the OS it is running on, which is based on a technique used by IBM's web server for mainframe (z/OS) - basically it tries to recognise the URL as UTF-8 and then falls back to the native encoding (i.e. no conversion done at all). In fact we do this on the PATH_INFO ourselves.
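The per-variable treatment and the "recognise UTF-8, otherwise leave alone" technique proposed above could look roughly like the Python sketch below. It is an illustration under stated assumptions, not TWiki's or Apache's code: the names `POLICY`, `decode_if_utf8` and `prepare` are hypothetical, only the non-raw entries of the list are spelled out, and the sketch assumes that both "Convert" and "Convert if valid UTF-8" fall back to the raw bytes when validation fails. It relies on Python's strict UTF-8 decoder, which rejects overlong encodings.

```python
from typing import Optional, Union

RAW = "raw"
CONVERT = "convert"
CONVERT_IF_UTF8 = "convert-if-utf8"

# Only the non-raw variables from the list; anything absent is treated as raw.
POLICY = {
    "DOCUMENT_ROOT": CONVERT,
    "PATH": CONVERT,
    "REQUEST_URI": CONVERT_IF_UTF8,
    "SCRIPT_FILENAME": CONVERT_IF_UTF8,
    "SCRIPT_NAME": CONVERT_IF_UTF8,
}


def decode_if_utf8(value: bytes) -> Optional[str]:
    """Return the decoded text if value is well-formed UTF-8, else None.

    Strict decoding rejects stray bytes such as 0xFA and overlong
    forms such as C0 AF (an overlong encoding of "/").
    """
    try:
        return value.decode("utf-8", errors="strict")
    except UnicodeDecodeError:
        return None


def prepare(name: str, value: bytes) -> Union[str, bytes]:
    """Return text for values that were converted, raw bytes otherwise."""
    if POLICY.get(name, RAW) == RAW:
        return value  # left to the application to interpret
    text = decode_if_utf8(value)
    # Assumption of this sketch: fall back to the untouched bytes
    # instead of failing the whole request.
    return value if text is None else text
```

For example, `prepare("QUERY_STRING", b"Sector=P\xfablico")` hands the bytes back unchanged, while `prepare("SCRIPT_NAME", b"/bin/view")` returns decoded text that is safe to convert further.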
If Apache is going to carry on doing its own UTF-8 to UCS-2 conversion, which I suppose it must do in some cases that map onto a Windows filesystem (and others such as MacOS X HFS+ etc), it would be good if it recognises when data is really UTF-8 in this way. Also, it would be very helpful to have a configuration option that lets you say "don't convert variable X if it matches regex Y", e.g. don't convert PATH_INFO if it matches "/twiki/bin/.*"

Some TWiki pages that might be of interest here are:

http://twiki.org/cgi-bin/view/Codev/EncodeURLsWithUTF8 - how TWiki does auto-detection and conversion of UTF-8 encoding for PATH_INFO in URLs

http://twiki.org/cgi-bin/view/Codev/InternationalisationUTF8 - includes material on character set auto-detection including excerpt on IBM web server approach - fortunately UTF-8 detection is much easier than the general case.

http://twiki.org/cgi-bin/view/Codev/MacOSXFilesystemEncodingWithI18N - talks about a filesystem-related issue with Unicode normalisation forms on Mac OS X

http://twiki.org/cgi-bin/view/Codev/ProposedUTF8SupportForI18N - general page summarising research on UTF-8 for TWiki, including some useful links
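To make the "don't convert variable X if it matches regex Y" idea concrete, here is a small Python sketch of the lookup such an option would imply. No such Apache directive exists as far as this thread shows; the table `NO_CONVERT` and the function `should_convert` are hypothetical names, and the pattern is the one from the example in this comment, applied to the raw bytes.

```python
import re

# Hypothetical exemption table: variable name -> pattern over the raw bytes.
NO_CONVERT = {
    "PATH_INFO": re.compile(rb"/twiki/bin/.*"),
}


def should_convert(name: str, value: bytes) -> bool:
    """Return False when the variable is exempted from conversion."""
    pattern = NO_CONVERT.get(name)
    # match() anchors at the start of the value, like a path prefix check.
    return not (pattern is not None and pattern.match(value))
```

Matching on bytes rather than decoded text is deliberate in this sketch: the whole point of the exemption is that the value may not be decodable as UTF-8 in the first place.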
Comment 7 Preben Nilsson 2007-03-05 02:21:09 UTC
Hi all,

We are implementing an application that uses SSL client certificates, and it seems we are running into the same problem that is described here:

[Mon Mar 05 09:48:34 2007] [error] [client 18.104.22.168] File does not exist: C:/bec_was/servletpif/apache2/docroots/errordocs
(22)Invalid argument: utf8 to ucs2 conversion failed on this string: SSL_CLIENT_S_DN_CN=Anette Birgitte Franzp\xf8tter

Is there a way I can work around this problem?

Best regards
Preben Nilsson