Bug 23929 - request.setCharacterEncoding(String) doesn't work
Status: RESOLVED INVALID
Alias: None
Product: Tomcat 5
Classification: Unclassified
Component: Servlet & JSP API
Version: 5.0.12
Hardware: PC Linux
Importance: P3 normal
Target Milestone: ---
Assignee: Tomcat Developers Mailing List
URL:
Keywords:
Duplicates: 25848 25958 26118 26393 30255
Depends on:
Blocks:
 
Reported: 2003-10-20 09:10 UTC by Michal Krause
Modified: 2004-11-16 19:05 UTC
Description Michal Krause 2003-10-20 09:10:27 UTC
I use the following construct to set the request character encoding:

if (request.getCharacterEncoding() == null) {
        request.setCharacterEncoding("ISO8859-2");
}

In older versions of Tomcat (tested with 5.0.7) everything works fine, but in
version 5.0.12 this construct has no effect: ISO8859-2 characters in parameter
values are replaced with "?".

$ java -version
java version "1.4.2"
Java(TM) 2 Runtime Environment, Standard Edition (build 1.4.2-b28)
Java HotSpot(TM) Client VM (build 1.4.2-b28, mixed mode)
Comment 1 Remy Maucherat 2003-10-20 09:58:28 UTC
I doubt there's actually a bug with this functionality. The query string
character encoding handling did change, but it was previously broken.
If you want i18n in a portable fashion, use a POST.
Comment 2 Michal Krause 2003-10-20 10:30:49 UTC
Sorry, but I don't understand. Is there a bug here or not? If there is, why did
you mark this report as invalid? If not, what do I have to do to make it work in
my configuration?

I don't want to use POST, because the HTTP RFC says when to use GET and when to
use POST, and I'm convinced that in my situation GET is the better option.
Comment 3 Remy Maucherat 2003-10-20 10:37:55 UTC
Sorry, there's no bug. BZ is not there to discuss design decisions. If you want
to do so, post on tomcat-dev. The only standard for URL encoding is to use
UTF-8, but nobody follows the standard. You can also now configure the URI
encoding in the connector. If you insist on using i18n with URL parameters, the
result is that it won't work reliably, but of course, you're free to do what you
want ;-)
Please do not reopen the report.
Comment 4 Remy Maucherat 2004-01-03 09:16:51 UTC
*** Bug 25848 has been marked as a duplicate of this bug. ***
Comment 5 Remy Maucherat 2004-01-07 16:13:17 UTC
*** Bug 25958 has been marked as a duplicate of this bug. ***
Comment 6 Remy Maucherat 2004-01-14 13:03:35 UTC
From Mark:

Character encoding has been the source of quite a bit of debate on the
tomcat-dev list in recent weeks. There have been a few changes (see summary
below) as a result. Essentially some additional configuration options have
been provided. The UTF-8 issue (also reported in bug 22666) has also been fixed.

Character encoding summary
==========================

There are a number of situations where there may be a requirement to use
non-US-ASCII characters in a URI. These include:
- Parameters in the query string
- Servlet paths

There is a standard for encoding URIs
(http://www.w3.org/International/O-URL-code.html) but this standard is not
consistently followed by clients. This causes a number of problems.

The functionality provided by Tomcat (4 and 5) to handle this less than ideal 
situation is described below.

1. The Coyote HTTP/1.1 connector has a useBodyEncodingForURI attribute which 
if set to true will use the request body encoding to decode the URI query 
parameters.
  - The default value is true for TC4 (breaks spec but gives consistent 
behaviour across TC4 versions)
  - The default value is false for TC5 (spec compliant but there may be 
migration issues for some apps)
2. The Coyote HTTP/1.1 connector has a URIEncoding attribute which defaults to 
ISO-8859-1.
3. The parameters class (o.a.t.u.http.Parameters) has a QueryStringEncoding 
field which defaults to the URIEncoding. It must be set before the parameters 
are parsed to have an effect.
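
To illustrate why the choice of URI encoding matters, here is a minimal sketch that does not use Tomcat at all: it decodes the same percent-encoded byte with two different charsets using the JDK's URLDecoder. The class name and the decode helper are hypothetical and exist only for this illustration, and the sketch assumes the JDK in use ships the ISO-8859-2 charset.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

class QueryDecodingSketch {

    // Hypothetical helper: percent-decode one query-string value with the
    // named charset, converting the checked exception to an unchecked one.
    static String decode(String rawValue, String charsetName) {
        try {
            return URLDecoder.decode(rawValue, charsetName);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unsupported charset: " + charsetName, e);
        }
    }

    public static void main(String[] args) {
        // A client that encodes its query string as ISO-8859-2 sends the
        // letter "s with caron" (U+0161) as the single byte 0xB9.
        String raw = "%B9";

        // Decoded with the charset the client actually used: U+0161.
        System.out.println(Integer.toHexString(decode(raw, "ISO-8859-2").charAt(0)));

        // Decoded with ISO-8859-1 (the URIEncoding default described above):
        // the same byte becomes U+00B9, a different character.
        System.out.println(Integer.toHexString(decode(raw, "ISO-8859-1").charAt(0)));
    }
}
```

The two calls receive identical input and differ only in the charset name, which is roughly the situation a container is in when the client's query-string encoding and the configured URI encoding disagree.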

Things to note regarding the servlet API:
1. HttpServletRequest.setCharacterEncoding() normally only applies to the 
request body NOT the URI.
2. HttpServletRequest.getPathInfo() is decoded by the web container.
3. HttpServletRequest.getRequestURI() is not decoded by the container.
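
The difference between a decoded and an undecoded path (points 2 and 3) can be sketched by analogy with java.net.URI, which exposes both forms. This is only an analogy and not the servlet API: the class and method names are hypothetical, the example URL is made up, and java.net.URI always decodes escaped octets as UTF-8, whereas a container decodes according to its own configuration.

```java
import java.net.URI;

class RawVersusDecodedPath {

    // Hypothetical helper: the path exactly as it appears on the wire,
    // with percent-escapes left intact (comparable to an undecoded URI).
    static String rawPath(String url) {
        return URI.create(url).getRawPath();
    }

    // Hypothetical helper: the path with percent-escapes decoded
    // (java.net.URI interprets the escaped octets as UTF-8).
    static String decodedPath(String url) {
        return URI.create(url).getPath();
    }

    public static void main(String[] args) {
        String url = "http://localhost:8080/app/caf%C3%A9?q=1"; // made-up example URL

        System.out.println(rawPath(url));              // /app/caf%C3%A9
        System.out.println(decodedPath(url).length()); // 9, since %C3%A9 became one char
    }
}
```

Code that compares or logs paths needs to know which of the two forms it is holding, since the same request yields different strings.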

Other tips:
1. Use POST for forms that submit parameters, since the parameters are then
part of the request body.
Comment 7 Remy Maucherat 2004-01-14 13:05:08 UTC
*** Bug 26118 has been marked as a duplicate of this bug. ***
Comment 8 Remy Maucherat 2004-01-24 08:05:43 UTC
*** Bug 26393 has been marked as a duplicate of this bug. ***
Comment 9 Kin-Man Chung 2004-07-27 20:41:37 UTC
*** Bug 30255 has been marked as a duplicate of this bug. ***