Bug 24345

Summary: request setCharacterEncoding has no effect
Product: Tomcat 5    Reporter: joachim zhang <joachimz>
Component: Catalina    Assignee: Tomcat Developers Mailing List <dev>
Status: RESOLVED INVALID    
Severity: critical CC: alex
Priority: P3    
Version: 5.0.14   
Target Milestone: ---   
Hardware: PC   
OS: All   

Description joachim zhang 2003-11-03 10:06:17 UTC
After updating to Tomcat 5.0.14 Beta, some old pages have encoding issues. It
seems that request.setCharacterEncoding(String enc) doesn't work!
request.getCharacterEncoding() returns the correct encoding that I've set
(GBK, a Chinese encoding), but the String returned by
request.getParameter("field_name") is still decoded as ISO-8859-1. Only after
doing

String newString = new String(request.getParameter("field_name").getBytes("iso-8859-1"), "GBK");

can I get the correct String.
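The workaround above works because ISO-8859-1 maps every byte 0x00-0xFF to exactly one char, so getBytes("iso-8859-1") hands back the original GBK bytes unchanged, and they can then be decoded with the right charset. A minimal sketch in plain Java, outside any servlet container, assuming the client sent the field as GBK; the class and method names are hypothetical. Note that it repairs the string after the fact rather than fixing how the container decodes it.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class GbkRedecodeDemo {

    static final Charset GBK = Charset.forName("GBK");

    // Simulates the reported symptom: raw GBK bytes decoded with ISO-8859-1,
    // which yields one (wrong) char per byte.
    static String decodeAsLatin1(byte[] raw) {
        return new String(raw, StandardCharsets.ISO_8859_1);
    }

    // The reporter's workaround: encoding back to ISO-8859-1 recovers the
    // original bytes losslessly, which are then decoded with the charset the
    // client actually used.
    static String redecode(String misdecoded, Charset realCharset) {
        return new String(misdecoded.getBytes(StandardCharsets.ISO_8859_1), realCharset);
    }

    public static void main(String[] args) {
        String original = "\u4E2D\u6587";          // two Chinese characters (U+4E2D U+6587)
        byte[] gbkBytes = original.getBytes(GBK);  // 4 bytes in GBK

        String wrong = decodeAsLatin1(gbkBytes);
        String fixed = redecode(wrong, GBK);

        System.out.println("misdecoded length: " + wrong.length());       // 4
        System.out.println("repaired matches: " + fixed.equals(original)); // true
    }
}
```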

The following is my test JSP source:


<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<HTML>
<%@ page
	language = "java"
	session = "false"
	contentType = "text/html; charset=GBK" 
%><%request.setCharacterEncoding("GBK");%>
<HEAD>
<TITLE>Test page</TITLE>
<META HTTP-EQUIV="Content-Type" content="text/html; charset=GBK">

<META NAME="Author" CONTENT="Joachim">
<META NAME="Keywords" CONTENT="">
</HEAD>

<BODY BGCOLOR="#FFFFFF">
<FORM METHOD=POST ACTION="">
<TEXTAREA NAME="text" ROWS="6" COLS="60" wrap="off">
	<%=new String(cl(request.getParameter("text")).getBytes("iso-8859-1"), "GBK")%>
</TEXTAREA>

<BR/><INPUT TYPE="submit" value="Submit">
</FORM>
<%=request.getCharacterEncoding()%>
</BODY>
</HTML>
<%!
	String cl(String v)
	{
		return (null == v) ? "" : v;
	}

%>
Comment 1 Remy Maucherat 2003-11-03 11:34:47 UTC
URI parameter encoding isn't handled by that. See the URIEncoding attribute of
the Connector. However, URI encoding can't be made to work reliably in all
cases, due to the absence of a standard. If you want i18n, use POST. Please do
not reopen the report.
Comment 2 Stefanos Karasavvidis 2003-11-03 13:21:18 UTC
Remy,

the example submitted with the bug report should produce the result expected by
the reporter without needing the getBytes() workaround (and it does use POST).

I cannot verify this as I don't have a tomcat 5 installation, but marking this
as invalid for the reasons mentioned is just wrong.
Comment 3 Remy Maucherat 2003-11-03 14:50:43 UTC
I did look at it (it does indeed POST), and added traces into the
o.a.tomcat.util.http.Parameters class, and (unsurprisingly) the correct encoding
name is being used for character decoding (the input being a byte array). So the
example should work fine.
Comment 4 joachim zhang 2003-11-04 00:50:58 UTC
I found the real reason, and it's my fault: I use a filter (Atlassian
ProfilingFilter) which is called before the servlet's service method and
invokes request.getParameter() before the page calls setCharacterEncoding()!
Thanks, all. I will test and verify carefully before reporting!
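Comment 4 comes down to call order. Under the servlet specification, setCharacterEncoding() has no effect once request parameters have been read, and a container typically parses them only once. The sketch below is a hypothetical toy model in plain Java (no servlet API; ToyRequest and its methods are illustrative, not Tomcat's implementation), assuming a raw GBK body with no percent-encoding to keep it short. It reproduces the reported symptom: getCharacterEncoding() reports GBK, yet the parameter was already decoded as ISO-8859-1.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

public class LazyParamsDemo {

    // Hypothetical toy request, NOT Tomcat's code. The body holds raw bytes
    // (no percent-encoding) to keep the example short.
    static class ToyRequest {
        private final byte[] body;
        private Charset encoding = StandardCharsets.ISO_8859_1; // default when none is set
        private Map<String, String> params;                    // null until first read

        ToyRequest(byte[] body) {
            this.body = body;
        }

        // Records the encoding, but already-parsed parameters are not re-decoded,
        // so a late call only changes what getCharacterEncoding() reports.
        void setCharacterEncoding(Charset cs) {
            encoding = cs;
        }

        String getCharacterEncoding() {
            return encoding.name();
        }

        // Parses the whole body on the first call and caches the result.
        String getParameter(String name) {
            if (params == null) {
                params = new HashMap<>();
                for (String pair : new String(body, encoding).split("&")) {
                    int eq = pair.indexOf('=');
                    if (eq > 0) {
                        params.put(pair.substring(0, eq), pair.substring(eq + 1));
                    }
                }
            }
            return params.get(name);
        }
    }

    public static void main(String[] args) {
        Charset gbk = Charset.forName("GBK");
        String value = "\u4E2D\u6587"; // two Chinese characters
        byte[] body = ("text=" + value).getBytes(gbk);

        // Reporter's order: a filter reads a parameter before the page sets GBK.
        ToyRequest late = new ToyRequest(body);
        late.getParameter("text");
        late.setCharacterEncoding(gbk);
        System.out.println(late.getCharacterEncoding());              // GBK
        System.out.println(value.equals(late.getParameter("text")));  // false

        // Fixed order: set the encoding before anything reads parameters.
        ToyRequest early = new ToyRequest(body);
        early.setCharacterEncoding(gbk);
        System.out.println(value.equals(early.getParameter("text"))); // true
    }
}
```

In a real web application the equivalent fix is to make sure whatever calls setCharacterEncoding() runs before any filter that reads parameters, for example by listing an encoding filter's filter-mapping first in web.xml, since filters run in filter-mapping order.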
Comment 5 Alex Khokhlov 2003-11-15 09:52:53 UTC
Sorry for bothering you again, but I don't completely agree with you... Though
there is no standard for URI encodings, there is the servlet specification
(see 2.4pfd3, chapter SRV.4.9).

Also, there are many cases where the developer cannot impose POST requests on
his application; he is forced to get parameters from the URI. So this method
SHOULD definitely work as before...

I have prepared a 'clean' test file 'bug.jsp' which can be dropped into either
Tomcat 5.0.9 or 5.0.14 to see the difference between these two points of
development (the file contains some Russian text in Cp1251 as a sample).
5.0.9 works fine, whereas 5.0.14 does not.

<%@ page 
  pageEncoding="Cp1251" 
  language="java"
  contentType="text/html; charset=utf-8"
%>    
<html>
<head>
  <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
</head>
<body>

<%
  request.setCharacterEncoding("utf-8"); 
  String test = request.getParameter("test");
  if (test != null) {
    out.write("the length of the test value after decoding: " + test.length());
  }
  out.write("<br>");
  out.write("the value of the test parameter: " + test);
  out.write("<br>");
%>

<form action="bug.jsp" method="get">
  <input type="text" name="test" value="тест">
  <input type="submit" value="test this russian text (4 characters)!">
</form>

</body>
</html>

What do you think?
Comment 6 Remy Maucherat 2003-11-15 10:14:33 UTC
The previous behavior was breaking the HTTP spec. Of course, since you were
using UTF-8, you were basically in the only situation that could work.
You can use the URIEncoding attribute on the Connector to specify the URI
encoding (so set it to UTF-8).
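The distinction in this thread is that the query string of a GET request is decoded according to the Connector's URI encoding, not the encoding passed to request.setCharacterEncoding(). A minimal sketch with java.net.URLDecoder, assuming the browser percent-encodes the field as UTF-8 (as Alex's page declares charset=utf-8); the helper names are hypothetical and this is not Tomcat's Connector code. Decoding the 4-character Russian word with ISO-8859-1 yields 8 characters, which is the kind of mismatch the length check in bug.jsp exposes; decoding with UTF-8, which is what URIEncoding="UTF-8" selects, yields the original 4.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;

public class UriEncodingDemo {

    // Hypothetical helper: decode one percent-encoded query-string value with
    // the given charset, as a connector-level URI encoding setting would.
    static String decodeQueryValue(String raw, String uriEncoding) {
        try {
            return URLDecoder.decode(raw, uriEncoding);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unknown encoding: " + uriEncoding, e);
        }
    }

    // Hypothetical helper: approximates what a browser sends for a form field
    // on a page served with the given charset.
    static String encodeAsBrowser(String value, String pageCharset) {
        try {
            return URLEncoder.encode(value, pageCharset);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unknown encoding: " + pageCharset, e);
        }
    }

    public static void main(String[] args) {
        String word = "\u0442\u0435\u0441\u0442"; // Russian "test", 4 Cyrillic characters
        String query = encodeAsBrowser(word, "UTF-8");

        String asLatin1 = decodeQueryValue(query, "ISO-8859-1"); // one char per byte
        String asUtf8 = decodeQueryValue(query, "UTF-8");

        System.out.println(query);               // %D1%82%D0%B5%D1%81%D1%82
        System.out.println(asLatin1.length());   // 8
        System.out.println(asUtf8.length());     // 4
        System.out.println(asUtf8.equals(word)); // true
    }
}
```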