Bug 36814 - Parameter for POST encoding
Summary: Parameter for POST encoding
Status: RESOLVED WONTFIX
Alias: None
Product: Tomcat 5
Classification: Unclassified
Component: Catalina
Version: 5.5.9
Hardware: Other other
Importance: P2 enhancement
Target Milestone: ---
Assignee: Tomcat Developers Mailing List
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2005-09-26 15:18 UTC by Udo Walker
Modified: 2005-09-27 07:45 UTC



Description Udo Walker 2005-09-26 15:18:07 UTC
There is a problem converting the body of an incoming POST request to the
right character encoding. Most browsers do not send a charset in the
Content-Type request header, so Tomcat falls back to the default encoding,
ISO-8859-1. But in a multilingual environment (e.g. a mix of English, other
European languages, Chinese, etc.) ISO-8859-1 is not sufficient, and UTF-8 is
preferable.
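A minimal, JDK-only sketch of the symptom (the class PostDecodingDemo is hypothetical, not part of Tomcat). A browser posting the form value "é" from a UTF-8 page percent-encodes its two UTF-8 bytes as %C3%A9; decoding those bytes as ISO-8859-1, which is what happens when no charset is known, yields two wrong characters instead of one.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

/** Hypothetical demo: one percent-encoded POST value decoded with two charsets. */
public class PostDecodingDemo {

    /** Decodes a single x-www-form-urlencoded value with the given charset name. */
    static String decode(String formValue, String charset) {
        try {
            return URLDecoder.decode(formValue, charset);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unknown charset: " + charset, e);
        }
    }

    public static void main(String[] args) {
        String value = "%C3%A9"; // UTF-8 bytes of U+00E9 as a browser would send them
        // Decoded as UTF-8: the single character U+00E9 (correct).
        System.out.println(decode(value, "UTF-8").equals("\u00e9"));
        // Decoded as ISO-8859-1: U+00C3 followed by U+00A9 (corrupted).
        System.out.println(decode(value, "ISO-8859-1").equals("\u00c3\u00a9"));
    }
}
```

Both lines print true: the bytes on the wire are identical, and only the charset chosen by the container decides whether the parameter comes out right.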

To tell Tomcat to use, e.g., UTF-8 as the default encoding, I suggest changing
the source code of Request.java in the package org.apache.catalina.connector so
that the encoding is read from a context parameter:

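/**
 * Return the character encoding for this Request, falling back to the
 * webapp's "defaultPostEncoding" context parameter if the client sent none.
 */
public String getCharacterEncoding() {
  String enc = coyoteRequest.getCharacterEncoding();
  if (enc == null) {
    return getDefaultPostEncoding();
  }
  return enc;
}

/**
 * Return the default encoding for POST request parameters, or null (so the
 * ISO-8859-1 default still applies) if the request is not yet mapped to a
 * context or the parameter is not set.
 */
private String getDefaultPostEncoding() {
  if (context == null) {
    return null;
  }
  return context.getServletContext().getInitParameter("defaultPostEncoding");
}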


To use UTF-8, add the following to the <Context> element of your web
application:

<Parameter name="defaultPostEncoding" value="UTF-8"
           override="false"/>
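The lookup order the patch implies can be modelled without Tomcat: use the charset from the Content-Type header if the browser sent one, otherwise the configured defaultPostEncoding, otherwise the spec default ISO-8859-1. This is a hypothetical, simplified sketch: EncodingResolver and its methods are illustrative names, a plain Map stands in for the context init parameters, and the Content-Type parsing is deliberately naive. (The patch itself returns null when the parameter is unset and leaves the ISO-8859-1 fallback to Tomcat.)

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

/** Hypothetical model of the encoding lookup order proposed in this report. */
public class EncodingResolver {

    static final String SPEC_DEFAULT = "ISO-8859-1";

    /** Returns the charset parameter of a Content-Type header, or null if absent. */
    static String charsetFromContentType(String contentType) {
        if (contentType == null) {
            return null;
        }
        for (String part : contentType.split(";")) {
            String p = part.trim();
            if (p.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                String value = p.substring("charset=".length()).trim();
                // Strip optional quotes, e.g. charset="UTF-8".
                if (value.length() >= 2 && value.startsWith("\"") && value.endsWith("\"")) {
                    value = value.substring(1, value.length() - 1);
                }
                return value.isEmpty() ? null : value;
            }
        }
        return null;
    }

    /** Content-Type charset first, then the context parameter, then the spec default. */
    static String resolve(String contentType, Map<String, String> contextParams) {
        String enc = charsetFromContentType(contentType);
        if (enc == null) {
            enc = contextParams.get("defaultPostEncoding"); // hypothetical stand-in for getInitParameter
        }
        return enc != null ? enc : SPEC_DEFAULT;
    }

    public static void main(String[] args) {
        Map<String, String> params = new HashMap<String, String>();
        params.put("defaultPostEncoding", "UTF-8");
        String form = "application/x-www-form-urlencoded";
        System.out.println(resolve(form, params));                                 // UTF-8
        System.out.println(resolve(form + "; charset=Shift_JIS", params));         // Shift_JIS
        System.out.println(resolve(form, Collections.<String, String>emptyMap())); // ISO-8859-1
    }
}
```

An explicit charset from the client still wins, so the parameter only changes behaviour for the common case where the browser sends none.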


What do you think? Is this a worthwhile enhancement for Tomcat? It would be
great if you could add it to the Tomcat code base; I expect many Tomcat users
would be glad to have it.

With regards

Udo Walker
Comment 1 Tim Funk 2005-09-26 15:25:15 UTC
The spec declares the default to be ISO-8859-1. Overriding it is as easy as
calling ServletRequest.setCharacterEncoding() from a servlet filter (or similar).

I'm not convinced of the value of adding another attribute to the connector.
Comment 2 Allistair Crossley 2005-09-27 13:32:02 UTC
It always comes down to what the servlet specification says a container
implementation should do, what developers expect container implementations to
do and how they should behave, and what developers often get confused about.

Character encoding seems to be one of those things many developers get
confused about, and this issue in particular comes up quite often. I suppose
that, as far as a developer is concerned, whatever data they send to the server
should be preserved as sent. So if Arabic text is sent in a POST to an
application running on Tomcat, is it acceptable for the developer to have to
understand that Tomcat defaults to a charset that is not universal, and to have
to find or write a filter to change this?

Personally, I believe encoding falls into the same category as other
configuration elements such as databases and JNDI environment variables.

IMO the container should help the developer with this kind of functionality,
if not through configuration, then by providing default configurable filters
that can be shared across web applications.

Just my opinion.
Comment 3 Udo Walker 2005-09-27 15:21:09 UTC
Maybe I am being silly, but in my test case setting the encoding with a filter
does not work. I did this:

In web.xml, as the first filter in the list:

  <filter>
    <filter-name>Set Character Encoding</filter-name>
    <filter-class>package.SetCharacterEncodingFilter</filter-class>
    <init-param>
      <param-name>encoding</param-name>
      <param-value>UTF-8</param-value>
    </init-param>
  </filter>

This is the example filter that ships with Tomcat.

If I now submit a POST containing UTF-8 characters, all of the submitted
non-ASCII characters arrive corrupted.

What am I doing wrong?


There is also a problem with valves. If a valve reads a parameter from the
request, the parameters are decoded with the default (spec) encoding,
ISO-8859-1, and a filter can no longer change this later.

(With my patch, everything works no matter where the parameters are first read
from the request.)
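The valve problem follows from how parameters are decoded: the servlet API requires setCharacterEncoding() to be called before request parameters are read or getReader() is used. The hypothetical model below (ParseOnceRequest is an illustrative class, not Tomcat's Request) decodes the form body once, on first parameter access, with whatever encoding is set at that moment, reproducing the behaviour reported here: a filter that runs first gets UTF-8, but a valve that reads a parameter earlier locks in ISO-8859-1.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical model (not Tomcat's Request class): form parameters are
 * decoded once, on first access, with whatever encoding is set at that moment.
 */
public class ParseOnceRequest {

    private final String body;          // raw x-www-form-urlencoded body
    private String encoding;            // null = neither client nor filter set one
    private Map<String, String> params; // null until the first parameter access

    ParseOnceRequest(String body) {
        this.body = body;
    }

    void setCharacterEncoding(String enc) {
        this.encoding = enc; // has no effect on params once they are parsed
    }

    String getParameter(String name) {
        if (params == null) {
            parseParameters();
        }
        return params.get(name);
    }

    private void parseParameters() {
        String enc = (encoding != null) ? encoding : "ISO-8859-1"; // spec default
        params = new HashMap<String, String>();
        try {
            for (String pair : body.split("&")) {
                int eq = pair.indexOf('=');
                if (eq < 0) {
                    continue; // skip malformed pairs in this simplified model
                }
                params.put(URLDecoder.decode(pair.substring(0, eq), enc),
                           URLDecoder.decode(pair.substring(eq + 1), enc));
            }
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Filter runs first: UTF-8 is in effect when parameters are parsed.
        ParseOnceRequest ok = new ParseOnceRequest("name=%C3%A9");
        ok.setCharacterEncoding("UTF-8");
        System.out.println(ok.getParameter("name").equals("\u00e9"));         // true

        // A valve reads a parameter first, so ISO-8859-1 is locked in.
        ParseOnceRequest late = new ParseOnceRequest("name=%C3%A9");
        late.getParameter("name");
        late.setCharacterEncoding("UTF-8");
        System.out.println(late.getParameter("name").equals("\u00c3\u00a9")); // true
    }
}
```

Under this model, only a default chosen before the first parameter access (as the proposed context parameter would be) avoids the ordering problem.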

With regards,
Udo
Comment 4 Allistair Crossley 2005-09-27 15:25:52 UTC
Don't forget to add a filter mapping. This is ours; you may need a URL pattern
match instead.

	<filter-mapping>
	 	<filter-name>requestCharacterEncodingFilter</filter-name>
	 	<servlet-name>springFrontController</servlet-name>
	</filter-mapping>

Comment 5 Udo Walker 2005-09-27 15:45:36 UTC
I already have a filter mapping like this:

  <filter-mapping>
    <filter-name>Set Character Encoding</filter-name>
    <servlet-name>action</servlet-name>
  </filter-mapping>

but it does not work.

In the <Context> element of the config file I have included this valve:

    <Valve className="org.apache.catalina.valves.AccessLogValve"
           directory="logs"  prefix="localhost_access_ewi_log." suffix=".txt"
           resolveHost="false" pattern="common"/>

The accompanying comment says that all requests are logged. Could this already
be setting the character encoding to the wrong value? I don't mean the valve
itself, but Tomcat when the request is accessed.