There is a problem converting the body of a POST request to the right character encoding. Browsers mostly do not send the character encoding in the request header, so Tomcat falls back to the default encoding, which is ISO-8859-1. In a multi-language environment (e.g. a mixture of English, other European languages, Chinese, etc.) ISO-8859-1 is not sufficient and UTF-8 is the better choice.

To tell Tomcat to use e.g. UTF-8 as the default encoding, I suggest changing Request.java in the package org.apache.catalina.connector so that the encoding is read from a context parameter:

    /**
     * Return the character encoding for this Request.
     */
    public String getCharacterEncoding() {
        String enc = coyoteRequest.getCharacterEncoding();
        if (enc == null) {
            return getDefaultPostEncoding();
        }
        return enc;
    }

    /**
     * Get the default encoding for parameters of requests with post method.
     */
    private String getDefaultPostEncoding() {
        return context.getServletContext().getInitParameter("defaultPostEncoding");
    }

To use this with UTF-8 encoding, add this to the <Context> of your webapp:

    <Parameter name="defaultPostEncoding" value="UTF-8" override="false"/>

What do you think? Is this a good enhancement for Tomcat? It would be great if you could add this to the Tomcat code base. Many Tomcat users would probably be glad to have it, wouldn't they?

With regards
Udo Walker
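The corruption described above can be reproduced outside Tomcat with the JDK alone. The following is a minimal sketch, not Tomcat code: the class name PostEncodingDemo and the sample word are made up for illustration. It percent-encodes a string as UTF-8, the way a browser would for a page served as UTF-8, and then decodes it once with ISO-8859-1 (the spec default) and once with UTF-8.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;

// Hypothetical demo class; it only illustrates the decoding mismatch.
class PostEncodingDemo {

    /** Decode one form-encoded value using the given charset name. */
    static String decode(String formValue, String charset) {
        try {
            return URLDecoder.decode(formValue, charset);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unknown charset: " + charset, e);
        }
    }

    public static void main(String[] args) throws Exception {
        // "Gr" + u-umlaut + sharp-s + "e", written with escapes so the
        // source file encoding does not matter.
        String original = "Gr\u00fc\u00dfe";

        // What a browser would put into the POST body for a UTF-8 page.
        String body = URLEncoder.encode(original, "UTF-8");

        String asLatin1 = decode(body, "ISO-8859-1"); // container default
        String asUtf8 = decode(body, "UTF-8");        // what the sender meant

        System.out.println("body:       " + body);
        System.out.println("ISO-8859-1: round trip ok = " + asLatin1.equals(original));
        System.out.println("UTF-8:      round trip ok = " + asUtf8.equals(original));
    }
}
```

Each two-byte UTF-8 sequence becomes two separate characters when read as ISO-8859-1, which is the kind of damage seen in the submitted form values.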
The spec declares the default to be ISO-8859-1. Overriding it is as easy as calling ServletRequest.setCharacterEncoding() from a servlet filter (or similar). I'm not convinced of the value of adding another attribute to the connector.
It always comes down to what the servlet specification says a container implementation should do, what developers expect container implementations to do and how they should behave, and what developers often get confused about. Character encoding seems to be one of the things that confuse many developers, and this issue in particular comes up quite often.

I suppose that, as far as a developer is concerned, whatever data they send to the server should be preserved as sent. So if Arabic is sent via a POST to an application running on Tomcat, is it acceptable for the developer to have to understand that Tomcat runs with a charset that is not universal, and to have to find a filter or write their own to change this?

Personally, I believe encoding falls into the same category as other configuration elements like databases and JNDI environment variables. IMO the container should help the developer with this type of functionality, if not by configuration, then by providing default configurable filters across web applications. Just my opinion.
Maybe I am being silly, but in my test case setting the encoding with a filter does not work. I added this to web.xml as the first filter in the list:

    <filter>
        <filter-name>Set Character Encoding</filter-name>
        <filter-class>package.SetCharacterEncodingFilter</filter-class>
        <init-param>
            <param-name>encoding</param-name>
            <param-value>UTF-8</param-value>
        </init-param>
    </filter>

This is the filter that is packaged as an example with Tomcat. If I now do a POST with UTF-8 characters, all submitted UTF-8 characters are corrupted. What am I doing wrong?

There is also a problem with valves. If you read a parameter from the request inside a valve, the character encoding is automatically set to the default (spec) value, which is ISO-8859-1. You cannot change it in a filter afterwards. (If I use my patch instead, everything is fine, no matter where I read parameters from the request.)

With regards,
Udo
Don't forget to add a filter mapping. This is ours; you may need a URL pattern match.

    <filter-mapping>
        <filter-name>requestCharacterEncodingFilter</filter-name>
        <servlet-name>springFrontController</servlet-name>
    </filter-mapping>
I also have a filter mapping, like this:

    <filter-mapping>
        <filter-name>Set Character Encoding</filter-name>
        <servlet-name>action</servlet-name>
    </filter-mapping>

but it does not work. In the config file, the <Context> includes this valve:

    <Valve className="org.apache.catalina.valves.AccessLogValve"
           directory="logs"
           prefix="localhost_access_ewi_log."
           suffix=".txt"
           resolveHost="false"
           pattern="common"/>

The comment says that all request accesses are logged. Maybe this already sets the character encoding to the wrong value? I do not mean the valve directly, but Tomcat when the request is accessed.
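The servlet specification requires setCharacterEncoding() to be called before any request parameter is read; a later call has no effect. The sketch below models only that ordering rule, as a way to picture the suspicion raised here. LazyRequest and ParsingOrderDemo are hypothetical names, not Tomcat classes, and whether AccessLogValve with pattern="common" actually reads parameters is not established in this thread.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

// Hypothetical stand-in for a request object; NOT Tomcat's Request class.
// It decodes its single parameter lazily and caches the result.
class LazyRequest {
    private final String rawValue; // percent-encoded parameter value
    private String encoding;       // null until someone sets it
    private String parsedValue;    // cached on first read

    LazyRequest(String rawValue) {
        this.rawValue = rawValue;
    }

    void setCharacterEncoding(String enc) {
        this.encoding = enc;
    }

    String getParameter() {
        if (parsedValue == null) {
            // Assumed fallback: the spec default, ISO-8859-1.
            String effective = (encoding != null) ? encoding : "ISO-8859-1";
            try {
                parsedValue = URLDecoder.decode(rawValue, effective);
            } catch (UnsupportedEncodingException e) {
                throw new IllegalArgumentException(e);
            }
        }
        return parsedValue;
    }
}

class ParsingOrderDemo {

    /** Something ahead of the filter reads a parameter first. */
    static String readEarlyThenSet(String raw) {
        LazyRequest request = new LazyRequest(raw);
        request.getParameter();                // parsed with the default
        request.setCharacterEncoding("UTF-8"); // too late, value is cached
        return request.getParameter();
    }

    /** The filter sets the encoding before anything reads a parameter. */
    static String setThenRead(String raw) {
        LazyRequest request = new LazyRequest(raw);
        request.setCharacterEncoding("UTF-8");
        return request.getParameter();
    }

    public static void main(String[] args) {
        String raw = "Gr%C3%BC%C3%9Fe"; // UTF-8, percent-encoded
        String expected = "Gr\u00fc\u00dfe";
        System.out.println("read early, then set: ok = "
                + readEarlyThenSet(raw).equals(expected));
        System.out.println("set, then read:       ok = "
                + setThenRead(raw).equals(expected));
    }
}
```

If something in the request pipeline does touch the parameters before the filter runs, the filter would behave like readEarlyThenSet here, which matches the symptom of a correctly mapped filter that appears to do nothing.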