37072 – Encoding mismatch in error condition

Status:	RESOLVED DUPLICATE of bug 43236

Alias:	None

Product:	Tomcat 5
Classification:	Unclassified
Component:	Catalina
Version:	5.5.9
Hardware:	Other other

Importance:	P2 normal with 4 votes
Target Milestone:	---
Assignee:	Tomcat Developers Mailing List

URL:
Keywords:

Depends on:
Blocks:

Reported:	2005-10-13 14:44 UTC by Udo Walker
Modified:	2007-12-18 15:50 UTC
CC List:	4 users

Description Udo Walker 2005-10-13 14:44:56 UTC

I found a possible bug in the class org/apache/catalina/connector/Response.java.

I will try to explain the problem by walking through the request flow (I use a
filter, a servlet and a JSP page):

1. Tomcat receives a request from outside.
2. The filter gets the response and does the following:
2.1 Sets the character encoding to UTF-8.
2.2 Gets the writer with response.getWriter() and writes out a message if the
content type is text/html. This locks the writer, which means the character
encoding can no longer be set afterwards.
3. The servlet gets the request and an exception occurs (this is a simulated
exception).
4. Tomcat gets the request back and processes the error.
4.1 The response is reset:
4.1.1 The response itself is reset.
4.1.2 The output stream is reset.
Both (4.1.1 and 4.1.2) are reset to ISO-8859-1, which is the default value.
5. The error page, which has the encoding UTF-8, is called.
5.1 BUG: The encoding of the page is not used because the writer is still locked,
but the encoding in the writer has been reset to the default, ISO-8859-1.

My suggestion is to leave the character encoding untouched in the error case,
because in the error case the encoding was already set somewhere earlier (e.g.
in a filter or servlet).

1. Tomcat (ISO-8859-1, writer unlocked)
2. Filter (-> UTF-8, writer locked)
3. Servlet (UTF-8, exception raised, writer locked)
4. Tomcat (ISO-8859-1, writer locked)
5. JSP (UTF-8 <-> ISO-8859-1 conflict because writer still locked ->
setCharacterEncoding is locked)
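
The flow above can be sketched with a small, simplified model. ModelResponse is
a hypothetical stand-in written for illustration only; it is not Tomcat's
org.apache.catalina.connector.Response and only mirrors the behaviour described
in this report (the encoding is reset to the default while the writer lock
stays in place).

```java
public class EncodingLockDemo {

    /** Hypothetical, simplified stand-in for a servlet response (not Tomcat code). */
    static class ModelResponse {
        private String characterEncoding = "ISO-8859-1"; // container default
        private boolean usingWriter = false;

        void setCharacterEncoding(String charset) {
            // The servlet spec says this call has no effect after getWriter().
            if (usingWriter) {
                return;
            }
            characterEncoding = charset;
        }

        void getWriter() {
            usingWriter = true; // from now on the encoding is locked
        }

        /** Models step 4 as reported: encoding back to default, lock untouched. */
        void resetForError() {
            characterEncoding = "ISO-8859-1";
        }

        String getCharacterEncoding() {
            return characterEncoding;
        }
    }

    /** Replays steps 2 to 5 and returns the encoding the error page ends up with. */
    static String simulate() {
        ModelResponse response = new ModelResponse();
        response.setCharacterEncoding("UTF-8"); // step 2.1: filter sets UTF-8
        response.getWriter();                   // step 2.2: filter locks the writer
        response.resetForError();               // step 4: reset after the exception
        response.setCharacterEncoding("UTF-8"); // step 5: error page asks for UTF-8
        return response.getCharacterEncoding();
    }

    public static void main(String[] args) {
        System.out.println(simulate()); // prints "ISO-8859-1"
    }
}
```

In this model the last setCharacterEncoding call is silently ignored, which is
the conflict described in step 5.1.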


With regards
Udo Walker

Comment 1 Yoav Shapira 2006-12-24 08:47:34 UTC

I understand your use case and your concern.  But we can't count on the encoding
being set somewhere before, can we?  Even detecting that we're in the error case
as opposed to a normal reset is somewhat challenging.  If you've got a patch you
want us to consider, please attach it to this issue.

Comment 2 Udo Walker 2007-01-02 03:18:43 UTC

In bug 36814 I first thought this was a different problem in Tomcat. Then I
found the problem described above.

In bug 36814 I described a possible solution that uses a context parameter to
set the default encoding of the container. The solution was rejected :( .

I don't know how to solve the encoding problem if nobody is able to configure
the default encoding.

You could keep ISO-8859-1 as the default encoding, but if a context parameter
is set, use the encoding value given there.
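
A minimal sketch of that fallback rule, assuming the parameter value has
already been read from the context (for example via
ServletContext.getInitParameter). The class and method names are hypothetical;
no such setting exists in Tomcat as far as this thread shows, and the proposal
in bug 36814 was rejected.

```java
import java.nio.charset.Charset;

public class DefaultEncodingResolver {

    /** The container default named in this report. */
    static final String FALLBACK = "ISO-8859-1";

    /**
     * Returns the configured encoding if it names a charset the JVM supports,
     * otherwise the ISO-8859-1 default.
     */
    static String resolve(String contextParamValue) {
        if (contextParamValue == null || contextParamValue.trim().isEmpty()) {
            return FALLBACK; // parameter not set: keep the default
        }
        String name = contextParamValue.trim();
        try {
            return Charset.isSupported(name) ? name : FALLBACK;
        } catch (IllegalArgumentException e) {
            // Thrown for syntactically illegal charset names.
            return FALLBACK;
        }
    }

    public static void main(String[] args) {
        System.out.println(resolve(null));    // prints "ISO-8859-1"
        System.out.println(resolve("UTF-8")); // prints "UTF-8"
    }
}
```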

Comment 3 Suzuki Yuichiro 2007-02-26 00:25:30 UTC

How about the following corrections?

org.apache.catalina.connector.Response:
---
    public void reset(int status, String message) {
        reset();
        setStatus(status, message);
        usingWriter = false; // add for user error page
    }
---
This lets the user's error page set the encoding again.
Even if a Writer object has already been created,
I think it is usually no longer referenced, because
the application (filter, servlet, etc.) has already finished.


org.apache.catalina.valves.ErrorReportValve:
in protected void report(Request request, Response response, Throwable throwable)
...
try {
    response.setContentType("text/html");
    response.setCharacterEncoding("utf-8");
    
    // add for default error page
    if(!"utf-8".equals(response.getCharacterEncoding())){
        response.getCoyoteResponse().setCharacterEncoding("utf-8");
    }
} catch (Throwable t) {
...
If the writer object has already been created, setCharacterEncoding will not work.
So I think we must force the encoding directly on the coyote response.


I know the specification says setCharacterEncoding takes effect only before
getWriter is called, and that the description of the reset method says nothing
about getWriter.
But we need a fix for multi-byte character environments.
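
To illustrate why this matters for multi-byte text, here is a minimal sketch,
assuming the writer ends up encoding with ISO-8859-1 as described in this
report. The class and method names are hypothetical, and the code uses only
the JDK (java.nio.charset.StandardCharsets, Java 7 or later), not Tomcat.

```java
import java.nio.charset.StandardCharsets;

public class MismatchDemo {

    /**
     * Encodes the text with ISO-8859-1 and decodes it again with the same
     * charset. Characters that ISO-8859-1 cannot represent are replaced
     * with '?' during encoding, so they are lost for good.
     */
    static String throughLatin1(String text) {
        byte[] bytes = text.getBytes(StandardCharsets.ISO_8859_1);
        return new String(bytes, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // Three katakana characters, written as escapes to avoid
        // source-file encoding issues.
        String japanese = "\u30a8\u30e9\u30fc";
        System.out.println(throughLatin1(japanese)); // prints "???"

        // Plain ASCII survives, which is why the problem is easy to miss.
        System.out.println(throughLatin1("Error"));  // prints "Error"
    }
}
```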

Comment 4 Mark Thomas 2007-03-01 20:19:34 UTC

-1 for the patch you suggest. It works for your case but won't work for many
users that don't use UTF-8.

I have started a thread on the dev list about this.

Comment 5 Yoav Shapira 2007-03-25 08:08:29 UTC

The mailing list thread is here:
http://marc.info/?l=tomcat-dev&m=117280911532391&w=2

Comment 6 Mark Thomas 2007-12-18 15:50:10 UTC

The fix in the duplicate allows the encoding of the error page to be completely
independent of the original page.

*** This bug has been marked as a duplicate of 43236 ***