Issue Details

Key: TAPESTRY-607
Type: Bug
Status: Closed
Resolution: Fixed
Priority: Major
Assignee: Howard M. Lewis Ship
Reporter: Jone
Votes: 2
Watchers: 2
Tapestry

Output encoding problem with some versions of Tomcat 5

Created: 29/Aug/05 04:04 PM   Updated: 13/Feb/07 12:25 PM
Component/s: Framework
Affects Version/s: 4.0
Fix Version/s: 4.0

Time Tracking:
Not Specified

File Attachments:
  tapestry-607.zip (Zip Archive, 3.53 MB), attached 2005-11-06 05:44 PM by Leonardo Quijano Vincenzi
Environment: JDK 1.4, Tapestry 4.0-beta-5

Resolution Date: 23/Nov/05 11:36 PM


Description
After upgrading my project from beta4 to beta5, UTF-8 encoded Chinese and Japanese characters are rendered as "????". In fact, the problem occurs with consecutively activated pages. For example: in page1, a direct listener method invokes cycle.activate("page2"); then in page2, pageBeginRender() invokes cycle.activate("page3"). In the rendered page3, all the UTF-8 characters come out as "????".


Comments
Leonardo Quijano Vincenzi added a comment - 27/Sep/05 11:22 AM
This is what I think is happening. I'm using Apache Tomcat 5.5.12alpha and Tapestry 4.0-beta-8, and I'm having problems with UTF-8 on error pages. The sequence is:

1) On the first page rendering, the content type for the PrintWriter is set up appropriately in ServletWebResponse.getPrintWriter():

    public PrintWriter getPrintWriter(ContentType contentType) throws IOException
    {
        Defense.notNull(contentType, "contentType");
        if (_needsReset)
            reset();
        _needsReset = true;
        _servletResponse.setContentType(contentType.toString());
        try
        {
            return _servletResponse.getWriter();
        }
        catch (IOException ex)
        {
            throw new ApplicationRuntimeException(WebMessages.writerOpenError(contentType, ex),
                    null, ex);
        }
    }

2) After we already have the writer, the application throws an exception and rendering is forwarded to another page (same HttpServletResponse). Then we reach page rendering again, and getWriter() is called a second time.

3) The first time reset() is called, the content-type is reset. Then _servletResponse.setContentType() is called and the encoding is changed to UTF-8 as it should. The second time reset() is called, _servletResponse.setContentType() won't work (it won't change the encoding), because we already have a writer selected:

(from Tomcat's org.apache.catalina.connector.Response):

    /**
     * Set the content type for this Response.
     *
     * @param type The new content type
     */
    public void setContentType(String type) {

        if (isCommitted())
            return;

        // Ignore any call from an included servlet
        if (included)
            return;

        // Ignore charset if getWriter() has already been called
        if (usingWriter) {
            if (type != null) {
                int index = type.indexOf(";");
                if (index != -1) {
                    type = type.substring(0, index);
                }
            }
        }

It strips the encoding out of the content type. That leaves us with a reset() response writer and the encoding ISO-8859-1.
That causes the "????" when rendering.

I guess the fix would be to avoid resetting the response a second time in the same request?
Comments?
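
The sequence in steps 1) to 3) can be illustrated with a small stand-alone model. This is a simplified sketch, not Tomcat's code: the class `ModelResponse` and its fields are hypothetical, and it assumes (as described above) that reset() forgets the content type and charset while the "writer already obtained" flag stays set.

```java
import java.util.Locale;

class CharsetLossDemo {

    /** Hypothetical, simplified stand-in for a servlet response. NOT Tomcat's implementation. */
    static final class ModelResponse {
        private static final String DEFAULT_CHARSET = "ISO-8859-1";
        private String contentType;
        private String charset = DEFAULT_CHARSET;
        private boolean usingWriter;

        void setContentType(String type) {
            if (type == null) {
                return;
            }
            int index = type.indexOf(';');
            contentType = (index == -1 ? type : type.substring(0, index)).trim();
            // Mirrors the quoted excerpt: once a writer was handed out,
            // the charset part of the content type is ignored.
            if (!usingWriter && index != -1) {
                String param = type.substring(index + 1).trim();
                if (param.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                    charset = param.substring("charset=".length());
                }
            }
        }

        void getWriter() {
            usingWriter = true;
        }

        /** Assumption taken from the thread: type and charset are cleared, usingWriter is not. */
        void reset() {
            contentType = null;
            charset = DEFAULT_CHARSET;
        }

        String getCharset() {
            return charset;
        }
    }

    /** First render, then an exception forces reset() and a second render. */
    static String charsetAfterSecondRender() {
        ModelResponse response = new ModelResponse();
        response.setContentType("text/html;charset=UTF-8"); // first render: charset accepted
        response.getWriter();
        response.reset();                                    // exception page is about to render
        response.setContentType("text/html;charset=UTF-8"); // charset is now stripped
        return response.getCharset();
    }

    public static void main(String[] args) {
        System.out.println(charsetAfterSecondRender()); // prints ISO-8859-1
    }
}
```

In this model the second render ends up with the default encoding even though the same content type string was passed both times, which matches the "????" symptom.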

 

Mind Bridge added a comment - 05/Nov/05 06:03 AM
Interestingly, I cannot reproduce this issue. I guess that may depend on the implementation of the servlet container. I have tried with Jetty and Tomcat 5.x, but both seem to work well. I thought I had reproduced the issue earlier, but it turned out to be an entirely different "problem" altogether. If template-encoding, etc., is set correctly, then the exception page renders okay for me, with a DirectLink, server-side redirects, etc.

What you say makes sense, but I cannot commit code without ensuring that it works first, which means that the problem has to be reproduced first.

I would ask that you provide the following information:
- the servlet container that was used (make + version)
- whether template-encoding has been specified, and what it is
- some more information on the request structure in this particular case

Also, if you have a patch, that would be very useful. If it makes sense and will not cause other issues, I will commit it even if I cannot personally reproduce the problem.

Leonardo Quijano Vincenzi added a comment - 06/Nov/05 05:44 PM
WAR Test case for bug and the Index.html I got during testing

Leonardo Quijano Vincenzi added a comment - 06/Nov/05 05:48 PM
OK, I posted a WAR file you can deploy on Tomcat 5.5.12; it should reproduce the problem:

http://localhost:8080/Tapestry607/Index.html

The tricky part is... I tested throwing an Exception on Index.html and it didn't reproduce the problem. *But* then I thought... maybe if I create a component... and voilà... for me at least, if I throw an exception from @SubComponent, I get the error page I included in the zip file (with the Spanish accents as "?").

Hope it helps!

Mind Bridge added a comment - 08/Nov/05 08:21 AM
I have finally reproduced the issue and it does turn out to be servlet-container dependent -- it appears only in Tomcat. The other servlet containers I've tested work okay. What's more, this appears to be a very clear bug in the Tomcat code. Consider the following:
- reset() on the servlet response is necessary in many cases and should be supported well by the servlet container. In this case it eliminates any remnants of the page before an exception occurred.
- Tomcat does very well to check if the Writer was already obtained in setContentType(). If that is the case, the writer should not be changed and in particular its encoding should definitely not be changed. Unfortunately, Tomcat _does_ change the encoding of the writer to ISO-8859-1, because the encoding provided in the content type was "cleared" and no encoding is remembered as a result. This is a definite bug that is the result of the left hand not knowing what the right is doing. Tapestry suffers as a result.

The only possible solution is to circumvent that problem in Tomcat by only invoking setContentType() the first time around. This will be problematic in the general case, however, since the content type (not the encoding) of the error page may differ from that of the page (e.g. text/html vs. text/rtf). I am therefore wondering what the best way to proceed further is. Any suggestions would be welcome.
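
A minimal sketch of the "only invoke setContentType() the first time" idea, and of the trade-off mentioned above. The names `TypeSink` and `OnceOnlyContentType` are hypothetical and stand in for the real response object; this is not the code that was committed to Tapestry.

```java
import java.util.ArrayList;
import java.util.List;

class SetContentTypeOnceDemo {

    /** Hypothetical stand-in for the part of the servlet response we care about. */
    interface TypeSink {
        void setContentType(String type);
    }

    /** Forwards only the first content type of a request; later ones are dropped. */
    static final class OnceOnlyContentType {
        private final TypeSink sink;
        private boolean alreadySet;

        OnceOnlyContentType(TypeSink sink) {
            this.sink = sink;
        }

        void apply(String contentType) {
            if (alreadySet) {
                return; // trade-off: a different type for the error page is ignored
            }
            sink.setContentType(contentType);
            alreadySet = true;
        }
    }

    /** Returns the content types that actually reached the response. */
    static List<String> simulate() {
        List<String> calls = new ArrayList<>();
        OnceOnlyContentType guard = new OnceOnlyContentType(calls::add);
        guard.apply("text/html;charset=UTF-8"); // normal page
        guard.apply("text/rtf;charset=UTF-8");  // error page with a different type
        return calls;
    }

    public static void main(String[] args) {
        System.out.println(simulate()); // prints [text/html;charset=UTF-8]
    }
}
```

The second call never reaches the response, so an error page whose content type differs from the original page would be served under the wrong type.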

Henri Dupre added a comment - 08/Nov/05 03:16 PM
Mind Bridge, that's some terrific debugging! Have you opened a bug for Tomcat? It would also be worth raising the problem on the Tomcat mailing list. I'm sure the problem can be solved for the next version of Tomcat.

Leonardo Quijano Vincenzi added a comment - 08/Nov/05 03:35 PM
This seems to be a similar issue in Tomcat's bug database. It's still unsolved:

http://issues.apache.org/bugzilla/show_bug.cgi?id=37072

The guy here says it only happens on error conditions. I'd think Tapestry gets into the same situation when it does the reset().

Now, if a quick work-around could be made for Tapestry, it'd be great, while they solve the issue (and considering the possible situation that they solve it for future releases only, that'd leave people using Tapestry in older versions of Tomcat with problems).

I'd think the case of content type differing between normal and error pages to be uncommon, don't you think? This could be made configurable, of course.

Howard M. Lewis Ship added a comment - 23/Nov/05 11:36 PM
The workaround is triggered by adding -Dorg.apache.tapestry.607-patch=true to the command line.
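
For reference, `-Dname=value` sets a JVM system property. A sketch of how such a flag can be read is shown below; only the property name comes from this ticket, and whether Tapestry reads it in exactly this way is an assumption. The class `PatchFlagDemo` is hypothetical.

```java
class PatchFlagDemo {

    // Property name taken from the comment above.
    static final String FLAG = "org.apache.tapestry.607-patch";

    /** True only if the JVM was started with -Dorg.apache.tapestry.607-patch=true. */
    static boolean isPatchEnabled() {
        // Boolean.getBoolean reads a system property and compares it to "true" (case-insensitive).
        return Boolean.getBoolean(FLAG);
    }

    public static void main(String[] args) {
        System.out.println("607 patch enabled: " + isPatchEnabled());
    }
}
```

With Tomcat, JVM options of this kind are usually passed through the JAVA_OPTS or CATALINA_OPTS environment variable rather than typed on a java command line directly.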

Curtis Paris added a comment - 21/Apr/06 08:12 AM
I would suggest that this bug be reopened, and that the workaround be removed.

In debugging a UTF-8 issue on our new site, we came across this issue. In fact, I traced it down to the same code in the Tomcat code base as the linked Tomcat bug. It wasn't until I had found the _tomcatPatch field in ServletWebResponse, and the org.apache.tapestry.607-patch flag, that I came across the ticket here.

When using the workaround, we are seeing that the Content-Type is no longer sent at all. This is very bad, and I think worse than the Content-Type merely lacking the charset.

What we are doing:
Our Logout.html page is hooking the pageBeginRender event. It throws a PageRedirectException to the login page. The login page activates and renders.

Environment:
Tomcat 5.5.16, Tapestry 4.0.1, JSDK 1.5.0

In case you want to see the headers, here is what we saw using a sniffer. In the normal case, we get "text/html;charset=UTF-8". Without the workaround, we get "text/html". With the workaround, we get no Content-Type at all.


---------- NORMAL REQUEST TO LOGIN ------------
GET /Login.html HTTP/1.1
Accept: image/gif, image/x-xbitmap, image/jpeg, image/pjpeg, application/x-shockwave-flash, application/vnd.ms-excel, application/vnd.ms-powerpoint, application/msword, */*
Accept-Language: en-securid
Accept-Encoding: gzip, deflate
User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322; InfoPath.1)
Host: d-sfo-nix-cparis:8080
Connection: Keep-Alive
Cookie: org.apache.tapestry.locale=fr_FR;

HTTP/1.1 200 OK
Server: Apache-Coyote/1.1
Set-Cookie: JSESSIONID=7F8998B4BFE5144F186FD71FF93E8989; Path=/
Content-Type: text/html;charset=UTF-8
Content-Length: 6719
Date: Fri, 21 Apr 2006 00:00:45 GMT

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<!-- Application: PAC 2.0 -->
<!-- Page: Login -->


---------- REQUEST TO LOGIN VIA THE LOG OUT PAGE, NO WORKAROUND ------------
GET /Logout.external?sp=SReportsSummary HTTP/1.1
Accept: */*
Accept-Language: en-securid
Accept-Encoding: gzip, deflate
User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322; InfoPath.1)
Host: d-sfo-nix-cparis:8080
Connection: Keep-Alive
Cache-Control: no-cache
Cookie: org.apache.tapestry.locale=fr_FR;

HTTP/1.1 200 OK
Server: Apache-Coyote/1.1
Content-Type: text/html
Content-Length: 6802
Date: Thu, 20 Apr 2006 23:55:53 GMT

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<!-- Application: PAC 2.0 -->
<!-- Page: Login -->

---------- REQUEST TO LOGIN VIA THE LOG OUT PAGE, WITH WORKAROUND ------------
GET /Logout.external?sp=SReportsSummary HTTP/1.1
Accept: */*
Accept-Language: en-securid
Accept-Encoding: gzip, deflate
User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322; InfoPath.1)
Host: d-sfo-nix-cparis:8080
Connection: Keep-Alive
Cache-Control: no-cache
Cookie: org.apache.tapestry.locale=fr_FR;

HTTP/1.1 200 OK
Server: Apache-Coyote/1.1
Content-Length: 6802
Date: Thu, 20 Apr 2006 23:43:51 GMT

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<!-- Application: PAC 2.0 -->
<!-- Page: Login -->

Risto Reinpõld added a comment - 13/Feb/07 12:25 PM
I agree with Curtis Paris. This workaround avoids calling 'setContentType' multiple times, but this really does not help, because 'reset' is still called multiple times. That is why no content-type header is sent at all with the response.

One way to overcome this issue is to call 'resetBuffer' instead of 'reset'. This leaves the initial content-type intact. If the content-type of your error page is different, then there is probably no other way than to fix Tomcat.
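
The difference between the two calls can be sketched with a small model. In the Servlet API, reset() clears the buffer together with the status code and headers, while resetBuffer() clears only the buffered content. The class `ModelResponse` below is a hypothetical stand-in that tracks just headers and body; it is not a servlet container implementation.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class ResetVsResetBufferDemo {

    /** Hypothetical model of a response: headers plus a body buffer. */
    static final class ModelResponse {
        final Map<String, String> headers = new LinkedHashMap<>();
        final StringBuilder buffer = new StringBuilder();

        void setContentType(String type) {
            headers.put("Content-Type", type);
        }

        void write(String text) {
            buffer.append(text);
        }

        /** Like ServletResponse.reset(): drops buffered content and headers. */
        void reset() {
            headers.clear();
            buffer.setLength(0);
        }

        /** Like ServletResponse.resetBuffer(): drops buffered content only. */
        void resetBuffer() {
            buffer.setLength(0);
        }
    }

    /** Renders a partial page, discards it, and reports the surviving Content-Type. */
    static String contentTypeAfterDiscard(boolean useResetBuffer) {
        ModelResponse response = new ModelResponse();
        response.setContentType("text/html;charset=UTF-8");
        response.write("<html>partial page");
        if (useResetBuffer) {
            response.resetBuffer();
        } else {
            response.reset();
        }
        response.write("<html>error page");
        return response.headers.get("Content-Type");
    }

    public static void main(String[] args) {
        System.out.println(contentTypeAfterDiscard(true));  // prints text/html;charset=UTF-8
        System.out.println(contentTypeAfterDiscard(false)); // prints null
    }
}
```

With resetBuffer() the original header survives, so nothing has to call setContentType() again after the writer was obtained. The cost is that the error page inherits the first page's content type.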