Tapestry / TAPESTRY-607

Output encoding problem with some versions of Tomcat 5

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 4.0
    • Fix Version/s: 4.0
    • Component/s: Framework
    • Labels:
      None
    • Environment:
      JDK 1.4, Tapestry 4 beta-5

      Description

      After upgrading my project from beta-4 to beta-5, UTF-8 encoded Chinese and Japanese characters are rendered as "??". The problem only occurs when pages are activated in succession. For example, in page1 a direct listener method invokes cycle.activate("page2"); then, in page2, pageBeginRender() invokes cycle.activate("page3"). In the rendered page3, all the UTF-8 characters come out as "??".

      1. tapestry-607.zip (3.53 MB, attached by Leonardo Quijano Vincenzi)

        Activity

        Jone created issue -
        Leonardo Quijano Vincenzi added a comment -

        This is what I think is happening. I'm using Apache Tomcat 5.5.12alpha and Tapestry 4.0-beta-8, and I'm having problems with UTF-8 and error pages:

        1) On the first page rendering, the content type for the PrintWriter is set up appropriately in ServletWebResponse.getPrintWriter():

        public PrintWriter getPrintWriter(ContentType contentType) throws IOException
        {
            Defense.notNull(contentType, "contentType");

            if (_needsReset)
                reset();

            _needsReset = true;

            _servletResponse.setContentType(contentType.toString());

            try
            {
                return _servletResponse.getWriter();
            }
            catch (IOException ex)
            {
                throw new ApplicationRuntimeException(WebMessages.writerOpenError(contentType, ex), null, ex);
            }
        }

        2) After we already have the writer, the application throws an exception and rendering is forwarded to another page (using the same HttpServletResponse). We then come back to page rendering, and getWriter() is called again.

        3) The first time reset() is called, the content type is reset. Then _servletResponse.setContentType() is called and the encoding is changed to UTF-8, as it should be. The second time reset() is called, _servletResponse.setContentType() has no effect on the encoding, because a writer has already been obtained:

        (from Tomcat's org.apache.catalina.connector.Response):

        /**
         * Set the content type for this Response.
         *
         * @param type The new content type
         */
        public void setContentType(String type) {

            if (isCommitted())
                return;

            // Ignore any call from an included servlet
            if (included)
                return;

            // Ignore charset if getWriter() has already been called
            if (usingWriter) {
                if (type != null) {
                    int index = type.indexOf(";");
                    if (index != -1) {
                        type = type.substring(0, index);
                    }
                }
            }

            // ... (rest of the method not quoted)
        }

        It strips the charset out of the content type. That leaves us with a response that has been reset() and a writer that encodes as ISO-8859-1.
        That causes the "????" when rendering.
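        The sequence above can be checked with a small, self-contained model. ResponseModel is a hypothetical stand-in, not Tomcat's org.apache.catalina.connector.Response: it keeps only the state the analysis relies on, and it assumes (as the analysis implies) that reset() clears the content type and charset but leaves the "writer already obtained" flag set. The last step shows why characters outside ISO-8859-1 become "?".

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical, simplified model of the behaviour analysed above.
// It is NOT Tomcat's Response class; it keeps only the state the analysis relies on.
class ResponseModel {
    static final String DEFAULT_CHARSET = "ISO-8859-1";

    String contentType;   // MIME type without parameters, or null if unset
    String charset;       // explicitly set charset, or null if unset
    boolean usingWriter;  // assumption: reset() does not clear this flag

    void setContentType(String type) {
        if (usingWriter) {
            // Same idea as the quoted excerpt: drop the parameters once a writer exists
            int index = type.indexOf(';');
            if (index != -1) {
                type = type.substring(0, index);
            }
        }
        int cs = type.indexOf(";charset=");
        if (cs != -1) {
            contentType = type.substring(0, cs);
            charset = type.substring(cs + ";charset=".length());
        } else {
            contentType = type;
        }
    }

    // Marks the writer as obtained and returns the charset it would encode with.
    String getWriter() {
        usingWriter = true;
        return charset != null ? charset : DEFAULT_CHARSET;
    }

    // Clears content type and charset; usingWriter is deliberately left unchanged.
    void reset() {
        contentType = null;
        charset = null;
    }
}

class Tapestry607Demo {
    public static void main(String[] args) {
        ResponseModel response = new ResponseModel();

        // 1) First render: setContentType(), then getWriter()
        response.setContentType("text/html;charset=UTF-8");
        String first = response.getWriter();

        // 2) + 3) Exception: reset() and render the error page on the same response
        response.reset();
        response.setContentType("text/html;charset=UTF-8");
        String second = response.getWriter();

        // U+4E2D U+6587 cannot be encoded in ISO-8859-1, so each becomes '?'
        byte[] bytes = "\u4e2d\u6587".getBytes(Charset.forName(second));
        String rendered = new String(bytes, StandardCharsets.US_ASCII);

        System.out.println(first + " -> " + second);  // UTF-8 -> ISO-8859-1
        System.out.println(rendered);                 // ??
    }
}
```

        String.getBytes(Charset) substitutes the charset's replacement byte ('?' for ISO-8859-1) for every unmappable character, which matches the "??" reported in the description.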

        I guess the fix would be to avoid resetting the response a second time in the same request?
        Comments?

        Mind Bridge added a comment -

        Interestingly, I cannot reproduce this issue. I guess it may depend on the servlet container implementation. I have tried Jetty and Tomcat 5.x, and both seem to work well. I thought I had reproduced the issue earlier, but it turned out to be an entirely different "problem" altogether. If template-encoding, etc., is set correctly, the exception page renders fine for me, with a DirectLink, server-side redirects, etc.

        What you say makes sense, but I cannot commit code without ensuring that it works first, which means that the problem has to be reproduced first.

        I would ask that you provide the following information:

        • the servlet container that was used (make + version)
        • whether template-encoding has been specified, and what it is
        • More information about the request structure in this particular case

        Also, if you have a patch, that would be very useful. If it makes sense and will not cause other issues, I will commit it even if I cannot personally reproduce the problem.

        Leonardo Quijano Vincenzi added a comment -

        WAR Test case for bug and the Index.html I got during testing

        Leonardo Quijano Vincenzi made changes -
        Attachment: tapestry-607.zip [ 12320495 ] (added)
        Leonardo Quijano Vincenzi added a comment -

        OK, I posted a WAR file you can deploy on Tomcat 5.5.12; it should reproduce the problem:

        http://localhost:8080/Tapestry607/Index.html

        The tricky part is that I tried throwing an exception from Index.html and it didn't reproduce the problem. Then I thought... maybe if I create a component... and voilà: for me at least, throwing an exception from @SubComponent gets me the error page I included in the zip file (with the Spanish accents shown as "?").

        Hope it helps!

        Mind Bridge added a comment -

        I have finally reproduced the issue, and it does turn out to be servlet-container dependent: it appears only in Tomcat. The other servlet containers I've tested work fine. What's more, this appears to be a very clear bug in the Tomcat code. Consider the following:

        • reset() on the servlet response is necessary in many cases and should be supported well by the servlet container. In this case it eliminates any remnants of the page before an exception occurred.
        • Tomcat is right to check in setContentType() whether the Writer was already obtained. If it was, the writer should not be changed, and in particular its encoding should definitely not be changed. Unfortunately, Tomcat does change the writer's encoding to ISO-8859-1, because the encoding provided in the content type was "cleared" and, as a result, no encoding is remembered. This is a definite bug, the result of the left hand not knowing what the right is doing, and Tapestry suffers as a result.

        The only possible solution is to circumvent that problem in Tomcat by only invoking setContentType() the first time around. This will be problematic in the general case, however, since the content type (not the encoding) of the error page may differ from that of the page (e.g. text/html vs. text/rtf). I am therefore wondering what the best way to proceed further is. Any suggestions would be welcome.

        Henri Dupre added a comment -

        Mind Bridge, that's some terrific debugging! Have you opened a bug for Tomcat? It would also be worth raising the problem on the Tomcat mailing list. I'm sure it can be fixed in the next version of Tomcat.

        Leonardo Quijano Vincenzi added a comment -

        This seems to be a similar issue in Tomcat's bug database. It's still unsolved:

        http://issues.apache.org/bugzilla/show_bug.cgi?id=37072

        The reporter there says it only happens on error conditions. I'd think Tapestry gets into the same situation when it calls reset().

        Now, a quick workaround in Tapestry would be great while they solve the issue, especially since they may fix it only in future releases, which would leave people running Tapestry on older versions of Tomcat with the problem.

        I'd expect the content type to differ between normal and error pages only rarely, wouldn't you? This could be made configurable, of course.

        Howard M. Lewis Ship made changes -
        Assignee: Howard M. Lewis Ship [ hlship ]
        Howard M. Lewis Ship made changes -
        Summary: "UTF-8" page coming to "?????" After continuous cycle.actived page → Output encoding problem with some versions of Tomcat 5
        Howard M. Lewis Ship added a comment -

        The workaround is enabled by adding -Dorg.apache.tapestry.607-patch=true to the JVM command line.
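        The comment does not show the patch itself, so the following is only a hypothetical sketch of how a -D switch like this can gate the "call setContentType() only the first time" approach discussed above. GuardedContentType, ContentTypeTarget and apply() are illustrative names, not Tapestry APIs; only Boolean.getBoolean is a real JDK call (it returns true only when the system property exists and equals "true", ignoring case).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of a system-property-gated guard; not Tapestry's actual code.
interface ContentTypeTarget {
    void setContentType(String type);
}

class GuardedContentType {
    static final String PATCH_PROPERTY = "org.apache.tapestry.607-patch";

    private final ContentTypeTarget target;
    private final boolean patchEnabled;
    private boolean alreadySet;

    GuardedContentType(ContentTypeTarget target, boolean patchEnabled) {
        this.target = target;
        this.patchEnabled = patchEnabled;
    }

    // True only if the property exists and equals "true" (case-insensitive)
    static boolean patchEnabledFromSystemProperty() {
        return Boolean.getBoolean(PATCH_PROPERTY);
    }

    void apply(String type) {
        if (patchEnabled && alreadySet) {
            return; // workaround on: later calls (e.g. for an error page) are skipped
        }
        target.setContentType(type);
        alreadySet = true;
    }
}

class PatchFlagDemo {
    public static void main(String[] args) {
        List<String> calls = new ArrayList<>();
        GuardedContentType guard = new GuardedContentType(calls::add, true);

        guard.apply("text/html;charset=UTF-8"); // page render
        guard.apply("text/rtf;charset=UTF-8");  // error page render: skipped

        System.out.println(calls); // [text/html;charset=UTF-8]
        System.out.println(GuardedContentType.patchEnabledFromSystemProperty());
    }
}
```

        Run as java -Dorg.apache.tapestry.607-patch=true PatchFlagDemo, the second line prints true. The skipped text/rtf call illustrates the limitation raised earlier: with such a guard, an error page cannot switch to a different content type.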

        Howard M. Lewis Ship made changes -
        Resolution: Fixed [ 1 ]
        Status: Open [ 1 ] → Closed [ 6 ]
        Fix Version/s: 4.0 [ 10794 ]
        Curtis Paris added a comment -

        I would suggest that this bug be reopened and the workaround removed.

        In debugging a UTF-8 issue on our new site, we came across this issue. In fact, I traced it down to the same code in the Tomcat code base as the linked Tomcat bug. It wasn't until I found the _tomcatPatch in ServletWebResponse and the org.apache.tapestry.607-patch flag that I came across this ticket.

        When using the workaround, we see that the Content-Type header is no longer sent at all. This is very bad, and I think worse than the Content-Type lacking the charset.

        What we are doing:
        Our Logout.html page is hooking the pageBeginRender event. It throws a PageRedirectException to the login page. The login page activates and renders.

        Environment:
        Tomcat 5.5.16, Tapestry 4.0.1, JSDK 1.5.0

        In case you want to see the headers, here is what we captured with a sniffer. In the normal case we get "text/html;charset=UTF-8". Without the workaround we get "text/html", and with the workaround we get no Content-Type at all.

        ---------- NORMAL REQUEST TO LOGIN ------------
        GET /Login.html HTTP/1.1
        Accept: image/gif, image/x-xbitmap, image/jpeg, image/pjpeg, application/x-shockwave-flash, application/vnd.ms-excel, application/vnd.ms-powerpoint, application/msword, */*
        Accept-Language: en-securid
        Accept-Encoding: gzip, deflate
        User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322; InfoPath.1)
        Host: d-sfo-nix-cparis:8080
        Connection: Keep-Alive
        Cookie: org.apache.tapestry.locale=fr_FR;

        HTTP/1.1 200 OK
        Server: Apache-Coyote/1.1
        Set-Cookie: JSESSIONID=7F8998B4BFE5144F186FD71FF93E8989; Path=/
        Content-Type: text/html;charset=UTF-8
        Content-Length: 6719
        Date: Fri, 21 Apr 2006 00:00:45 GMT

        <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
        <!-- Application: PAC 2.0 -->
        <!-- Page: Login -->

        ---------- REQUEST TO LOGIN VIA THE LOG OUT PAGE, NO WORKAROUND ------------
        GET /Logout.external?sp=SReportsSummary HTTP/1.1
        Accept: */*
        Accept-Language: en-securid
        Accept-Encoding: gzip, deflate
        User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322; InfoPath.1)
        Host: d-sfo-nix-cparis:8080
        Connection: Keep-Alive
        Cache-Control: no-cache
        Cookie: org.apache.tapestry.locale=fr_FR;

        HTTP/1.1 200 OK
        Server: Apache-Coyote/1.1
        Content-Type: text/html
        Content-Length: 6802
        Date: Thu, 20 Apr 2006 23:55:53 GMT

        <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
        <!-- Application: PAC 2.0 -->
        <!-- Page: Login -->

        ---------- REQUEST TO LOGIN VIA THE LOG OUT PAGE, WITH WORKAROUND ------------
        GET /Logout.external?sp=SReportsSummary HTTP/1.1
        Accept: */*
        Accept-Language: en-securid
        Accept-Encoding: gzip, deflate
        User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322; InfoPath.1)
        Host: d-sfo-nix-cparis:8080
        Connection: Keep-Alive
        Cache-Control: no-cache
        Cookie: org.apache.tapestry.locale=fr_FR;

        HTTP/1.1 200 OK
        Server: Apache-Coyote/1.1
        Content-Length: 6802
        Date: Thu, 20 Apr 2006 23:43:51 GMT

        <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
        <!-- Application: PAC 2.0 -->
        <!-- Page: Login -->

        Risto Reinpõld added a comment -

        I agree with Curtis Paris. This workaround avoids calling setContentType() multiple times, but that does not really help, because reset() is still called multiple times. That is why no Content-Type header is sent with the response at all.

        One way to overcome this issue is to call resetBuffer() instead of reset(). This leaves the initial content type intact. If the content type of your error page is different, there is probably no other way than fixing Tomcat.
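        The three behaviours discussed in this thread can be compared with a small hypothetical model (HeaderModel, Strategy and ResetStrategies are illustrative names, not Tomcat or Tapestry code). It follows the documented ServletResponse contract, where reset() clears buffered output, status and headers while resetBuffer() clears only the buffered output, and it assumes, as in the affected Tomcat versions, that the "writer already obtained" state survives reset().

```java
// Hypothetical model comparing the approaches discussed in this thread.
// Not Tomcat or Tapestry code; usingWriter is assumed to survive reset().
class HeaderModel {
    String contentType; // MIME type without parameters, or null (no header sent)
    String charset;     // charset parameter, or null
    boolean usingWriter;

    void setContentType(String type) {
        if (usingWriter) {
            int index = type.indexOf(';');
            if (index != -1) {
                type = type.substring(0, index); // charset ignored once a writer exists
            }
        }
        int cs = type.indexOf(";charset=");
        if (cs != -1) {
            contentType = type.substring(0, cs);
            charset = type.substring(cs + ";charset=".length());
        } else {
            contentType = type;
        }
    }

    void getWriter() {
        usingWriter = true;
    }

    // Like ServletResponse.reset(): headers (including Content-Type) are cleared
    void reset() {
        contentType = null;
        charset = null;
    }

    // Like ServletResponse.resetBuffer(): only buffered output is discarded
    void resetBuffer() {
        // no header state changes in this model
    }

    String contentTypeHeader() {
        if (contentType == null) {
            return null;
        }
        return charset == null ? contentType : contentType + ";charset=" + charset;
    }
}

enum Strategy { RESET, RESET_WITHOUT_SET_CONTENT_TYPE, RESET_BUFFER }

class ResetStrategies {
    static String render(Strategy strategy) {
        String type = "text/html;charset=UTF-8";
        HeaderModel response = new HeaderModel();

        // First page: content type set, writer obtained
        response.setContentType(type);
        response.getWriter();

        // Redirect or exception: the second page is rendered on the same response
        if (strategy == Strategy.RESET_BUFFER) {
            response.resetBuffer();
        } else {
            response.reset();
        }
        if (strategy != Strategy.RESET_WITHOUT_SET_CONTENT_TYPE) {
            response.setContentType(type);
        }
        response.getWriter();
        return response.contentTypeHeader();
    }

    public static void main(String[] args) {
        for (Strategy s : Strategy.values()) {
            System.out.println(s + ": " + render(s));
        }
    }
}
```

        Running main prints "RESET: text/html", "RESET_WITHOUT_SET_CONTENT_TYPE: null" and "RESET_BUFFER: text/html;charset=UTF-8". Under these assumptions the first two lines line up with the headers Curtis Paris captured without and with the workaround, and the third shows why resetBuffer() keeps the original charset.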

        Mark Thomas made changes -
        Workflow: jira [ 12323947 ] → Default workflow, editable Closed status [ 12567017 ]
        Mark Thomas made changes -
        Workflow: Default workflow, editable Closed status [ 12567017 ] → jira [ 12589768 ]
        Transition: Open → Closed
        Time in source status: 86d 7h 31m
        Execution times: 1
        Last executer: Howard M. Lewis Ship
        Last execution date: 23/Nov/05 23:36

          People

          • Assignee:
            Howard M. Lewis Ship
            Reporter:
            Jone
          • Votes:
            2
            Watchers:
            2
