Bug 31201 - Encoding bug when using <jsp:include ...> action
Summary: Encoding bug when using <jsp:include ...> action
Status: RESOLVED FIXED
Alias: None
Product: Tomcat 4
Classification: Unclassified
Component: Catalina (show other bugs)
Version: 4.1.30
Hardware: All All
: P3 major with 39 votes (vote)
Target Milestone: ---
Assignee: Tomcat Developers Mailing List
URL:
Keywords:
Depends on:
Blocks: 41773
  Show dependency tree
 
Reported: 2004-09-13 16:21 UTC by Takayuki Kaneko
Modified: 2006-09-07 16:25 UTC (History)
1 user (show)



Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Takayuki Kaneko 2004-09-13 16:21:48 UTC
There is a bug in "JSP include" that DefaultServlet assigns default locale 
encoding 
for included file using <jsp:include file="..."> action, 
and the included file's encoding is not matched default locale.

For example;

locale=ja_JP.SJIS

abc.html:
written by Windows-31J

page.jsp:
---
<%@ page contentType="text/html; charset=Windows-31J" %>
page include!!
<HR>
<jsp:include file="abc.html" flush="true" /> 
<HR>
---

HTML file's encoding should be fixed when Web Application is deploying.
So I tried to make a patch for this bug.
I added "fileEncoding" parameter for DefaultServlet.

Patch for this bug follows:
===================================================================
*** DefaultServlet.java	Sun Jan 25 22:23:42 2004
--- DefaultServlet_patch.java	Mon Sep 13 22:13:39 2004
***************
*** 177,182 ****
--- 177,187 ----
  
  
      /**
+      * Encoding for Reader
+      */
+     protected String fileEncoding;
+     
+     /**
       * MD5 message digest provider.
       */
      protected static MessageDigest md5Helper;
***************
*** 306,311 ****
--- 311,322 ----
          } catch (Throwable t) {
              ;
          }
+         try {
+             value = getServletConfig().getInitParameter("fileEncoding");
+             fileEncoding = value;
+         } catch (Throwable t) {
+             ;
+         }
  
          // Sanity check on the specified buffer sizes
          if (input < 256)
***************
*** 1793,1799 ****
  
          InputStream resourceInputStream = resourceInfo.getStream();
          // FIXME : i18n ?
!         Reader reader = new InputStreamReader(resourceInputStream);
  
          // Copy the input stream to the output stream
          exception = copyRange(reader, writer);
--- 1804,1812 ----
  
          InputStream resourceInputStream = resourceInfo.getStream();
          // FIXME : i18n ?
!         Reader reader = (fileEncoding == null) ?
!         		new InputStreamReader(resourceInputStream) :
!        			new InputStreamReader(resourceInputStream, 
fileEncoding);
  
          // Copy the input stream to the output stream
          exception = copyRange(reader, writer);
***************
*** 1864,1870 ****
          IOException exception = null;
  
          InputStream resourceInputStream = resourceInfo.getStream();
!         Reader reader = new InputStreamReader(resourceInputStream);
          exception = copyRange(reader, writer, range.start, range.end);
  
          // Clean up the input stream
--- 1877,1885 ----
          IOException exception = null;
  
          InputStream resourceInputStream = resourceInfo.getStream();
!         Reader reader = (fileEncoding == null) ?
!         		new InputStreamReader(resourceInputStream) :
!        			new InputStreamReader(resourceInputStream, 
fileEncoding);
          exception = copyRange(reader, writer, range.start, range.end);
  
          // Clean up the input stream
***************
*** 1956,1962 ****
          while ( (exception == null) && (ranges.hasMoreElements()) ) {
  
              InputStream resourceInputStream = resourceInfo.getStream();
!             Reader reader = new InputStreamReader(resourceInputStream);
  
              Range currentRange = (Range) ranges.nextElement();
  
--- 1971,1979 ----
          while ( (exception == null) && (ranges.hasMoreElements()) ) {
  
              InputStream resourceInputStream = resourceInfo.getStream();
!             Reader reader = (fileEncoding == null) ?
!             		new InputStreamReader(resourceInputStream) :
!            			new InputStreamReader(resourceInputStream, 
fileEncoding);
  
              Range currentRange = (Range) ranges.nextElement();
Comment 1 toraneko 2004-09-26 01:56:42 UTC
This is a problem at JSP spec(5.4 sec) not Tomcat. JSP spec doesn't show 
included pages encoding. As far as I read cuurent JSP spec, I think default 
encoding should be defined by pageEncoding attribute in JSP's page directive 
of included page. 

I proposes adding pageEncoding attribute for jsp:include directive (JSP spec 
5.4) to specify included page default encoding. If included page has 
pageEncoding, this jsp:include's encoding should be ignored.

However this patch seems valuable since JSP spec will be fixed. I think this 
patch should be applied for current tomcat.

reagrds,

Takashi Okamoto
Comment 2 Mark Thomas 2004-10-02 14:33:24 UTC
The default encoding for included pages is defined in 
http://www.jcp.org/aboutJava/communityprocess/maintenance/jsr053/errata_1_2_a_2
0020321.html
Comments and suggestions for the JSP spec should be directed to the spec team 
(jsp-spec-comments@eng.sun.com) rather than this mailing list.
Comment 3 Mark Thomas 2004-10-02 14:47:31 UTC
The patch suggested above would be a Tomcat specific solution to a wider i18n 
problem. Whilst it would work for Tomcat I would not wish to encourage an 
approach such as this as if you ever need to move to an alternative container, 
it will break again. Therefore, this patch will not be applied. (Hence the 
WONTFIX).

There are, however, a number of alternative approaches that are spec compliant 
and are hence container neutral.

The simplest approach is to convert the HTML to JSPs and specify the correct 
page encoding in the new JSPs using <%@ page pageEncoding=... %>.

There are other approaches and if the one suggested above isn't appropriate I 
suggest that you post a question on the tomcat-user mailing list rather than 
this bug report.
Comment 4 toraneko 2004-10-05 06:27:43 UTC
> There are, however, a number of alternative approaches that are spec  > compliant and are hence container neutral.  I knew. However these approaches could not include HTML direct. We should convert HTML to JSP or use escaped character for non iso-8859-1 characters. Some of the people want include HTML direct. Then, this becomes problem.   You said I feed back JCP. It's really right, but imagin, how long we wait to resolve the problem and does tomcat never have tomcat specific functions? Could you see the number of vote for this bug?  I'm disappointment for your i18n and l10n attitude. I posted i18n and l10n patch in the past but it took _6 month_ that tomcat commiter took my patch.   I recommend you to provide temporary solution since release Tomcat 6.0 again.  bye,  Takashi      
Comment 5 Takayuki Kaneko 2004-10-05 14:37:22 UTC
Although I think that correcting specification is the right approach,
there is no solution in the present JSP specification.
Therefore the problem should be coped with by a container.

When not applying this patch, Tomcat depends on default encoding of Java VM
to include of a HTML file.

Since Web application depends by environment strongly,
it will be the worst situation if you consider the portability which you say. 

If there are already many properties which are valuable and depend on
the container, I suggest this patch should be treated as one of them.
Comment 6 Mark Thomas 2004-10-07 21:33:59 UTC
I am still not happy with the portability implications of this. I still think 
it would be better for the 'fix' to be part of the application (and hence be 
portable) rather than part of the container (not portable and may require 
changes to app between containers depending on how each handles this).

That being said, if all other options are inappropriate then I am prepared to 
consider this patch. Therefore I'd would appreciate it you would consider the 
following options.

1. I understand that some developers want to include HTML directly. However, 
given that this has i18n problems and i18n is obviously important to you why 
not tell your developers that they can't do this and should convert to JSPs? 
It is a change to the file name, adding a encoding directive and changing 
references from xxx.html to xxx.jsp. This would be trivial to automate.

2. How about using -Dfile.encoding="..." at the JVM level?

3. The patch (and option 2 above) assumes that every file has the same 
encoding. Is this always the case? Might different files in different apps 
have different encodings? If so, you could specify a modified version of the 
default servlet in your web application to handle .html files. With this 
option you could even handle different static file encodings within the same 
web app. This option would give you a lot of flexibility (which may or may not 
be useful to you).
Comment 7 Takayuki Kaneko 2004-10-09 09:29:01 UTC
1. Converting HTML files into JSPs has following problems.

- Since the JSP compile is needed, a performance may become a problem.

- HTML files must be included correctly in JSP spec.
  I think requiring converting HTML into JSP means not fulfilling
  specification.

2."file.encoding" option does not operate depending on the version
  of JVM.

  1.1 OK
  1.2 NG
  1.3 - 1.4.1 OK
  1.4.2 NG

  "file.encoding" is determined by default locale at the time of JVM execution.

  I recommend you to see the following page.
  http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=4163515

3. The best solution for including HTML files of multiple encodings is
   changing the JSP spec.

   First of all, a browser cannot display if multiple encodings are contained
   in 1 page. In this case, in a HTML file, you should use comprehensive
   encodings, such as UTF8. 
   However, do you want to use UTF8 as default locale? This patch can separate
   encodings of default locale and web app.

This patch extends options without any undesireble effects.
Other options are still effective even if this patch is accepted.

reagrds,

Takayuki Kaneko
Comment 8 Mark Thomas 2004-10-09 11:19:13 UTC
1. If performance is an issue, pre-compile your JSPs. If you can quote the 
part of the spec tomcat isn't following and the patch addresses this issue 
I'll commit it. However, having read the spec, and the errata, I believe 
tomcat is spec compliant.

2. I wasn't aware of this. Thanks for pointing this out. This is clearly not 
an option.

3. I think you mis-understood me. I was not trying to address one file with 
multiple encodings. My point is that your patch assumes (as does the current 
tomcat code) that ALL static files for all webapps will have the same 
encoding. Is this sufficient to address your i18n issues? Is it not possible 
to have one webapp with html files using one encoding and one webapp with thml 
files using a different one?
Comment 9 Takayuki Kaneko 2004-10-09 16:43:11 UTC
1."pre-compile" is a good solution, but the sequence of replacing is
  complicate without stopping the web app while Tomcat is running.

  In a HTML file, I just need to replace it. It is much simple.

  I think your saying is not compliant the spec.
  If converting HTML into JSP is required, then HTML files can not be included.

3.My point is that it will be not necessary to consider the different static
  file encodings within the same web app. 

  As having stated above, the problem of displaying still remains,
  there is no solution for a such web app in the present JSP specification. 
  And there is no solution for even one static file encoding under present
  Tomcat.

Takayuki Kaneko
Comment 10 Mark Thomas 2004-10-10 21:02:09 UTC
1a. If you are worried about performance - precompile. If you want ease of 
replacement - don't. You can't have it both ways.

1b. Please state which section of the spec you believe tomcat is not compliant 
with. Without this, I can't invetsigate further.

3. What about different encodings across different webapps on the same server?

There are solutions for this issue with the current versions of tomact. My 
concern about the proposed patch, that it encourages non-portable coding, 
remains.

I am leaving this as REOPENED for now but I am leaning towards resolving it as 
WONTFIX.
Comment 11 Takayuki Kaneko 2004-10-11 02:46:21 UTC
1a.No I can't. So, I need this patch is committed.

1b.Please see API spec of RequestDispatcher#include method.
   (Because "jsp:include action" will be converted into this method.)

   This method must be able to handle any resource(such as a servlet,
   HTML file, or JSP file). But I feel like you said that because Tomcat
   cannot handle HTML file correctly, you ask to convert HTML file into
   JSP file. Isn't it right?

3.It can be resolved by defining DefaultServlet on each web app.


If this patch is committed, we can separate web app from environment(locale)
before fixing JSP spec.

If you don't set file.encoding property, Tomcat will handle it just like 
former. So I believe there will be no demerit at all.

If you say that property of i18n depending on container is a problem, see 
javaEncoding property of JspServlet. This property is also depend on the
container, and assume that ALL JSP files for all webapps have the same 
encoding. Do you think that javaEncoding property is also a problem?

Do you mean that it is not a problem, current Tomcat changes its handling by
environments?

I believe that this patch will be help for i18n problems, as other
properties.

Regards,

Takayuki Kaneko
Comment 12 Mark Thomas 2004-12-18 13:09:17 UTC
I have thought about this some more and decided that on balance the patch 
should be committed.

The patch has been applied to TC4.1.x and ported to TC5.5.x
Comment 13 Mark Thomas 2006-09-07 23:25:29 UTC
*** Bug 40440 has been marked as a duplicate of this bug. ***