Bug 27517 - The pageEncoding attribute is not used, when charset value is set.
Summary: The pageEncoding attribute is not used, when charset value is set.
Status: RESOLVED FIXED
Alias: None
Product: Tomcat 5
Classification: Unclassified
Component: Catalina
Version: 5.0.19
Hardware: PC Linux
Importance: P3 minor
Target Milestone: ---
Assignee: Tomcat Developers Mailing List
URL:
Keywords:
Duplicates: 29342
Depends on:
Blocks:
 
Reported: 2004-03-08 15:21 UTC by Petr Pisl
Modified: 2004-11-16 19:05 UTC

Description Petr Pisl 2004-03-08 15:21:31 UTC
When you have a JSP page saved in UTF-8 encoding:

<%@page contentType="text/html;charset=ISO-8859-2"%>
<%@page pageEncoding="UTF-8"%>
<html>
<head><title>JSP Page</title></head>
<body>
ěššěřčžřžřýýáýžýí
<br><hr>
</body>
</html>

then the JSP file is read with ISO-8859-2 encoding, and the Czech characters in
the generated servlet are wrong. According to the JSP 2.0 specification, however,
the file should be read using the encoding given by pageEncoding (UTF-8 in my
case), and the response should use the encoding given by the charset value
(ISO-8859-2).
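
The effect can be reproduced outside of Jasper. The following is only a minimal
sketch, not Jasper code: the class name PageEncodingMismatch and the decode
helper are made up for illustration, and it assumes the JDK provides the
ISO-8859-2 charset. It decodes the same UTF-8 bytes once with the right charset
and once with the wrong one.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration class; not part of Jasper or Tomcat.
final class PageEncodingMismatch {

    // Decodes raw file bytes the way a parser would when reading the page source.
    static String decode(byte[] fileBytes, Charset charset) {
        return new String(fileBytes, charset);
    }

    public static void main(String[] args) {
        // Five Czech letters, written as Unicode escapes so that this source
        // file does not itself depend on a particular encoding.
        String original = "\u011b\u0161\u010d\u0159\u017e";

        // What is stored on disk when the page is saved as UTF-8 (two bytes per letter).
        byte[] onDisk = original.getBytes(StandardCharsets.UTF_8);

        String readAsUtf8 = decode(onDisk, StandardCharsets.UTF_8);
        String readAsLatin2 = decode(onDisk, Charset.forName("ISO-8859-2"));

        System.out.println(readAsUtf8.equals(original));   // true
        System.out.println(readAsLatin2.equals(original)); // false
        System.out.println(readAsLatin2.length());         // 10, one char per byte
    }
}
```

Reading the file with the charset value instead of pageEncoding turns every
two-byte UTF-8 sequence into two unrelated characters, which matches the wrong
characters seen in the generated servlet.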

We use jasper-compiler.jar as the JSP parser in NetBeans, and we use the parsing
result when saving and loading JSP files in the editor, where the wrong encoding
value causes problems.
Comment 1 Remy Maucherat 2004-03-08 15:24:58 UTC
I don't agree with that interpretation.
Comment 2 Petr Pisl 2004-03-08 15:33:11 UTC
What exactly do you think?
Comment 3 Petr Pisl 2004-03-08 15:42:25 UTC
Here is the relevant part of the specification:

JSP.4.1
...
For JSP pages in standard syntax, the page character encoding is determined from
the following sources:
- A JSP configuration element page-encoding value whose URL pattern matches the
page.
- The pageEncoding attribute of the page directive of the page. It is a
translation-time error to name different encodings in the pageEncoding
attribute of the page directive of a JSP page and in a JSP configuration element
whose URL pattern matches the page.
- The charset value of the contentType attribute of the page directive. This is
used to determine the page character encoding if neither a JSP configuration
element page-encoding nor the pageEncoding attribute are provided.
- If none of the above is provided, ISO-8859-1 is used as the default character
encoding.




Appendix JSP.D
Page Encoding Detection
....

3. If the file is a JSP page in standard syntax, use these steps. 

a. Check whether there is a JSP configuration element <page-encoding> whose URL
pattern matches this file. 

b. Read the file using the initial encoding and search for a pageEncoding
attribute in a page declaration. The specification requires the attribute to be
found only if it is not preceded by non-ASCII characters, so simplified
implementations are allowed. 

c. Report an error if there are a <page-encoding> configuration element whose
URL pattern matches this file and a pageEncoding attribute, and the two name
different encodings. 

d. If there is a <page-encoding> configuration element whose URL pattern matches
this file, the page character encoding is the one named in this element. 

e. Otherwise, if there is a pageEncoding attribute, the page character encoding
is the one named in this attribute. 

f. Otherwise, read the file using the initial encoding and search for a charset
value within a contentType attribute in a page declaration. If it exists, the
page character encoding is the one named in this charset value. The
specification requires the attribute to be found only if it is not preceded by
non-ASCII characters, so simplified implementations are allowed. 

g. Otherwise, the page character encoding is ISO-8859-1.
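
Steps c through g above can be sketched as a small function. This is only an
illustration of the order of precedence and not the Jasper implementation: the
class PageEncodingResolver, its method names, and the use of null for "not
provided" are assumptions. Comparing encoding names case-insensitively is also a
simplification, since charset aliases are not taken into account.

```java
// Hypothetical illustration class; not part of Jasper or Tomcat.
final class PageEncodingResolver {

    static final String DEFAULT_ENCODING = "ISO-8859-1";

    /**
     * Returns the page character encoding for a JSP page in standard syntax.
     *
     * @param configEncoding   value of a matching <page-encoding> element, or null
     * @param pageEncodingAttr value of the pageEncoding attribute, or null
     * @param contentTypeAttr  value of the contentType attribute, or null
     */
    static String resolve(String configEncoding, String pageEncodingAttr,
                          String contentTypeAttr) {
        // Step c: both sources are present but name different encodings.
        if (configEncoding != null && pageEncodingAttr != null
                && !configEncoding.equalsIgnoreCase(pageEncodingAttr)) {
            throw new IllegalStateException("page-encoding (" + configEncoding
                    + ") and pageEncoding (" + pageEncodingAttr + ") differ");
        }
        // Step d: the configuration element wins.
        if (configEncoding != null) {
            return configEncoding;
        }
        // Step e: then the pageEncoding attribute.
        if (pageEncodingAttr != null) {
            return pageEncodingAttr;
        }
        // Step f: only now is the charset value of contentType consulted.
        String charset = charsetOf(contentTypeAttr);
        if (charset != null) {
            return charset;
        }
        // Step g: the default.
        return DEFAULT_ENCODING;
    }

    // Extracts the charset parameter from a value such as "text/html;charset=UTF-8".
    static String charsetOf(String contentType) {
        if (contentType == null) {
            return null;
        }
        for (String part : contentType.split(";")) {
            String p = part.trim();
            if (p.regionMatches(true, 0, "charset=", 0, 8)) {
                String value = p.substring(8).trim();
                return value.isEmpty() ? null : value;
            }
        }
        return null;
    }
}
```

For the page from the description, resolve(null, "UTF-8",
"text/html;charset=ISO-8859-2") returns "UTF-8", because the charset value is
reached only when no pageEncoding attribute is present.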
Comment 4 Remy Maucherat 2004-03-08 16:05:02 UTC
I think that:
- you have way too much time on your hands
- charset and pageEncoding define the same thing, so this is at best an issue of
priority
Comment 5 Petr Pisl 2004-03-08 17:19:40 UTC
I don't think that pageEncoding and charset define the same thing.

According to chapter "JSP.4.1 Page Character Encoding", pageEncoding is the
encoding of the page file itself. And according to chapter "JSP.4.2 Response
Character Encoding", the charset value is the encoding used for the response.
When both encodings are defined, they are used for different things.
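
To make the two roles concrete, here is a rough sketch under my own
assumptions: the class PageVsResponseEncoding and its render method are
hypothetical, a real container does much more than this, and the JDK is assumed
to provide ISO-8859-2. The page bytes are decoded with the page encoding, and
the resulting text is encoded again with the response encoding.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration class; not part of Jasper or Tomcat.
final class PageVsResponseEncoding {

    // Decodes the page source with pageEncoding, then encodes the text for the response.
    static byte[] render(byte[] pageBytes, Charset pageEncoding, Charset responseEncoding) {
        String text = new String(pageBytes, pageEncoding); // role of pageEncoding
        return text.getBytes(responseEncoding);            // role of the charset value
    }

    public static void main(String[] args) {
        Charset latin2 = Charset.forName("ISO-8859-2");
        String text = "\u011b\u0161"; // two Czech letters, as Unicode escapes

        byte[] page = text.getBytes(StandardCharsets.UTF_8);
        byte[] response = render(page, StandardCharsets.UTF_8, latin2);

        System.out.println(page.length);                             // 4
        System.out.println(response.length);                         // 2
        System.out.println(new String(response, latin2).equals(text)); // true
    }
}
```

The bytes differ between the file and the response, but the text is the same,
which is why the two settings can legitimately name different encodings.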
Comment 6 Remy Maucherat 2004-03-08 17:39:05 UTC
Yes, of course. We have a disagreement then (sec 1.10.1). If nobody fixes this
horrible "bug" for a while, then I will resolve it appropriately.
Comment 7 Jan Luehe 2004-03-08 17:55:29 UTC
Hi Petr,

you're right, this is a bug. I'll commit a patch shortly.

For the time being, as a workaround, you may want to collapse the 2 page
directives into a single page directive, like this:

<%@page contentType="text/html;charset=ISO-8859-2" pageEncoding="UTF-8"%>

Jan
Comment 8 Petr Pisl 2004-03-08 18:27:10 UTC
Hi Jan,

yes, it's a workaround, but people can still run into problems when saving or
opening JSP files in the JSP editor in the NetBeans IDE. We use
jasper-compiler.jar to obtain information about a JSP page, and the encoding is
part of that information. I appreciate the promise to fix it.

thanks

Petr 
Comment 9 Petr Pisl 2004-03-09 10:26:50 UTC
Thanks for the quick fix. 
Comment 10 Jan Luehe 2004-06-03 16:43:14 UTC
*** Bug 29342 has been marked as a duplicate of this bug. ***