Bug 50720 - When using jsp mapped as servlet in web.xml, cyrillic characters are not allowed in web.xml
Status: RESOLVED FIXED
Alias: None
Product: Tomcat 7
Classification: Unclassified
Component: Catalina
Version: 7.0.6
Hardware: PC All
Importance: P2 normal
Target Milestone: ---
Assignee: Tomcat Developers Mailing List
URL:
Keywords:
Duplicates: 50877
Depends on:
Blocks:
 
Reported: 2011-02-04 09:07 UTC by Ruslan
Modified: 2011-03-06 01:18 UTC
CC List: 1 user



Attachments
Simple test web application (897 bytes, application/octet-stream)
2011-02-04 09:07 UTC, Ruslan

Description Ruslan 2011-02-04 09:07:57 UTC
Created attachment 26605
Simple test web application

I am using web.xml in its simplest, incomplete form (note that making it
100% Servlet API 3.0 compliant does not help):

<?xml version="1.0" encoding="Windows-1251"?>
<web-app>
<!-- below are word testing Testoviy in cyrillic, try to use another symbols -->
<display-name>Тестовый web.xml</display-name>
<servlet>
<servlet-name>TestJSPMount</servlet-name>
<jsp-file>/test.jsp</jsp-file>
</servlet>
<servlet-mapping>
<servlet-name>TestJSPMount</servlet-name>
<url-pattern>/test.html</url-pattern>
</servlet-mapping>
</web-app>

During startup, Tomcat throws an exception:
04/02/2011 16:07:39 S - - StandardContext.loadOnStartup: Servlet /testcyrwebxml threw load() exception
com.sun.org.apache.xerces.internal.impl.io.MalformedByteSequenceException: Invalid byte 2 of 2-byte UTF-8 sequence.
       at com.sun.org.apache.xerces.internal.impl.io.UTF8Reader.invalidByte(Unknown Source)
       at com.sun.org.apache.xerces.internal.impl.io.UTF8Reader.read(Unknown Source)
       at com.sun.org.apache.xerces.internal.impl.XMLEntityScanner.load(Unknown Source)
       at com.sun.org.apache.xerces.internal.impl.XMLEntityScanner.peekChar(Unknown Source)
       at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl$FragmentContentDriver.next(Unknown Source)
       at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl.next(Unknown Source)
       at com.sun.org.apache.xerces.internal.impl.XMLNSDocumentScannerImpl.next(Unknown Source)
       at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown Source)
       at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(Unknown Source)
       at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(Unknown Source)
       at com.sun.org.apache.xerces.internal.parsers.XMLParser.parse(Unknown Source)
       at com.sun.org.apache.xerces.internal.parsers.DOMParser.parse(Unknown Source)
       at com.sun.org.apache.xerces.internal.jaxp.DocumentBuilderImpl.parse(Unknown Source)
       at org.apache.jasper.xmlparser.ParserUtils.parseXMLDocument(ParserUtils.java:96)
       at org.apache.jasper.compiler.JspConfig.processWebDotXml(JspConfig.java:83)
       at org.apache.jasper.compiler.JspConfig.init(JspConfig.java:231)
       at org.apache.jasper.compiler.JspConfig.findJspProperty(JspConfig.java:290)
       at org.apache.jasper.compiler.Compiler.generateJava(Compiler.java:113)
       at org.apache.jasper.compiler.Compiler.compile(Compiler.java:365)
       at org.apache.jasper.compiler.Compiler.compile(Compiler.java:345)
       at org.apache.jasper.compiler.Compiler.compile(Compiler.java:332)
       at org.apache.jasper.JspCompilationContext.compile(JspCompilationContext.java:594)
       at org.apache.jasper.servlet.JspServletWrapper.service(JspServletWrapper.java:342)
       at org.apache.jasper.servlet.JspServlet.serviceJspFile(JspServlet.java:391)
       at org.apache.jasper.servlet.JspServlet.init(JspServlet.java:128)
       at org.apache.catalina.core.StandardWrapper.initServlet(StandardWrapper.java:1133)
       at org.apache.catalina.core.StandardWrapper.loadServlet(StandardWrapper.java:1087)
       at org.apache.catalina.core.StandardWrapper.load(StandardWrapper.java:996)
       at org.apache.catalina.core.StandardContext.loadOnStartup(StandardContext.java:4741)
       at org.apache.catalina.core.StandardContext$3.call(StandardContext.java:5062)
       at org.apache.catalina.core.StandardContext$3.call(StandardContext.java:5057)
       at java.util.concurrent.FutureTask$Sync.innerRun(Unknown Source)
       at java.util.concurrent.FutureTask.run(Unknown Source)
       at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown Source)
       at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
       at java.lang.Thread.run(Unknown Source)

Platform in use:

Tomcat 7.0.6 binary windows release
JDK 1.6.0_18 x86
Windows 7 x64

Changing the encoding of web.xml to UTF-8 does not help either.
The only workaround is to restrict web.xml to ISO-8859-1 characters.

I believe it is related to an early initialization step that runs
when a JSP is mapped as a servlet.
Comment 1 Konstantin Kolinko 2011-02-04 09:12:28 UTC
This is caused by the following line in o.a.j.compiler.WebXml.java:

is = new ByteArrayInputStream(webXml.getBytes());

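The above uses the system (platform default) encoding to convert MERGED_WEB_XML from String to byte[], which need not match the encoding the XML parser then uses to decode those bytes.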

Actually InputStream is not needed there. One should use "new InputSource(Reader)".
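
For illustration only, here is a minimal standalone sketch of the difference described above. It is not the Tomcat code: the class WebXmlEncodingDemo, its method names, and the sample document are hypothetical, and windows-1251 is passed in explicitly to simulate a platform default encoding (this assumes the JRE provides that charset). The sketch parses the same String once through byte[] and once through a Reader.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.StringReader;
import java.nio.charset.Charset;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

public class WebXmlEncodingDemo {

    // Hypothetical stand-in for a merged web.xml held in memory as a String.
    // The Cyrillic display name is written with Unicode escapes so that this
    // source file compiles the same way under any compiler encoding.
    public static final String MERGED_WEB_XML =
            "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
            + "<web-app><display-name>"
            + "\u0422\u0435\u0441\u0442\u043e\u0432\u044b\u0439 web.xml"
            + "</display-name></web-app>";

    // Parses the source and returns the text content of the root element.
    private static String rootText(InputSource source) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(source);
            return doc.getDocumentElement().getTextContent();
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("web.xml could not be parsed", e);
        }
    }

    // Fragile variant: the String is encoded with one charset (an explicit
    // parameter here, simulating the platform default), but the parser
    // decodes the bytes according to the XML declaration.
    public static String parseViaBytes(String xml, Charset simulatedDefault) {
        byte[] bytes = xml.getBytes(simulatedDefault);
        return rootText(new InputSource(new ByteArrayInputStream(bytes)));
    }

    // Robust variant: the parser receives characters directly, so no
    // byte encoding or decoding step is involved.
    public static String parseViaReader(String xml) {
        return rootText(new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) {
        System.out.println(parseViaReader(MERGED_WEB_XML));
        try {
            parseViaBytes(MERGED_WEB_XML, Charset.forName("windows-1251"));
        } catch (IllegalStateException e) {
            System.out.println("Byte-based parse failed: " + e.getCause());
        }
    }
}
```

With a character stream, the parser does not need to guess an encoding at all, which is why the Reader variant is independent of the system encoding.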
Comment 2 Mark Thomas 2011-02-10 14:00:54 UTC
I couldn't reproduce this with either your test case or one of my own. However, I can see the problem that Konstantin has pointed out. I have fixed this in 7.0.x and the fix will be included in 7.0.9 onwards. If you are able to build 7.0.x from source and confirm that the issue is fixed, that would be helpful.
Comment 3 Ruslan 2011-02-15 06:38:34 UTC
> If you are able to build 7.0.x from source and confirm that the issue is
> fixed that would be helpful.

Just checked on trunk, revision 1070840. The above test works fine.
Comment 4 Mark Thomas 2011-03-06 01:18:53 UTC
*** Bug 50877 has been marked as a duplicate of this bug. ***