Bug 7884 - Endless loop if EOF in CDATA section (InputEntity.unparsedContent)
Summary: Endless loop if EOF in CDATA section (InputEntity.unparsedContent)
Status: NEW
Alias: None
Product: Crimson
Classification: Unclassified
Component: other
Version: 1.1.4
Hardware: All All
Importance: P3 minor
Target Milestone: ---
Assignee: Edwin Goei
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2002-04-09 18:17 UTC by Clemens Klein-Robbenhaar
Modified: 2004-11-16 19:05 UTC



Attachments
test program which should trigger this bug (445 bytes, text/plain)
2002-04-09 18:20 UTC, Clemens Klein-Robbenhaar
proposed patch (576 bytes, patch)
2002-04-09 18:20 UTC, Clemens Klein-Robbenhaar
alternative patch (should make a smaller fix) (874 bytes, patch)
2002-04-11 14:16 UTC, Clemens Klein-Robbenhaar

Description Clemens Klein-Robbenhaar 2002-04-09 18:17:49 UTC
If an XML input source hits EOF inside a CDATA section, after the closing
']]' but before the terminating '>', Crimson goes into an infinite loop.

 To reproduce this bug, try to parse something like

   "<foo><![CDATA[foobar]]"

Granted, this is not well-formed, but an endless loop is hardly the right
reaction to such input. I will attach a small sample program which
triggers the bug.
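
 For illustration only, a reproduction along these lines could look like the
sketch below. It is not the attached test program; the class name
TruncatedCdataRepro and the helper rejects() are hypothetical. It uses only
the standard JAXP/SAX API, so the endless loop would only show up if Crimson
happens to be the JAXP implementation in use (an assumption about the
environment). A parser that handles this input correctly should report a
fatal error instead.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical reproduction sketch; not the program attached to this bug.
public class TruncatedCdataRepro {
    // Input from the report: EOF after "]]" but before the closing ">".
    static final String INPUT = "<foo><![CDATA[foobar]]";

    // Returns true if the parser rejects the input with a SAXException,
    // false if it accepts it. With an affected Crimson version this call
    // is expected never to return for INPUT.
    static boolean rejects(String xml) {
        try {
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), new DefaultHandler());
            return false;
        } catch (SAXException e) {
            return true; // well-formedness error reported as expected
        } catch (IOException | ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("rejected: " + rejects(INPUT));
    }
}
```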

 A short analysis of the problem shows that the program loops within
the org.apache.crimson.parser.InputEntity in the method "unparsedContent".

 The reason for the endless loop seems to be:
   - while scanning the visible part of the input in the CDATA section,
     the parser runs into a ']' and wants to check whether a `]>` follows.
     However, fewer than 2 characters follow, so it breaks out of the
     inner for loop over the chars in the buffer, leaving the "last"
     cursor on the first ']'
   - the outer for loop sets "start=last"; thus the start cursor now points
     to the first ']'.
   - afterwards it tries to read more characters into the buffer
     by calling "fillbuf", which most probably does nothing, because the
     second ']' has already been read in and there is nothing else to read.
   - then the outer loop checks for EOF; however, it does so by testing
     "start >= finish", which is not true, because start points to the
     first ']' and finish points to the second ']', thus start == finish-1
   - concluding that EOF has not been reached, the outer loop enters the
     inner loop again; reading the first ']' bails out of this loop, etc.,
     ad infinitum

 I have a very small fix for that: the outer loop tests not only "isEOF()"
but also whether the last "fillbuf" actually read something, by inspecting
the "isClosed" instance variable.
  This fixes the bug; however, I am not sure whether it may break something
else.
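
 To make the analysis and the proposed check easier to follow, here is a
small self-contained model of the loop. It is a simplified sketch, not the
Crimson source and not the attached patch: the class CdataLoopSketch, the
scan() method, and the iteration cap are hypothetical, and the names start,
finish, last, fillbuf, isEOF and isClosed are only borrowed from the
description above. In this model "finish" is one past the last buffered
character, and fillbuf() is assumed to mark the input as closed once there
is nothing left to read.

```java
// Hypothetical, simplified model of the scanning loop; NOT the Crimson code.
public class CdataLoopSketch {
    private final char[] buf;   // characters already read from the input
    private int start = 0;      // scan position
    private final int finish;   // one past the last buffered character
    private boolean isClosed = false;

    public CdataLoopSketch(String bufferedInput) {
        buf = bufferedInput.toCharArray();
        finish = buf.length;
    }

    // Models fillbuf() at end of input: nothing is added to the buffer,
    // the source is only marked as closed (an assumption of this sketch).
    private void fillbuf() {
        isClosed = true;
    }

    private boolean isEOF() {
        return start >= finish;
    }

    // Returns true if the loop terminates within maxIterations outer passes,
    // false if it is still spinning. checkClosed enables the extra test
    // described in the report.
    public boolean scan(boolean checkClosed, int maxIterations) {
        for (int pass = 0; pass < maxIterations; pass++) {
            int last = start;
            boolean foundEnd = false;
            for (; last < finish; last++) {
                if (buf[last] != ']') continue;
                if (last + 2 >= finish) break; // fewer than 2 chars follow
                if (buf[last + 1] == ']' && buf[last + 2] == '>') {
                    foundEnd = true;
                    break;
                }
            }
            start = last;                       // cursor stays on the first ']'
            if (foundEnd) return true;          // complete "]]>" seen
            fillbuf();                          // reads nothing at end of input
            if (isEOF()) return true;           // original EOF test
            if (checkClosed && isClosed) return true; // additional guard
        }
        return false;                           // cap reached: loop would not end
    }

    public static void main(String[] args) {
        System.out.println("without guard terminates: "
            + new CdataLoopSketch("foobar]]").scan(false, 1000));
        System.out.println("with guard terminates: "
            + new CdataLoopSketch("foobar]]").scan(true, 1000));
    }
}
```

 In this model the unguarded scan of "foobar]]" never leaves the outer loop
(the cap is reached), while the guarded scan stops after the first pass. A
real fix would additionally have to report the unterminated CDATA section as
a fatal error.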
Comment 1 Clemens Klein-Robbenhaar 2002-04-09 18:20:04 UTC
Created attachment 1509
test program which should trigger this bug
Comment 2 Clemens Klein-Robbenhaar 2002-04-09 18:20:58 UTC
Created attachment 1510
proposed patch
Comment 3 Clemens Klein-Robbenhaar 2002-04-11 14:16:13 UTC
Created attachment 1521
alternative patch (should make a smaller fix)