If an XML input source hits EOF inside a CDATA section after the closing ']]' but before the closing '>', crimson goes into an infinite loop. To reproduce this bug, try to parse something like

"<foo><![CDATA[foobar]]"

Sure, this is not well-formed, but an endless loop is hardly the best reaction to such input. I will attach a small sample program which triggers the bug.

A short analysis shows that the program loops within org.apache.crimson.parser.InputEntity in the method "unparsedContent". The reason for the endless loop seems to be:

- While scanning the visible part of the input in the CDATA section, the parser runs into a ']' and wants to check whether a ']>' follows. However, there are fewer than 2 characters following, so it breaks out of the inner for loop over the chars in the buffer, leaving the "last" cursor on the first ']'.
- The outer for loop sets "start = last", so the start cursor now points to the first ']'.
- Afterwards it tries to read more characters into the buffer by calling "fillbuf", which most probably does nothing, because the second ']' has already been read in and there is nothing else to read.
- Then the outer loop checks for EOF. However, it does so by testing "start >= finish", which is not true, because start points to the first ']' and finish points to the second ']', thus start == finish-1.
- Assuming that EOF has not been reached, the inner loop is entered again; reading the first ']' bails out of this loop, and so on ad infinitum.

I have a very small fix for that: test not only "isEOF()" in the outer loop, but also whether the last "fillbuf" actually read something, by inspecting the "isClosed" instance variable. This fixes the bug; however, I am not sure whether this may break something else.
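To make the loop analysis above easier to follow, here is a small self-contained model of it in Java. It is only a sketch and not Crimson's source: the class CdataScanModel, the nested Buffer class, the method terminates and the maxIterations cap are all hypothetical names invented for illustration, and "finish" here is one past the last buffered character, which may not match Crimson's exact index convention. The checkClosed flag stands in for the idea of the proposed fix (also stop when the last refill found nothing), not for the attached patch itself.

```java
class CdataScanModel {
    /** Simplified stand-in for the parser's input buffer (hypothetical, not Crimson's class). */
    static final class Buffer {
        final char[] buf;
        final int finish;        // one past the last buffered char (assumption of this model)
        int start = 0;
        boolean closed = false;  // set once a refill finds no more input

        Buffer(String visible) {
            buf = visible.toCharArray();
            finish = buf.length;
        }

        /** Models a refill when the underlying input is exhausted: nothing is added. */
        void fillbuf() {
            closed = true;
        }

        boolean isEOF() {
            return start >= finish;
        }
    }

    /**
     * Runs the modelled scan over the CDATA content that is already in the buffer.
     * Returns true if the outer loop ends on its own, false if the iteration cap
     * is hit (which stands in for "loops forever").
     */
    static boolean terminates(String visible, boolean checkClosed, int maxIterations) {
        Buffer in = new Buffer(visible);
        for (int iter = 0; iter < maxIterations; iter++) {
            int last = in.start;
            for (; last < in.finish; last++) {
                if (in.buf[last] == ']') {
                    // Two more chars are needed to decide whether "]>" follows.
                    if (last + 2 >= in.finish) {
                        break; // not enough lookahead: leave "last" on this ']'
                    }
                    if (in.buf[last + 1] == ']' && in.buf[last + 2] == '>') {
                        return true; // found the "]]>" terminator
                    }
                }
            }
            in.start = last;  // the outer loop sets start = last
            in.fillbuf();     // tries to read more; nothing is left
            if (in.isEOF() || (checkClosed && in.closed)) {
                return true;  // end of input detected
            }
        }
        return false; // cap reached: start never catches up with finish
    }

    public static void main(String[] args) {
        // Content following "<![CDATA[" in the sample input from this report.
        System.out.println(terminates("foobar]]", false, 1000)); // EOF test only
        System.out.println(terminates("foobar]]", true, 1000));  // EOF test plus closed check
    }
}
```

In this model the scan with only the "start >= finish" test never finishes for "foobar]]", because start stays on the first ']' and no refill ever moves finish. With the additional closed check the outer loop exits after the first failed refill.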
Created attachment 1509: test program which should trigger this bug
Created attachment 1510: proposed patch
Created attachment 1521: alternative patch (should make a smaller fix)