Bug 3303 - Unicode 3.0 character \\uFFFD
Summary: Unicode 3.0 character \\uFFFD
Status: CLOSED FIXED
Alias: None
Product: Regexp
Classification: Unclassified
Component: Other
Version: unspecified
Hardware: PC All
Importance: P3 minor
Target Milestone: ---
Assignee: Jakarta Notifications Mailing List
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2001-08-28 06:17 UTC by Tasuki Yamamoto
Modified: 2005-03-20 17:06 UTC
CC List: 0 users



Attachments
Suggested fix for this bug. (5.55 KB, patch)
2003-10-07 07:34 UTC, Oleg Sukhodolsky

Description Tasuki Yamamoto 2001-08-28 06:17:48 UTC
http://www.unicode.org/Public/3.0-Update/UnicodeData-3.0.0.txt:
>FFFD;REPLACEMENT CHARACTER;So;0;ON;;;;;N;;;;;

For some reason when the above character is in any regex character class it 
causes a RESyntaxException with description 'Bad Character Class'. I attempted 
to use it in the following context:

  private static String XMLescape(String s)
  	throws RESyntaxException
  {
	if (s==null) return s;
	if (s.length() == 0) return s;

	// XML 1.0 standard actually says:
	// Char ::= #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] |
	//          [#x10000-#x10FFFF]
	// For some reason this library doesn't like the Unicode character
	// \\uFFFD.
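	// #xA and #xD are hexadecimal, so they are written \\u000A and \\u000D
	// (\\u0010 and \\u0013 would be the control characters 0x10 and 0x13).
	RE r = new RE("[^\\u0009\\u000A\\u000D\\u0020-\\uD7FF\\uE000-\\uFFFC]");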

	return r.subst(s, "");
  }

I'm using the JRE Standard Edition 3.0.

Regards,

Tasuki.
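
Editorial sketch, not part of the original report: the XML 1.0 Char production quoted above can also be checked without any regex library by walking the string's code points. The class and method names below (XmlCharFilter, isXmlChar, stripInvalid) are hypothetical and chosen only for illustration. Unlike the pattern in the report, this version keeps U+FFFD and the supplementary range.

```java
final class XmlCharFilter {
    // True if cp is allowed by the XML 1.0 Char production:
    // #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
    static boolean isXmlChar(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
            || (cp >= 0x20 && cp <= 0xD7FF)
            || (cp >= 0xE000 && cp <= 0xFFFD)
            || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    // Returns s with every code point outside the Char production removed.
    // Null and empty inputs are returned unchanged, as in the report's method.
    static String stripInvalid(String s) {
        if (s == null || s.isEmpty()) return s;
        StringBuilder out = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);          // unpaired surrogates come back as-is and are dropped
            if (isXmlChar(cp)) out.appendCodePoint(cp);
            i += Character.charCount(cp);       // advance by 1 or 2 chars
        }
        return out.toString();
    }
}
```

This is only one possible workaround. It avoids the character-class problem entirely instead of fixing it.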
Comment 1 Oleg Sukhodolsky 2003-10-07 03:23:16 UTC
The cause of the problem is that RECompiler uses 0xfffd as the value of its
internal constant ESC_CLASS, so the escaped character \\uFFFD cannot be told
apart from that marker.

To fix the problem, the type of the ESC_XXX constants should be changed from
char to int, their values should be larger than the maximum value of char,
and the return type of the escape() method should be changed to int.
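
Editorial sketch, not taken from the attached patch: the collision described above can be reproduced in isolation. The class EscapeSentinelDemo, its methods, and the int marker value are assumptions made for illustration. Only the 0xfffd marker value and the char-to-int idea come from this comment.

```java
final class EscapeSentinelDemo {
    // Marker stored as a char: it shares its value with the real character U+FFFD.
    static final char ESC_CLASS_AS_CHAR = 0xfffd;

    // Marker stored as an int above Character.MAX_VALUE (0xFFFF).
    // The exact value is hypothetical; any int outside the char range works.
    static final int ESC_CLASS_AS_INT = Character.MAX_VALUE + 1;

    // Simulated escape() returning char: either the decoded character
    // or the marker that means "this escape was a character class".
    static char escapeAsChar(boolean isClassEscape, char decoded) {
        return isClassEscape ? ESC_CLASS_AS_CHAR : decoded;
    }

    // Same logic with an int return type and an out-of-range marker.
    static int escapeAsInt(boolean isClassEscape, char decoded) {
        return isClassEscape ? ESC_CLASS_AS_INT : decoded;
    }

    // True if a plain decoded character is mistaken for the class marker.
    static boolean ambiguousWithChar(char decoded) {
        return escapeAsChar(false, decoded) == ESC_CLASS_AS_CHAR;
    }

    static boolean ambiguousWithInt(char decoded) {
        return escapeAsInt(false, decoded) == ESC_CLASS_AS_INT;
    }
}
```

With the char marker, a decoded U+FFFD looks like a class escape. With the int marker, no char value can collide, because every char widens to an int no larger than 0xFFFF.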
Comment 2 Oleg Sukhodolsky 2003-10-07 07:34:49 UTC
Created attachment 8476
Suggested fix for this bug.
Comment 3 Vadim Gritsenko 2003-12-20 17:59:23 UTC
Patch applied, thanks
Comment 4 Vadim Gritsenko 2003-12-20 17:59:43 UTC
Closed