Details
-
New Feature
-
Status: Closed
-
Minor
-
Resolution: Fixed
-
1.4
-
None
Description
Microsoft tools have the unpleasant habit of writing a byte order mark (the three-byte sequence 0xEF 0xBB 0xBF) at the start of a UTF-8 encoded file.
The CharsetDecoder supplied with the JDK does not discard these bytes; it decodes them to the BOM character (U+FEFF) and returns it as the first character of the text. See http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6378911 for discussion of this behavior.
This makes life unpleasant for anyone who is processing text data, as the program must look for this character and ignore it.
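To illustrate the problem, here is a minimal sketch of what a caller sees when decoding BOM-prefixed bytes with the JDK's UTF-8 decoder. The class name BomDecodeDemo and its helper methods are hypothetical names chosen for this example, and it assumes Java 7 or later for StandardCharsets.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of any library.
final class BomDecodeDemo {
    // UTF-8 encoding of U+FEFF (0xEF 0xBB 0xBF) followed by the ASCII text "abc".
    static final byte[] DATA = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, 'a', 'b', 'c'};

    /** Decodes with the JDK's UTF-8 decoder, which keeps the BOM as a leading U+FEFF. */
    static String decode(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    /** The manual cleanup every caller otherwise has to remember to do. */
    static String stripLeadingBom(String s) {
        return (!s.isEmpty() && s.charAt(0) == '\uFEFF') ? s.substring(1) : s;
    }

    public static void main(String[] args) {
        String text = decode(DATA);
        System.out.println(text.length());          // 4, not 3: the BOM is still there
        System.out.println(stripLeadingBom(text));  // abc
    }
}
```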
The BOMExclusionInputStream class is a workaround: it recognizes a BOM at the start of the stream and skips over it, so the decoder never sees those bytes.
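The idea can be sketched roughly as follows. This is not the BOMExclusionInputStream attached to this issue; it is an illustrative stand-in under the hypothetical name BomSkippingInputStream, built on java.io.PushbackInputStream, and the readAll helper exists only to make the example easy to try. It assumes Java 8 or later for UncheckedIOException.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.PushbackInputStream;
import java.io.UncheckedIOException;

/** Hypothetical sketch: drops a leading UTF-8 BOM and passes all other bytes through. */
final class BomSkippingInputStream extends PushbackInputStream {
    private static final int[] BOM = {0xEF, 0xBB, 0xBF};
    private boolean checked;

    BomSkippingInputStream(InputStream in) {
        super(in, BOM.length); // pushback buffer large enough to return a non-BOM prefix
    }

    /** On the first read, consumes up to three bytes and pushes them back unless they are a BOM. */
    private void skipBomOnce() throws IOException {
        if (checked) {
            return;
        }
        checked = true;
        byte[] head = new byte[BOM.length];
        int n = 0;
        while (n < head.length) { // tolerate short reads from the underlying stream
            int r = super.read(head, n, head.length - n);
            if (r < 0) {
                break;
            }
            n += r;
        }
        boolean isBom = n == BOM.length;
        for (int i = 0; isBom && i < BOM.length; i++) {
            isBom = (head[i] & 0xFF) == BOM[i];
        }
        if (!isBom && n > 0) {
            unread(head, 0, n); // not a BOM: hand the bytes back untouched
        }
    }

    @Override
    public int read() throws IOException {
        skipBomOnce();
        return super.read();
    }

    @Override
    public int read(byte[] b, int off, int len) throws IOException {
        skipBomOnce();
        return super.read(b, off, len);
    }

    /** Convenience helper for the example: returns the bytes of data without a leading BOM. */
    static byte[] readAll(byte[] data) {
        try (InputStream in = new BomSkippingInputStream(new ByteArrayInputStream(data))) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            for (int b = in.read(); b != -1; b = in.read()) {
                out.write(b);
            }
            return out.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Wrapping the raw stream before handing it to an InputStreamReader would keep U+FEFF out of the decoded text. The sketch only guards the read methods; skip() and available() called before the first read are not handled, and a BOM sequence appearing later in the stream is passed through unchanged.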
Attachments
Issue Links
- is related to: IO-162 add Xml(Stream)Reader/Writer from ROME (Closed)