[LANG-480] StringEscapeUtils.escapeHtml incorrectly converts unicode characters above U+00FFFF into 2 characters - ASF JIRA

Details

Type: Bug
Status: Closed
Priority: Minor
Resolution: Fixed
Affects Version/s: 2.4
Fix Version/s: 3.0
Component/s: lang.*
Labels: None
Environment: doesn't matter

Description

Characters that Java represents internally as two chars (a surrogate pair, i.e. code points above U+FFFF) are converted incorrectly by the function: each surrogate is escaped as its own numeric entity. The following test demonstrates the problem:

import org.apache.commons.lang.*;

public class J2 {
    public static void main(String[] args) throws Exception {
        // UTF-8 encoding of the Unicode character
        // COUNTING ROD UNIT DIGIT THREE (code point U+1D362)
        byte[] data = new byte[] { (byte) 0xF0, (byte) 0x9D, (byte) 0x8D, (byte) 0xA2 };

        // output is: &#55348;&#57186;
        // should be: &#119650;
        System.out.println("'" + StringEscapeUtils.escapeHtml(new String(data, "UTF8")) + "'");
    }
}

Should be very quick to fix, feel free to drop me an email if you want a patch.
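
For illustration, here is a minimal sketch of code-point-aware escaping. It is not the attached patch and not the Commons Lang implementation: the class CodePointEscapeSketch and the method escapeNonAscii are hypothetical names, and the sketch only emits numeric entities for non-ASCII code points (it ignores named entities and markup characters such as < and &). The idea is to walk the string by code point rather than by char, so a surrogate pair yields a single entity.

```java
final class CodePointEscapeSketch {

    // Hypothetical helper: escapes every code point above U+007F as a
    // decimal numeric entity, treating a surrogate pair as one code point.
    static String escapeNonAscii(String input) {
        StringBuilder out = new StringBuilder(input.length() * 2);
        for (int i = 0; i < input.length(); ) {
            int cp = input.codePointAt(i);   // combines a valid surrogate pair
            if (cp > 0x7F) {
                out.append("&#").append(cp).append(';');
            } else {
                out.appendCodePoint(cp);
            }
            i += Character.charCount(cp);    // advance by 1 or 2 chars
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // U+1D362, COUNTING ROD UNIT DIGIT THREE, from the report above
        String rod = new String(Character.toChars(0x1D362));
        System.out.println(escapeNonAscii("x" + rod)); // prints x&#119650;
    }
}
```

Iterating with codePointAt and Character.charCount is what keeps the two surrogates (55348 and 57186 in the report) from being escaped separately.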

Attachments

lang-480.patch (1 kB), attached 21/Jan/09 09:27 by Alexander Kjäll

People

Assignee: Unassigned

Reporter: Alexander Kjäll

Votes: 0

Watchers: 0

Dates

Created: 20/Jan/09 17:36

Updated: 17/Dec/09 03:41

Resolved: 01/Mar/09 20:55