Description
When attempting to build Lucene, I discovered a problem with UTF-8 decoding.
(This actually prevents our tests from even compiling without a workaround.)
For any code point above U+FFFF (a 4-byte UTF-8 sequence), the decoder doesn't
properly split the decoded code point into a UTF-16 surrogate pair.
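
For reference, here is a minimal sketch of the expected behaviour. It is not the decoder's actual code: the class and method names (SurrogateSplit, decode4, toSurrogates) are made up for illustration, and it assumes the input is already a well-formed 4-byte sequence, so it does no validation.

```java
final class SurrogateSplit {
    // Hypothetical helper: combine the payload bits of a 4-byte UTF-8
    // sequence (11110xxx 10xxxxxx 10xxxxxx 10xxxxxx) into one code point.
    // Assumes the four bytes are well-formed; nothing is validated here.
    static int decode4(byte[] b, int off) {
        return ((b[off] & 0x07) << 18)
             | ((b[off + 1] & 0x3F) << 12)
             | ((b[off + 2] & 0x3F) << 6)
             | (b[off + 3] & 0x3F);
    }

    // Hypothetical helper: split a supplementary code point
    // (U+10000..U+10FFFF) into a high and a low UTF-16 surrogate.
    static char[] toSurrogates(int codePoint) {
        int v = codePoint - 0x10000;               // 20 significant bits remain
        char high = (char) (0xD800 + (v >> 10));   // top 10 bits
        char low = (char) (0xDC00 + (v & 0x3FF));  // bottom 10 bits
        return new char[] { high, low };
    }
}
```

For example, the bytes F0 9D 84 9E decode to U+1D11E, which should come out as the two chars D834 DD1E. The standard library's Character.toChars(int) performs the same split and can be used as a cross-check.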