I've spent a bunch of time playing around with this. I discovered that the "bugs" with the new encoder/decoder only occur when decoding invalid UTF-8 strings. In particular, the Java standard decoder takes a lot more care to validate every byte it examines before folding it into a code point and converting it to UTF-16.
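To illustrate the difference, here's a hedged sketch (the `naiveTwoByte` helper is hypothetical, not our actual decoder): a naive two-byte decode that blindly combines bits produces a plausible-looking but wrong character, while Java's standard decoder notices the missing continuation byte and substitutes U+FFFD.

```java
import java.nio.charset.StandardCharsets;

public class DecodeCheck {
    // Hypothetical naive decode of a 2-byte sequence: combines bits
    // without verifying the second byte is a continuation byte (10xxxxxx).
    static int naiveTwoByte(byte b1, byte b2) {
        return ((b1 & 0x1F) << 6) | (b2 & 0x3F);
    }

    public static void main(String[] args) {
        // 0xC3 followed by 0x41 ('A') is invalid UTF-8: the second
        // byte is not a continuation byte.
        byte[] bad = { (byte) 0xC3, 0x41 };

        // Java's standard decoder detects the malformed sequence,
        // substitutes U+FFFD, then decodes 0x41 as 'A' on its own.
        String viaJava = new String(bad, StandardCharsets.UTF_8);
        System.out.println(viaJava); // "\uFFFD" + "A"

        // The naive version silently fabricates U+00C1 ('Á').
        System.out.println((char) naiveTwoByte(bad[0], bad[1]));
    }
}
```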
While the decoding function I have right now is still about 2x the speed of Java's, adding in all the checks required to reach functional parity will almost certainly evaporate the performance benefits. Even worse, it seems like the performance benefits of encoding have somehow disappeared while implementing surrogate pair support. (This one confuses me - my benchmark doesn't even cover a string that contains surrogate pairs. Maybe my original tests were flawed?)
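For context on the surrogate pair work: encoding a supplementary character means combining the UTF-16 high/low surrogate pair back into a single code point and emitting its 4-byte UTF-8 form. A minimal sketch of that step (not our encoder; `encodePair` is an illustrative helper):

```java
public class SurrogateEncode {
    // Encode one supplementary code point (> U+FFFF), given as a
    // UTF-16 surrogate pair, into its 4-byte UTF-8 form.
    static byte[] encodePair(char high, char low) {
        // Recombine the pair: each surrogate carries 10 payload bits.
        int cp = ((high - 0xD800) << 10) + (low - 0xDC00) + 0x10000;
        return new byte[] {
            (byte) (0xF0 | (cp >>> 18)),
            (byte) (0x80 | ((cp >>> 12) & 0x3F)),
            (byte) (0x80 | ((cp >>> 6) & 0x3F)),
            (byte) (0x80 | (cp & 0x3F)),
        };
    }

    public static void main(String[] args) {
        // U+1F600 as a surrogate pair: 0xD83D 0xDE00.
        byte[] out = encodePair('\uD83D', '\uDE00');
        for (byte b : out) System.out.printf("%02X ", b);
        // F0 9F 98 80, matching the standard encoder's output
    }
}
```

Note the extra branch (and the validity checks a correct encoder needs around it, e.g. rejecting unpaired surrogates) runs on the hot path even for strings that never take it, which is one plausible reason a benchmark without surrogate pairs could still slow down.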
The bottom line is that it looks like this is a dead end, unless we are willing to sacrifice "correctness" when decoding invalid UTF-8 encoded strings. You could argue that if it's a bad encoding already, it might be best to detect that and throw rather than silently convert like Java does, but that's a debate for another time.
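If we ever did want the detect-and-throw behavior, the standard library already supports it via `CharsetDecoder` with `CodingErrorAction.REPORT` (the default `new String(bytes, UTF_8)` path replaces instead). A quick sketch:

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class StrictDecode {
    public static void main(String[] args) {
        byte[] bad = { (byte) 0xC3, 0x41 }; // invalid: no continuation byte

        // REPORT makes malformed input throw instead of becoming U+FFFD.
        CharsetDecoder strict = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT);
        try {
            strict.decode(ByteBuffer.wrap(bad));
            System.out.println("decoded cleanly");
        } catch (CharacterCodingException e) {
            System.out.println("malformed input rejected: " + e);
        }
    }
}
```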
The only way that reviving this issue would make sense is if we are willing to support a separate encoding mechanism for the purpose of avoiding buffer allocation and copies during write. At the moment, we're not equipped to benefit from that, so maybe I'll reevaluate later.