Hmm, testStolenBytes should be using the 0x1f byte ... the intention
of the test is to ensure that an incoming token that contains
SEP_LABEL still works correctly (i.e., that the escaping we do is
When I change the 0xff in the patch back to 0x1f I indeed see the
(unexpected) failure without the PRESERVE_SEP option, which is curious
because we do no escaping without PRESERVE_SEP.
OK I see the issue: before, when POS_SEP was 256 and the input space
was a byte, replaceSep always worked correctly because there was no
way for any byte input to be confused with POS_SEP. But now that we
are increasing the input space to all unicode chars, there is no
"safe" value for POS_SEP.
OK given all this I think we should stop trying to not-steal the byte:
I think we should simply declare we steal both 0x1e and 0x1f. This
means we can remove the escaping code, put back your previous code
that I had asked you to remove (sorry) that threw IAE on 0x1f (and now
also 0x1e), remove testStolenBytes, and then improve your new
testIllegalLookupArgument to also verify 0x1f gets the
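
Roughly what I have in mind for the check, as a standalone sketch rather than the actual patch: the class name, method name, and message text are hypothetical, and I am assuming the check runs over the chars of the incoming key. It rejects both stolen values with an IllegalArgumentException.

```java
// Illustrative only: names and message text are hypothetical.
class ReservedCharCheckSketch {
    // The two values we would declare as stolen.
    static final char RESERVED_1E = 0x1e;
    static final char RESERVED_1F = 0x1f;

    // Throws IllegalArgumentException if the key contains a reserved char.
    static void checkKey(CharSequence key) {
        for (int i = 0; i < key.length(); i++) {
            char c = key.charAt(i);
            if (c == RESERVED_1E || c == RESERVED_1F) {
                throw new IllegalArgumentException(
                    "key contains reserved character 0x" + Integer.toHexString(c)
                        + " at index " + i);
            }
        }
    }

    public static void main(String[] args) {
        checkKey("fine"); // passes silently
        try {
            checkKey("bad" + RESERVED_1F + "key");
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

A test along the lines of testIllegalLookupArgument could then assert the exception for each of the two values separately.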
Also, we could maybe eliminate some code dup here, e.g. the two
toFiniteStrings ... maybe by having TS2A and TS2UA share a base class
/ interface. Hmm, maybe we should just merge TS2UA back into TS2A,
and add a unicodeAware option to it?