Steven, even without >BMP support, the 1.5 branch would make the grammar file clearer and more maintainable.
Otherwise, the grammar has to spell out explicit codepoint ranges.
I'll take your advice and send the nudge.
I think for this issue it would be best to wait for the 1.5.0 version of jflex for clarity.
I think even without >BMP support, we should still be able to function.
For example: surrogate pairs with a lead surrogate in D840-D87F point into the SIP (Supplementary Ideographic Plane, U+20000-U+2FFFF), so they should be typed as CJK.
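
To make the surrogate idea concrete, here is a rough Java sketch of the check. This is not JFlex or Lucene code; the class and method names (SurrogateTyping, isSipPair) are made up for illustration, and it only assumes the input is UTF-16, as Java strings are.

```java
public final class SurrogateTyping {
    // Lead surrogates D840..D87F cover exactly U+20000..U+2FFFF (the SIP).
    static final int SIP_LEAD_MIN = 0xD840;
    static final int SIP_LEAD_MAX = 0xD87F;

    /**
     * Hypothetical helper: returns true if the two UTF-16 units starting at
     * index i form a surrogate pair whose code point lies in the SIP.
     */
    public static boolean isSipPair(CharSequence s, int i) {
        if (i < 0 || i + 1 >= s.length()) {
            return false; // no room for a full pair
        }
        char lead = s.charAt(i);
        char trail = s.charAt(i + 1);
        return lead >= SIP_LEAD_MIN
            && lead <= SIP_LEAD_MAX
            && Character.isLowSurrogate(trail);
    }

    public static void main(String[] args) {
        String sip = new String(Character.toChars(0x20000));    // first SIP code point
        String notSip = new String(Character.toChars(0x1F600)); // supplementary, but plane 1
        System.out.println(isSipPair(sip, 0));
        System.out.println(isSipPair(notSip, 0));
    }
}
```

A scanner that only sees 16-bit units could apply the same range test to the lead unit and then require any low surrogate after it, which is why this should work without real >BMP support.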
For reference (I haven't looked at jflex), above-BMP support might require new data structures. I think ICU uses things like tries / compact arrays to deal with the fact that thousands of codepoints share the same property value, etc.
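
To illustrate why a flat per-codepoint table gets awkward above the BMP, here is a small Java sketch of a range-based lookup. To be clear, this is a plain sorted-range table with binary search, not ICU's trie or compact array, and not anything from jflex; RangeTable, lookup and sample are hypothetical names. The sample ranges are block ranges (Hiragana, CJK Unified Ideographs, CJK Extension B), picked only to have something to look up.

```java
import java.util.Arrays;

public final class RangeTable {
    private final int[] starts;    // sorted inclusive range starts
    private final int[] ends;      // inclusive range ends, parallel to starts
    private final String[] values; // property value for each range

    public RangeTable(int[] starts, int[] ends, String[] values) {
        this.starts = starts;
        this.ends = ends;
        this.values = values;
    }

    /** Returns the value of the range containing codePoint, or null if none. */
    public String lookup(int codePoint) {
        int idx = Arrays.binarySearch(starts, codePoint);
        if (idx < 0) {
            idx = -idx - 2; // index of the last start <= codePoint, or -1
        }
        if (idx >= 0 && codePoint <= ends[idx]) {
            return values[idx];
        }
        return null;
    }

    /** Illustrative sample data only: three Unicode block ranges. */
    public static RangeTable sample() {
        return new RangeTable(
            new int[] {0x3040, 0x4E00, 0x20000},
            new int[] {0x309F, 0x9FFF, 0x2A6DF},
            new String[] {"HIRAGANA", "CJK", "CJK"});
    }

    public static void main(String[] args) {
        RangeTable table = sample();
        System.out.println(table.lookup(0x4E00));
        System.out.println(table.lookup(0x20000));
        System.out.println(table.lookup(0x41));
    }
}
```

Three entries cover tens of thousands of codepoints here; a trie trades that compactness for constant-time lookup, which is presumably why ICU goes that way.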