By default, WordDelimiterFilter assigns a 'type' to each character (computed from its Unicode properties).
Based on these types and the options provided, it splits and concatenates text.
In some circumstances, you might need to tweak this behavior.
It seems the filter already had this in mind, since you can pass in a custom byte type table.
But it's not exposed in the factory.
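
To illustrate why overriding the table matters, here is a rough, self-contained Java sketch of the idea. It is not the actual WordDelimiterFilter code: the class name `TypeTableSketch`, the type constants, and the simplified splitting rule (break on delimiters and on letter/digit boundaries only) are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

public class TypeTableSketch {
    // Hypothetical type codes for this sketch; the real filter defines its own.
    static final byte LOWER = 0x01;
    static final byte UPPER = 0x02;
    static final byte ALPHA = 0x03;         // letter without case information
    static final byte DIGIT = 0x04;
    static final byte SUBWORD_DELIM = 0x08;

    // Default type of a character, derived from its Unicode properties.
    static byte defaultType(char c) {
        if (Character.isLowerCase(c)) return LOWER;
        if (Character.isUpperCase(c)) return UPPER;
        if (Character.isDigit(c)) return DIGIT;
        if (Character.isLetter(c)) return ALPHA;
        return SUBWORD_DELIM;
    }

    // Precomputed defaults for the first 256 characters; callers may override entries.
    static byte[] defaultTable() {
        byte[] table = new byte[256];
        for (int i = 0; i < table.length; i++) {
            table[i] = defaultType((char) i);
        }
        return table;
    }

    // Look up the type in the table, falling back to the computed default.
    static byte typeOf(char c, byte[] table) {
        return c < table.length ? table[c] : defaultType(c);
    }

    // Simplified split: drop delimiters, break between letters and digits.
    static List<String> split(String text, byte[] table) {
        List<String> parts = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean prevIsDigit = false;
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            byte type = typeOf(c, table);
            if (type == SUBWORD_DELIM) {
                flush(current, parts);
                continue;
            }
            boolean isDigit = (type == DIGIT);
            if (current.length() > 0 && isDigit != prevIsDigit) {
                flush(current, parts);
            }
            current.append(c);
            prevIsDigit = isDigit;
        }
        flush(current, parts);
        return parts;
    }

    // Emit the pending part, if any, and reset the buffer.
    private static void flush(StringBuilder current, List<String> parts) {
        if (current.length() > 0) {
            parts.add(current.toString());
            current.setLength(0);
        }
    }

    public static void main(String[] args) {
        byte[] defaults = defaultTable();
        System.out.println(split("wi-fi4u", defaults));

        // Custom table: treat '-' as a letter so it no longer splits.
        byte[] custom = defaultTable();
        custom['-'] = LOWER;
        System.out.println(split("wi-fi4u", custom));
    }
}
```

With the default table, "wi-fi4u" comes out as wi, fi, 4, u. With '-' remapped to a letter type, it comes out as wi-fi, 4, u. That kind of per-character override is what I would like to be able to configure.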
I think you should be able to customize the defaults with a configuration file: