Thank you very much for contributing this; it's true there is no factory for this feature.
I updated your code with a few tweaks:
- Allow a null dictionary. This allows the hyphenation grammar to be used on its own.
- Allow the encoding to be specified (defaulting to UTF-8). Some of the grammar distributions from offo don't use UTF-8 encoding.
- Set the onlyLongestMatch default to 'false', to be consistent with the TokenFilter itself, which also defaults to false.
- Added the Apache-licensed Danish grammar to test-files, along with a small dictionary and some test cases.
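To illustrate the argument defaults described in the list, here is a minimal, hypothetical Java sketch. It is not the committed factory code. The class name `CompoundFactoryArgsSketch`, the `hyphenator` parameter name, and the grammar file name are assumptions made for illustration. It only shows the intended behavior: the dictionary may be absent, the encoding falls back to UTF-8, and onlyLongestMatch defaults to false.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical sketch (NOT the actual Lucene/Solr factory) of how the
 * factory's optional arguments could default, per the tweaks listed above.
 */
public class CompoundFactoryArgsSketch {
    final String hyphenator;        // assumed name: path to the hyphenation grammar (required)
    final String dictionary;        // optional: null means "grammar only"
    final Charset encoding;         // defaults to UTF-8 when not specified
    final boolean onlyLongestMatch; // defaults to false, matching the TokenFilter

    CompoundFactoryArgsSketch(Map<String, String> args) {
        hyphenator = args.get("hyphenator");
        if (hyphenator == null) {
            throw new IllegalArgumentException("Missing required parameter: hyphenator");
        }
        // A missing dictionary is allowed; the grammar alone drives decomposition.
        dictionary = args.get("dictionary");
        // Some grammar files are not UTF-8, so let the caller override the charset.
        encoding = Charset.forName(args.getOrDefault("encoding", StandardCharsets.UTF_8.name()));
        onlyLongestMatch = Boolean.parseBoolean(args.getOrDefault("onlyLongestMatch", "false"));
    }

    public static void main(String[] argv) {
        Map<String, String> args = new HashMap<>();
        args.put("hyphenator", "grammar.xml"); // hypothetical grammar file name
        CompoundFactoryArgsSketch f = new CompoundFactoryArgsSketch(args);
        System.out.println(f.dictionary);       // null
        System.out.println(f.encoding.name());  // UTF-8
        System.out.println(f.onlyLongestMatch); // false
    }
}
```

Passing only the grammar yields a null dictionary, UTF-8 encoding and onlyLongestMatch=false; supplying `encoding` (for example `ISO-8859-1`) covers the non-UTF-8 grammar distributions.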
If no one objects, I'll commit in a bit.