I use the Beider-Morse encoder with Solr. When Solr indexes a large number of documents with this encoder, the import takes about 30 times longer. So I decided to optimize the current implementation in commons-codec.
So far, I have created two patches. The first patch removes a "performance hack" that used a subsequence cache. This cache does not actually improve performance; removing it saves a few milliseconds.
The second patch changes how the rules are stored in memory, using a Map instead of a List. With it, the rules can be looked up directly by the beginning of their pattern instead of scanning the whole list. With this patch, encoding takes about half as long.
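To illustrate the idea behind the second patch, here is a minimal sketch, not the actual patch. It assumes a hypothetical, simplified `Rule` with only a `pattern` and a `phoneme` (real Beider-Morse rules also have left and right contexts), and it keys the Map on the first character of each pattern. The names `RuleIndexSketch`, `findInList`, `findInMap`, `index` and `encode` are made up for this example.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class RuleIndexSketch {

    // Hypothetical, simplified rule: replace `pattern` with `phoneme`.
    // Real Beider-Morse rules also carry left/right contexts, omitted here.
    static final class Rule {
        final String pattern;
        final String phoneme;

        Rule(String pattern, String phoneme) {
            this.pattern = pattern;
            this.phoneme = phoneme;
        }
    }

    // List-based lookup: every rule is tested at every position.
    static Rule findInList(List<Rule> rules, String input, int pos) {
        for (Rule r : rules) {
            if (input.startsWith(r.pattern, pos)) {
                return r;
            }
        }
        return null;
    }

    // Index the rules by the first character of their pattern.
    // Each bucket keeps the original rule order, so "first match wins" is preserved.
    static Map<Character, List<Rule>> index(List<Rule> rules) {
        Map<Character, List<Rule>> byFirst = new HashMap<>();
        for (Rule r : rules) {
            byFirst.computeIfAbsent(r.pattern.charAt(0), k -> new ArrayList<>()).add(r);
        }
        return byFirst;
    }

    // Map-based lookup: only rules whose pattern starts with input.charAt(pos) are tested.
    static Rule findInMap(Map<Character, List<Rule>> byFirst, String input, int pos) {
        List<Rule> candidates = byFirst.getOrDefault(input.charAt(pos), Collections.emptyList());
        for (Rule r : candidates) {
            if (input.startsWith(r.pattern, pos)) {
                return r;
            }
        }
        return null;
    }

    // Toy encoder: apply the first matching rule, or copy the character unchanged.
    static String encode(Map<Character, List<Rule>> byFirst, String input) {
        StringBuilder out = new StringBuilder();
        int pos = 0;
        while (pos < input.length()) {
            Rule r = findInMap(byFirst, input, pos);
            if (r == null) {
                out.append(input.charAt(pos));
                pos++;
            } else {
                out.append(r.phoneme);
                pos += r.pattern.length();
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        List<Rule> rules = new ArrayList<>();
        rules.add(new Rule("sch", "S"));
        rules.add(new Rule("s", "s"));
        rules.add(new Rule("ch", "x"));
        rules.add(new Rule("c", "k"));
        System.out.println(encode(index(rules), "schach")); // prints "Sax"
    }
}
```

A rule whose pattern does not start with the current character can never match at that position, so skipping the other buckets returns the same rule as the full list scan, while testing far fewer rules per position.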
I will keep looking for further improvements. If you have any ideas, please let me know.