Commons Codec / CODEC-174

Improve performance of Beider Morse encoder


Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.6, 1.7
    • Fix Version/s: 1.9

    Description

      I use the Beider-Morse encoder with Solr. When Solr indexes a large number of documents with this encoder, the import takes about 30 times longer. I have therefore decided to optimize the current implementation in commons-codec.

      So far I have created two patches. The first removes a "performance hack", a subsequence cache. This cache does not improve performance, and removing it saves a few milliseconds.

      The second patch changes how the rules are stored in memory, using a Map instead of a List. With it, a rule can be looked up directly by the beginning of its pattern. This patch halves the encoding time.
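
      A minimal sketch of the idea behind the second patch, not the patch itself. It assumes a simplified, hypothetical Rule type with a literal, non-empty pattern (the real Beider-Morse rules also carry context conditions), and assumes the Map is keyed by the first character of the pattern. The names RuleIndexSketch, index and firstMatch are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class RuleIndexSketch {

    /** Hypothetical, simplified rule: a literal, non-empty pattern and its replacement. */
    static final class Rule {
        final String pattern;
        final String phoneme;

        Rule(String pattern, String phoneme) {
            this.pattern = pattern;
            this.phoneme = phoneme;
        }

        /** True if the pattern occurs in the input starting at position i. */
        boolean matchesAt(CharSequence input, int i) {
            int end = i + pattern.length();
            return end <= input.length()
                    && pattern.contentEquals(input.subSequence(i, end));
        }
    }

    /** Groups rules by the first character of their pattern, keeping list order per bucket. */
    static Map<Character, List<Rule>> index(List<Rule> rules) {
        Map<Character, List<Rule>> byFirstChar = new HashMap<>();
        for (Rule rule : rules) {
            byFirstChar
                    .computeIfAbsent(rule.pattern.charAt(0), k -> new ArrayList<>())
                    .add(rule);
        }
        return byFirstChar;
    }

    /** Only scans the rules whose pattern starts with the character at position i. */
    static Rule firstMatch(Map<Character, List<Rule>> index, CharSequence input, int i) {
        List<Rule> candidates = index.get(input.charAt(i));
        if (candidates == null) {
            return null; // no rule starts with this character
        }
        for (Rule rule : candidates) {
            if (rule.matchesAt(input, i)) {
                return rule;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        List<Rule> rules = Arrays.asList(
                new Rule("sch", "S"), new Rule("s", "s"), new Rule("a", "a"));
        Map<Character, List<Rule>> index = index(rules);
        System.out.println(firstMatch(index, "schmidt", 0).phoneme); // prints "S"
    }
}
```

      With a List, every rule is tested at every input position. With the Map, only the bucket for the current character is scanned, which is where a saving of this kind would come from.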

      I will try to find more improvements. If you have any ideas, please let me know.

      Attachments

        1. test-commons-codec-test-bm.zip
          1 kB
          Thomas Champagne
        2. TestCacheSubSequence.java
          2 kB
          Thomas Champagne
        3. CODEC-174-reuse-set-in-PhonemeBuilder.patch
          5 kB
          Thomas Champagne
        4. CODEC-174-refactor-restrictTo-method-in-SomeLanguages.patch
          1 kB
          Thomas Champagne
        5. CODEC-174-refactor-join-method-in-Phoneme.patch
          3 kB
          Thomas Champagne
        6. CODEC-174-delete-subsequence-cache-and-use-String.patch
          3 kB
          Thomas Champagne
        7. CODEC-174-delete-subsequence-cache.patch
          3 kB
          Thomas Champagne
        8. CODEC-174-convert-set-to-list-in-apply-method.patch
          2 kB
          Thomas Champagne
        9. CODEC-174-change-rules-storage-to-Map.patch
          8 kB
          Thomas Champagne
        10. CODEC_174_cleanup.patch
          5 kB
          Thomas Neidhart


          People

            Assignee: Unassigned
            Reporter: Thomas Champagne (lafeuil)
            Votes: 2
            Watchers: 5

