  1. Commons Codec
  2. CODEC-174

Improve performance of Beider Morse encoder

Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.6, 1.7
    • Fix Version/s: 1.9

    Description

      I use the Beider Morse encoder with Solr. When Solr indexes a large number of documents with this encoder, the import time is multiplied by 30. So I decided to optimize the current implementation in commons-codec.

      So far, I have created two patches. The first patch deletes a "performance hack" that caches subsequences. This cache does not improve performance, and deleting it saves a few milliseconds.

      The second patch changes the in-memory storage of the rules from a List to a Map. With the Map, a rule can be looked up directly by the beginning of its pattern. This patch divides the encoding time by 2.
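
To illustrate the idea behind the second patch, here is a minimal sketch, not the code from the attached patch. It assumes rules with non-empty literal patterns and uses the first character of the pattern as the Map key. The names `RuleIndexSketch`, `SimpleRule`, `indexByFirstChar` and `findRule` are hypothetical and do not exist in commons-codec.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class RuleIndexSketch {

    /** Hypothetical stand-in for a phonetic rule: a literal pattern and its phoneme. */
    static final class SimpleRule {
        final String pattern;
        final String phoneme;

        SimpleRule(String pattern, String phoneme) {
            this.pattern = pattern;
            this.phoneme = phoneme;
        }
    }

    /** Groups the rules by the first character of their (non-empty) pattern. */
    static Map<Character, List<SimpleRule>> indexByFirstChar(List<SimpleRule> rules) {
        Map<Character, List<SimpleRule>> index = new HashMap<>();
        for (SimpleRule rule : rules) {
            index.computeIfAbsent(rule.pattern.charAt(0), k -> new ArrayList<>()).add(rule);
        }
        return index;
    }

    /**
     * Returns the first rule whose pattern matches the input at position pos,
     * or null if there is none. Only the rules sharing the first character
     * are scanned, instead of the whole rule list.
     */
    static SimpleRule findRule(Map<Character, List<SimpleRule>> index, String input, int pos) {
        List<SimpleRule> candidates = index.get(input.charAt(pos));
        if (candidates == null) {
            return null;
        }
        for (SimpleRule rule : candidates) {
            if (input.startsWith(rule.pattern, pos)) {
                return rule;
            }
        }
        return null;
    }
}
```

Within each bucket the rules keep their original order, so the first match is the same one a scan of the full List would return. The gain comes from skipping every rule whose pattern cannot match at the current position.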

      I will try to find more improvements. If you have any ideas, please let me know.

      Attachments

        1. CODEC_174_cleanup.patch (5 kB, Thomas Neidhart)
        2. CODEC-174-change-rules-storage-to-Map.patch (8 kB, Thomas Champagne)
        3. CODEC-174-convert-set-to-list-in-apply-method.patch (2 kB, Thomas Champagne)
        4. CODEC-174-delete-subsequence-cache.patch (3 kB, Thomas Champagne)
        5. CODEC-174-delete-subsequence-cache-and-use-String.patch (3 kB, Thomas Champagne)
        6. CODEC-174-refactor-join-method-in-Phoneme.patch (3 kB, Thomas Champagne)
        7. CODEC-174-refactor-restrictTo-method-in-SomeLanguages.patch (1 kB, Thomas Champagne)
        8. CODEC-174-reuse-set-in-PhonemeBuilder.patch (5 kB, Thomas Champagne)
        9. TestCacheSubSequence.java (2 kB, Thomas Champagne)
        10. test-commons-codec-test-bm.zip (1 kB, Thomas Champagne)

            People

              Assignee: Unassigned
              Reporter: Thomas Champagne (lafeuil)
              Votes: 2
              Watchers: 5

              Dates

                Created:
                Updated:
                Resolved: