[CODEC-174] Improve performance of Beider Morse encoder - ASF JIRA

Details

Type: Improvement
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: 1.6, 1.7
Fix Version/s: 1.9
Labels:
- patch
- performance

Description

I use the Beider-Morse encoder with Solr. When Solr indexes a large number of documents with this encoder, the import time is multiplied by 30. I have therefore decided to optimize the current implementation in commons-codec.

So far I have created two patches. The first patch removes a "performance hack" built around a subsequence cache. This cache does not improve performance, and removing it saves a few milliseconds.

The second patch changes the in-memory storage of the rules from a List to a Map. With it, the rules can be looked up directly by the beginning of their pattern. This patch halves the encoding time.
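To illustrate the idea behind the second patch, here is a minimal sketch of indexing rules by the beginning of their pattern. It is not the code from the patch: `SketchRule` and `RuleIndexSketch` are hypothetical names, the key is assumed to be the first character of the pattern, and patterns are assumed to be non-empty literal strings.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for a phonetic rule; not the commons-codec Rule class. */
final class SketchRule {
    final String pattern;  // literal text the rule matches (assumed non-empty)
    final String phoneme;  // replacement emitted when the rule matches

    SketchRule(String pattern, String phoneme) {
        this.pattern = pattern;
        this.phoneme = phoneme;
    }

    /** True if the pattern occurs in input starting at position pos. */
    boolean matchesAt(String input, int pos) {
        return input.startsWith(pattern, pos);
    }
}

/** Groups rules by the first character of their pattern (an assumed key choice). */
final class RuleIndexSketch {
    private final Map<Character, List<SketchRule>> byFirstChar = new HashMap<>();

    RuleIndexSketch(List<SketchRule> rules) {
        // Rules keep their original relative order inside each bucket.
        for (SketchRule rule : rules) {
            byFirstChar
                .computeIfAbsent(rule.pattern.charAt(0), key -> new ArrayList<>())
                .add(rule);
        }
    }

    /** Only the rules whose pattern starts with the character at pos. */
    List<SketchRule> candidates(String input, int pos) {
        return byFirstChar.getOrDefault(input.charAt(pos), Collections.emptyList());
    }

    /** First candidate that matches at pos, or null if none does. */
    SketchRule firstMatch(String input, int pos) {
        for (SketchRule rule : candidates(input, pos)) {
            if (rule.matchesAt(input, pos)) {
                return rule;
            }
        }
        return null;
    }
}
```

With a flat List, every rule has to be tested at every input position. With the Map, only the rules in one bucket are tested, which is where a saving of this kind would come from.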

I will try to find more improvements. If you have any ideas, please let me know.

Attachments

- test-commons-codec-test-bm.zip (04/Nov/13 16:06, 1 kB, Thomas Champagne)
- TestCacheSubSequence.java (12/Nov/13 14:02, 2 kB, Thomas Champagne)
- CODEC-174-reuse-set-in-PhonemeBuilder.patch (05/Nov/13 13:38, 5 kB, Thomas Champagne)
- CODEC-174-refactor-restrictTo-method-in-SomeLanguages.patch (12/Nov/13 14:28, 1 kB, Thomas Champagne)
- CODEC-174-refactor-join-method-in-Phoneme.patch (12/Nov/13 09:13, 3 kB, Thomas Champagne)
- CODEC-174-delete-subsequence-cache-and-use-String.patch (08/Nov/13 15:48, 3 kB, Thomas Champagne)
- CODEC-174-delete-subsequence-cache.patch (04/Nov/13 15:12, 3 kB, Thomas Champagne)
- CODEC-174-convert-set-to-list-in-apply-method.patch (14/Nov/13 15:00, 2 kB, Thomas Champagne)
- CODEC-174-change-rules-storage-to-Map.patch (04/Nov/13 15:13, 8 kB, Thomas Champagne)
- CODEC_174_cleanup.patch (08/Nov/13 15:53, 5 kB, Thomas Neidhart)

Issue Links

is related to
- SOLR-5613 Upgrade Apache Commons Codec to version 1.9 in order to improve performance of BeiderMorseFilter (Closed)

relates to
- CODEC-125 Implement a Beider-Morse phonetic matching codec (Closed)

People

Assignee: Unassigned
Reporter: Thomas Champagne
Votes: 2
Watchers: 5

Dates

Created: 04/Nov/13 15:10
Updated: 07/Jan/14 09:07
Resolved: 21/Dec/13 02:56