Details

    • Type: Improvement Improvement
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: 1.4.1
    • Fix Version/s: 3.5, 4.0-ALPHA
    • Component/s: Schema and Analysis
    • Labels:
      None
    • Environment:

      Apache Commons Codec 1.5

      Description

      As soon as Apache Commons Codec 1.5 is released, support new encoder "ColognePhonetic" please.
      See JIRA for CODEC-106.

      It is fundamental for phonetic searches if you are indexing german names. Other indexers are optimizied for english (words).

        Issue Links

          Activity

          Karsten Tinnefeld made changes -
          Link This issue is related to SOLR-3289 [ SOLR-3289 ]
          Uwe Schindler made changes -
          Status Resolved [ 5 ] Closed [ 6 ]
          Hide
          Uwe Schindler added a comment -

          Bulk close after 3.5 is released

          Show
          Uwe Schindler added a comment - Bulk close after 3.5 is released
          rmuir committed 1195101 (1 file)
          Robert Muir made changes -
          Status Open [ 1 ] Resolved [ 5 ]
          Fix Version/s 3.5 [ 12317876 ]
          Resolution Fixed [ 1 ]
          Hide
          Robert Muir added a comment -

          Thanks Marc!

          Show
          Robert Muir added a comment - Thanks Marc!
          Marc Pompl made changes -
          Hide
          Marc Pompl added a comment -

          Thanks for the useful hints. Java's API always has some nifty news, even for experienced developers.
          Code was modified.
          Besides, the exception handling is really ugly, in order to fulfill the requirement to do reflection twice.

          Show
          Marc Pompl added a comment - Thanks for the useful hints. Java's API always has some nifty news, even for experienced developers. Code was modified. Besides, the exception handling is really ugly, in order to fulfill the requirement to do reflection twice.
          Marc Pompl made changes -
          Attachment ColognePhonetic-patch-with-reflection.txt [ 12475950 ]
          Hide
          Uwe Schindler added a comment -

          One addition about the registry: The registry is not synchronized, but now it is lazy updated as soon as a new encoder class occurs - registry.put() could be called from different threads and break the HashMap.

          Show
          Uwe Schindler added a comment - One addition about the registry: The registry is not synchronized, but now it is lazy updated as soon as a new encoder class occurs - registry.put() could be called from different threads and break the HashMap.
          Hide
          Robert Muir added a comment -

          hi Marc, thanks for notifying us of the commons codec release.

          Uwe: I agree with your suggestions, I just want to mention i think we should still keep the registry map, because for example Caverphone is deprecated in the 1.5 release (and there is Caverphone1 and Caverphone2). So the map is useful for us to shield our users from changes (we can map Caverphone to Caverphone2 or whichever one is equivalent). Even if this one is removed in commons codec 1.6 this string value can still work.

          Show
          Robert Muir added a comment - hi Marc, thanks for notifying us of the commons codec release. Uwe: I agree with your suggestions, I just want to mention i think we should still keep the registry map, because for example Caverphone is deprecated in the 1.5 release (and there is Caverphone1 and Caverphone2). So the map is useful for us to shield our users from changes (we can map Caverphone to Caverphone2 or whichever one is equivalent). Even if this one is removed in commons codec 1.6 this string value can still work.
          Hide
          Uwe Schindler added a comment - - edited

          Hi Marc,
          to remove the unneeded cast leading to an unchecked warning, also calling static methods on instances is a no-go:
          Replace:

          clazz = (Class<? extends Encoder>) this.getClass().forName(name);
          

          By:

          clazz = Class.forName(name).asSubclass(Encoder.class);
          

          Aditionally, maybe the reflection should try twice:

          • First try by prepending the package name ("org.apache.commons.codec.language."), as the user expects the encoder in this package. With your code, the user has to add the full class name.
          • Use name param as "full class name".

          The only problem with the reflection code is that it is case sensitive, so this is different from the "registry" map.

          Show
          Uwe Schindler added a comment - - edited Hi Marc, to remove the unneeded cast leading to an unchecked warning, also calling static methods on instances is a no-go: Replace: clazz = ( Class <? extends Encoder>) this .getClass().forName(name); By: clazz = Class .forName(name).asSubclass(Encoder.class); Aditionally, maybe the reflection should try twice: First try by prepending the package name ("org.apache.commons.codec.language."), as the user expects the encoder in this package. With your code, the user has to add the full class name. Use name param as "full class name". The only problem with the reflection code is that it is case sensitive, so this is different from the "registry" map.
          Marc Pompl made changes -
          Attachment ColognePhonetic-patch-with-reflection.txt [ 12475950 ]
          Marc Pompl made changes -
          Attachment ColognePhonetic-patch-with-reflection.txt [ 12475949 ]
          Marc Pompl made changes -
          Attachment ColognePhonetic-patch-with-reflection.txt [ 12475949 ]
          Marc Pompl made changes -
          Attachment ColognePhonetic-patch-with-reflection.txt [ 12475948 ]
          Marc Pompl made changes -
          Attachment ColognePhonetic-patch.txt [ 12465897 ]
          Marc Pompl made changes -
          Attachment ColognePhonetic-patch-with-reflection.txt [ 12475948 ]
          Hide
          Marc Pompl added a comment - - edited

          Hi Robert,

          at last I've found some time to implement your suggestion. The modified patch is attached.
          JUnit tests are updated and included.

          Oh, by the way, Apache Commons Codec 1.5 has been released on March, 29th.

          Show
          Marc Pompl added a comment - - edited Hi Robert, at last I've found some time to implement your suggestion. The modified patch is attached. JUnit tests are updated and included. Oh, by the way, Apache Commons Codec 1.5 has been released on March, 29th.
          Hide
          Marc Pompl added a comment - - edited

          Hi Robert,

          I wasn't able to find any release date for Apache Commons Codec 1.4.1 or 1.5.

          I agree with your concept of supporting a static map and--if a not supported string is identified--a fallback via reflection, additionally.

          Show
          Marc Pompl added a comment - - edited Hi Robert, I wasn't able to find any release date for Apache Commons Codec 1.4.1 or 1.5. I agree with your concept of supporting a static map and-- if a not supported string is identified --a fallback via reflection, additionally.
          Hide
          Robert Muir added a comment -

          Seems ColognePhonetic will be supported in Apache Commons Codec 1.4.1.

          Thanks for your patch Marc. Has there been any discussion on a tentative release date for 1.4.1?
          When this happens I'll be happy to add it.

          Besides, do you think it is a good idea to allow a fully qualified class name as "encoder" in PhoneticFilterFactory? Extending solr by a custom phonetic filter could be much easier for developers.

          I think the reason its not done with reflection might be historical, before all tokenstreams were reused?
          But I still think its a good idea to avoid reflection when possible, so I think we should keep the statically built map.

          However, if you supply a string thats not in this map, I don't think it would hurt to try to reflect the name
          before throwing an exception, as in this case you would only get an exception anyway.

          Show
          Robert Muir added a comment - Seems ColognePhonetic will be supported in Apache Commons Codec 1.4.1. Thanks for your patch Marc. Has there been any discussion on a tentative release date for 1.4.1? When this happens I'll be happy to add it. Besides, do you think it is a good idea to allow a fully qualified class name as "encoder" in PhoneticFilterFactory? Extending solr by a custom phonetic filter could be much easier for developers. I think the reason its not done with reflection might be historical, before all tokenstreams were reused? But I still think its a good idea to avoid reflection when possible, so I think we should keep the statically built map. However, if you supply a string thats not in this map, I don't think it would hurt to try to reflect the name before throwing an exception, as in this case you would only get an exception anyway.
          Marc Pompl made changes -
          Description As soon as Apache Commons Codec 1.5 is released, support new encoder "ColognePhonetic" please.
          See JIRA for CODEC-106.

          It is fundamental for phonetic searches if you are indexing german words. Other indexers are optimizied for english.
          As soon as Apache Commons Codec 1.5 is released, support new encoder "ColognePhonetic" please.
          See JIRA for CODEC-106.

          It is fundamental for phonetic searches if you are indexing german names. Other indexers are optimizied for english (words).
          Marc Pompl made changes -
          Description As soon as Apache Commons Codec 1.5 is released, support new encoder "ColognePhonetic".
          See JIRA for CODEC-106.

          It is fundamental for phonetic searches if you are indexing german words. Other indexers are optimizied for english.
          As soon as Apache Commons Codec 1.5 is released, support new encoder "ColognePhonetic" please.
          See JIRA for CODEC-106.

          It is fundamental for phonetic searches if you are indexing german words. Other indexers are optimizied for english.
          Fix Version/s 4.0 [ 12314992 ]
          Fix Version/s 1.5 [ 12313566 ]
          Marc Pompl made changes -
          Link This issue is blocked by CODEC-106 [ CODEC-106 ]
          Marc Pompl made changes -
          Field Original Value New Value
          Attachment ColognePhonetic-patch.txt [ 12465897 ]
          Hide
          Marc Pompl added a comment - - edited

          Patch for solr trunk (4.0). Binary modifications are not included.
          Seems ColognePhonetic will be supported in Apache Commons Codec 1.4.1.

          Besides, do you think it is a good idea to allow a fully qualified class name as "encoder" in PhoneticFilterFactory? Extending solr by a custom phonetic filter could be much easier for developers.

          Show
          Marc Pompl added a comment - - edited Patch for solr trunk (4.0). Binary modifications are not included. Seems ColognePhonetic will be supported in Apache Commons Codec 1.4.1. Besides, do you think it is a good idea to allow a fully qualified class name as "encoder" in PhoneticFilterFactory? Extending solr by a custom phonetic filter could be much easier for developers.
          Marc Pompl created issue -

            People

            • Assignee:
              Unassigned
              Reporter:
              Marc Pompl
            • Votes:
              1 Vote for this issue
              Watchers:
              2 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Time Tracking

                Estimated:
                Original Estimate - 2h
                2h
                Remaining:
                Remaining Estimate - 2h
                2h
                Logged:
                Time Spent - Not Specified
                Not Specified

                  Development