SOLR-2393

Clean up Double Metaphone code (PhoneticFilterFactory and DoubleMetaphoneFilter)

    Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 4.0-ALPHA
    • Fix Version/s: None
    • Component/s: None
    • Labels: None

      Description

      From Ryan McKinley to Jan Høydahl:

      PhoneticFilterFactory could check
      "DoubleMetaphone".equals( encoder ) and then create the specialized
      Filter.

      I don't feel strongly, but we could have:
      EncoderFilter - just uses 'encode()'
      DoubleMetaphoneFilter - uses special double metaphone stuff

      EncoderFilterFactory - always uses EncoderFilter, and is not
      semantically bound to 'phonetic'
      PhoneticFilterFactory - picks the best phonetic filter impl (encoder
      or double metaphone)

      then deprecate:
      PhoneticFilter
      DoubleMetaphoneFilterFactory
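
The dispatch proposed above could look roughly like the following. This is a minimal sketch, not the actual Solr code: the class `PhoneticFilterSelection`, the `Filter` interface, and both filter classes are simplified stand-ins invented for illustration, and "Soundex" is used only as an example of a non-DoubleMetaphone encoder name.

```java
/**
 * Sketch only: every type here is a simplified, hypothetical stand-in,
 * not the real Lucene/Solr class of the same name.
 */
public class PhoneticFilterSelection {

    /** Stand-in for a token filter; only its description matters here. */
    public interface Filter {
        String describe();
    }

    /** Generic filter that would simply call encode() on the named encoder. */
    static final class EncoderFilter implements Filter {
        private final String encoderName;
        private final boolean inject;

        EncoderFilter(String encoderName, boolean inject) {
            this.encoderName = encoderName;
            this.inject = inject;
        }

        @Override
        public String describe() {
            return "EncoderFilter(" + encoderName + ", inject=" + inject + ")";
        }
    }

    /** Specialized filter that would use the double metaphone specific logic. */
    static final class DoubleMetaphoneFilter implements Filter {
        private final boolean inject;

        DoubleMetaphoneFilter(boolean inject) {
            this.inject = inject;
        }

        @Override
        public String describe() {
            return "DoubleMetaphoneFilter(inject=" + inject + ")";
        }
    }

    /** Picks the specialized filter for DoubleMetaphone, the generic one otherwise. */
    public static Filter create(String encoder, boolean inject) {
        if ("DoubleMetaphone".equals(encoder)) {
            return new DoubleMetaphoneFilter(inject);
        }
        return new EncoderFilter(encoder, inject);
    }

    public static void main(String[] args) {
        System.out.println(create("DoubleMetaphone", false).describe());
        System.out.println(create("Soundex", false).describe());
    }
}
```

Writing the comparison as `"DoubleMetaphone".equals(encoder)`, as in the description, also keeps the check safe when no encoder is configured and `encoder` is null.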

        Activity

        Bill Bell added a comment -

        Can someone please review my patch to see if it makes sense?

        I am not an expert in these phonetic filters. I think I nailed it, and it appears to work well.

        Bill Bell added a comment -

        Done.

        Jan Høydahl added a comment -

        Hi, can you re-upload the patch with the name SOLR-2393.patch?

        Bill Bell added a comment -

        schema.phonetic.xml can be used for testing.

        Once this is good I'll deprecate solr/core/src/java/org/apache/solr/analysis/DoubleMetaphoneFilterFactory.java and fix the schema.xml test cases.

        Jan Høydahl added a comment -

        I think the problem is that PhoneticFilterFactory does not handle DoubleMetaphone correctly, and to avoid confusion we should upgrade PhoneticFilterFactory to do DoubleMetaphone correctly and then deprecate DoubleMetaphoneFilterFactory.

        Bill Bell added a comment -

        Jan,

        So we would switch from either of these:

        <filter class="solr.PhoneticFilterFactory" encoder="DoubleMetaphone" inject="false"/> 
        or
        <filter class="solr.DoubleMetaphoneFilterFactory" inject="false"/> 
        

        to the following two cases? Is that how you understand it?

        
        <fieldtype name="phonetic" stored="false" indexed="true" class="solr.TextField">
          <analyzer>
            <tokenizer class="solr.StandardTokenizerFactory"/>
            <filter class="solr.PhoneticFilterFactory" encoder="Phonetic" inject="false"/>
          </analyzer>
        </fieldtype>

        <fieldtype name="phonetic" stored="false" indexed="true" class="solr.TextField">
          <analyzer>
            <tokenizer class="solr.StandardTokenizerFactory"/>
            <filter class="solr.PhoneticFilterFactory" encoder="DoubleMetaphone" inject="false"/>
          </analyzer>
        </fieldtype>
        
        Jan Høydahl added a comment -

        Hmm, was looking at phonetics a bit and found this issue. This should not be too hard, should it?
        Bill, would you like to attempt a first patch?


          People

          • Assignee: Unassigned
          • Reporter: Bill Bell
          • Votes: 0
          • Watchers: 3
