Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Invalid
    • Affects Version/s: 3.5
    • Fix Version/s: None
    • Component/s: None
    • Labels:
      None
    • Environment:

      Tomcat 7.0.25 (request encoding UTF-8)
      Solr 3.5.0
      Java 7 Oracle
      Ubuntu 11.10

      Description

      Sorry for the inaccurate title.
      I have three fields (dc_title, dc_title_unicode, dc_title_unicode_full) containing the same value:

      <title xmlns="http://www.tei-c.org/ns/1.0">cal•lígraf</title>
      

      and these fields are configured accordingly:

          <fieldType name="xml" class="solr.TextField" positionIncrementGap="100">
            <analyzer type="index">
              <charFilter class="solr.HTMLStripCharFilterFactory"/>
              <tokenizer class="solr.StandardTokenizerFactory"/>
              <filter class="solr.ICUFoldingFilterFactory"/>
            </analyzer>
            <analyzer type="query">
              <tokenizer class="solr.StandardTokenizerFactory"/>
              <filter class="solr.ICUFoldingFilterFactory"/>
            </analyzer>
          </fieldType>
          
          <fieldType name="xml_unicode" class="solr.TextField" positionIncrementGap="100">
            <analyzer type="index">
              <charFilter class="solr.HTMLStripCharFilterFactory"/>
              <tokenizer class="solr.StandardTokenizerFactory"/>
            </analyzer>
            <analyzer type="query">
              <tokenizer class="solr.WhitespaceTokenizerFactory"/>
            </analyzer>
          </fieldType>
          
          <fieldType name="xml_unicode_full" class="solr.TextField" positionIncrementGap="100">
            <analyzer type="index">
              <charFilter class="solr.HTMLStripCharFilterFactory"/>
              <tokenizer class="solr.WhitespaceTokenizerFactory"/>
            </analyzer>
            <analyzer type="query">
              <tokenizer class="solr.WhitespaceTokenizerFactory"/>
            </analyzer>
          </fieldType>
      

      And finally my search configuration:

          <requestHandler name="dictionary" class="solr.SearchHandler">
               <lst name="defaults">
                 <str name="echoParams">all</str>
                 <str name="defType">edismax</str>
                 <str name="mm">2&lt;-25%</str>
                 <str name="qf">dc_title_unicode_full^2 dc_title_unicode^2 dc_title</str>
                 <int name="rows">10</int>
                 <str name="spellcheck.onlyMorePopular">true</str>
                 <str name="spellcheck.extendedResults">false</str>
                 <str name="spellcheck.count">1</str>
               </lst>
              <arr name="last-components">
                <str>spellcheck</str>
              </arr>
          </requestHandler>
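For readers unfamiliar with the `mm` value above (`2<-25%`, written as `2&lt;-25%` in XML), here is a minimal Python sketch of how such a conditional spec is commonly described: with up to two optional clauses all are required, and beyond that 25% of them may be missing. The function name and the rounding model are assumptions for illustration; this is not Solr's own code, and it handles only the single `N<P%` form.

```python
def min_should_match(optional_clauses, spec="2<-25%"):
    """Approximate the number of required clauses for a spec like '2<-25%'.

    Simplified model (an assumption): supports only one 'N<P%' condition.
    """
    threshold_text, percent_text = spec.split("<", 1)
    threshold = int(threshold_text)
    percent = int(percent_text.rstrip("%"))

    # At or below the threshold, every optional clause is required.
    if optional_clauses <= threshold:
        return optional_clauses

    # A negative percentage is the share of clauses allowed to be missing;
    # the fractional part is truncated toward zero.
    calc = optional_clauses * percent / 100.0
    if calc < 0:
        return optional_clauses + int(calc)
    return int(calc)


if __name__ == "__main__":
    for n in (1, 2, 3, 4, 8):
        print(n, "clauses ->", min_should_match(n), "required")
```

Under this model, the single-term searches in this report always require their one clause, so `mm` is unlikely to be what separates the matching attempts from the failing one.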
      

      I am trying to match the field with various (valid) search phrases. Here are the results:

      #  search phrase  match?  Comment
      1  cal•lígra?     yes
      2  cal•ligra?     no      Changed í to i
      3  cal•ligraf     yes
      4  calligra?      yes

      The problem is attempt #2, which fails to match the data. Attempt #3 works when ? is replaced with f.

      One more thing: if * is used instead of ?, other data such as cal•lígrafia is matched, but not cal•lígraf.
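One possible reading of the results above, offered as a sketch and not as a confirmed diagnosis: in Solr releases of this era, wildcard terms are generally not passed through the query analyzer, so a pattern is compared against the indexed terms exactly as typed. The Python model below assumes that behavior. The indexed terms are also assumptions inferred from the reported matches, not taken from Solr output: the folded `dc_title` field is assumed to hold `calligraf`, and the whitespace-tokenized `dc_title_unicode_full` field is assumed to hold `cal•lígraf`. `dc_title_unicode` is left out because the report does not show how its tokenizer treats `•`. The `fold` helper is a crude stand-in for ICU folding.

```python
import fnmatch
import unicodedata

# Assumed indexed terms per field (inferred from the reported matches).
INDEXED = {
    "dc_title": "calligraf",                # accents and punctuation folded away
    "dc_title_unicode_full": "cal•lígraf",  # stored as typed
}


def fold(term):
    """Crude stand-in for ICU folding: drop accents and punctuation, casefold."""
    decomposed = unicodedata.normalize("NFKD", term)
    return "".join(ch for ch in decomposed if ch.isalnum()).casefold()


def matches(query):
    """Return True if the query would match any field under this model."""
    is_wildcard = "?" in query or "*" in query
    for field, term in INDEXED.items():
        if is_wildcard or field != "dc_title":
            pattern = query        # assumption: used as typed, no analysis
        else:
            pattern = fold(query)  # plain terms go through the folding analyzer
        if fnmatch.fnmatchcase(term, pattern):
            return True
    return False


if __name__ == "__main__":
    for q in ("cal•lígra?", "cal•ligra?", "cal•ligraf", "calligra?"):
        print(q, "->", "match" if matches(q) else "no match")
```

Under these assumptions the model reproduces all four reported rows: attempt #2 fails because the unfolded pattern matches neither the folded term nor the accented one. It does not explain the `*` observation, which would need the debug output to investigate.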

        Activity

        Dalius added a comment -

        Moved to mailing list: http://lucene.472066.n3.nabble.com/Wildcard-issue-td3726345.html

        Dalius made changes -
        Status: Resolved [ 5 ] to Closed [ 6 ]
        Dalius added a comment -

        Thanks for references.

        Erick Erickson made changes -
        Status: Open [ 1 ] to Resolved [ 5 ]
        Resolution: set to Invalid [ 6 ]
        Erick Erickson added a comment -

        Please re-submit this on the users list "Core User List" here: http://lucene.apache.org/solr/discussion.html

        JIRAs are intended for tracking code issues rather than usage questions.

        Besides the information you provided, the result of attaching &debugQuery=on to the problematical case would be helpful.

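As an illustration of the debugQuery suggestion, the following Python sketch builds a request URL for the failing phrase with `debugQuery=on` appended. The host, port, and `/solr/select` path are hypothetical placeholders, and selecting the `dictionary` handler through the `qt` parameter is an assumption about how this installation routes requests.

```python
from urllib.parse import urlencode


def debug_url(query, base="http://localhost:8080/solr/select"):
    """Build a Solr request URL with debug output enabled.

    The base URL is a placeholder; adjust it to the actual deployment.
    """
    params = {
        "qt": "dictionary",   # assumption: handler selected by name via qt
        "q": query,
        "debugQuery": "on",
    }
    # urlencode percent-encodes the query as UTF-8, including '?' and '•'.
    return base + "?" + urlencode(params)


if __name__ == "__main__":
    print(debug_url("cal•ligra?"))
```

The debug section of the response should show the parsed query for each `qf` field, which is the information requested for the mailing list.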
        Dalius made changes -
        Description: added a Comment column to the results table (row 2: "Changed í to i").
        Dalius made changes -
        Description: added the field names (dc_title, dc_title_unicode, dc_unicode_full) to the opening sentence.
        Dalius made changes -
        Description: replaced "cal- lígraf" with "cal•lígraf" throughout.
        Dalius created issue -

          People

          • Assignee:
            Unassigned
            Reporter:
            Dalius
          • Votes:
            0
            Watchers:
            0

            Dates

            • Created:
              Updated:
              Resolved:
