Lucene - Core
LUCENE-4911

Missing word "cela" in conf/lang/stopwords_fr.txt

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Trivial
    • Resolution: Fixed
    • Affects Version/s: 4.2
    • Fix Version/s: None
    • Component/s: None
    • Labels:
      None
    • Lucene Fields:
      New, Patch Available

      Description

      NB: Not sure this defect is assigned to the right component.

      In the file example/solr/collection1/conf/lang/stopwords_fr.txt,
      there is the word "celà". Although this spelling is incorrect in French (see http://fr.wiktionary.org/wiki/cel%C3%A0), it is common, so rather than replacing it we could also add the correct spelling (i.e. "cela", without an accent) to the stopword list.
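The proposal amounts to keeping both spellings in the stop set, so that a token is dropped whichever spelling the text uses. The following is a minimal sketch that models stopword removal with a plain Java Set; it is not Lucene's StopFilter or CharArraySet. The class FrenchStopSketch, the method removeStopWords and the four-word list are hypothetical, and tokens are assumed to be lowercased already.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Simplified model of stopword removal; this is NOT Lucene's StopFilter.
public class FrenchStopSketch {
    // Hypothetical excerpt of a French stop list containing both spellings.
    // "cel\u00e0" is "celà", written as a Unicode escape to avoid source-encoding issues.
    static final Set<String> STOP_WORDS = Set.of("cel\u00e0", "cela", "de", "le");

    // Keeps, in order, every token that is not in the stop set.
    // Tokens are assumed to be lowercased already by an earlier analysis step.
    static List<String> removeStopWords(List<String> tokens) {
        List<String> kept = new ArrayList<>();
        for (String token : tokens) {
            if (!STOP_WORDS.contains(token)) {
                kept.add(token);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        System.out.println(removeStopWords(List.of("cela", "change", "le", "sens")));       // [change, sens]
        System.out.println(removeStopWords(List.of("cel\u00e0", "change", "le", "sens"))); // [change, sens]
    }
}
```

With only "celà" in the set, the first call would keep "cela" as an indexed term.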

      Another thing: I noticed that "celà" is the only word in the list followed by a non-breaking space. Is that intentional?
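Why a trailing non-breaking space can matter is sketched here under an assumption: that the stop-list loader drops text after '|' (the Snowball comment marker) and then splits each line on the regular expression \s+. This is an assumption about the loader, not a quote of Lucene's code. In Java, \s matches only ASCII whitespace unless the UNICODE_CHARACTER_CLASS flag is set, so U+00A0 stays attached to the word and the stored entry never equals the token "celà". The class NbspStopWordSketch and its method parseLine are hypothetical.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical parser for one line of a Snowball-style stop list.
public class NbspStopWordSketch {
    // Drops everything after '|' (comment), then splits on \s+.
    // Java's \s matches only ASCII whitespace by default, so U+00A0 is NOT a separator.
    static Set<String> parseLine(String line) {
        Set<String> words = new HashSet<>();
        int comment = line.indexOf('|');
        if (comment >= 0) {
            line = line.substring(0, comment);
        }
        for (String w : line.split("\\s+")) {
            if (!w.isEmpty()) {
                words.add(w);
            }
        }
        return words;
    }

    public static void main(String[] args) {
        String clean = "cel\u00e0";          // "celà"
        String withNbsp = "cel\u00e0\u00a0"; // "celà" followed by U+00A0 NO-BREAK SPACE
        System.out.println(parseLine(clean + "  | comment").contains(clean));   // true
        System.out.println(parseLine(withNbsp + " | comment").contains(clean)); // false: entry keeps U+00A0
    }
}
```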

        Activity

        Iksnalybok added a comment -

        Thanks

        Adrien Grand added a comment -

        For your information, Martin Porter (himself!) added cela to the upstream stop list (http://lists.tartarus.org/mailman/private/snowball-discuss/2013-April/001466.html).

        Adrien Grand added a comment -

        Pierre, I just applied your patch to Lucene's stop list (http://svn.apache.org/viewvc/lucene/dev/trunk/lucene/analysis/common/src/resources/org/apache/lucene/analysis/snowball/french_stop.txt?view=diff&r1=1465255&r2=1465256&pathrev=1465256). Thank you! This fix should be available in Lucene/Solr 4.3.

        I also sent an email to snowball-discuss to mention this improvement: http://lists.tartarus.org/mailman/private/snowball-discuss/2013-April/001462.html

        Robert Muir added a comment -

        Thanks Pierre. Actually, this file is synchronized from lucene/analysis/common/src/resources/org/apache/lucene/analysis/snowball/french_stop.txt (via an ant task run from solr/: 'ant sync-analyzers').

        I think we should patch the Lucene file so that the word is in the default Lucene stop list, too.

        It might also be a good idea to send an email about this to the Snowball list (snowball-discuss@lists.tartarus.org), since that's where this file came from; they might be interested in the improvement, too.

        Iksnalybok added a comment -

        Patch added.

        Adrien Grand added a comment -

        Indeed, we should add "cela". Can you create a patch? I don't think the non-breaking space was added on purpose; it can be removed.
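Finding and removing such a character is a small text-processing job. The following is a minimal sketch, not part of Lucene or Solr: the class StopListCleanup and its methods are hypothetical, and it assumes the stop list has already been read into a list of lines.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical cleanup helper for a stop list; not part of Lucene or Solr.
public class StopListCleanup {
    static final char NBSP = '\u00a0'; // U+00A0 NO-BREAK SPACE

    // Returns the 1-based numbers of lines that contain a non-breaking space.
    static List<Integer> linesWithNbsp(List<String> lines) {
        List<Integer> hits = new ArrayList<>();
        for (int i = 0; i < lines.size(); i++) {
            if (lines.get(i).indexOf(NBSP) >= 0) {
                hits.add(i + 1);
            }
        }
        return hits;
    }

    // Replaces every non-breaking space with an ordinary space so that
    // whitespace-based parsing separates the word from whatever follows it.
    static List<String> clean(List<String> lines) {
        List<String> out = new ArrayList<>(lines.size());
        for (String line : lines) {
            out.add(line.replace(NBSP, ' '));
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> lines = List.of("ceci", "cel\u00e0\u00a0", "cela");
        System.out.println(linesWithNbsp(lines));        // [2]
        System.out.println(linesWithNbsp(clean(lines))); // []
    }
}
```

To apply it to the real file, the lines could be read with java.nio.file.Files.readAllLines(path, StandardCharsets.UTF_8) and written back with Files.write(path, lines, StandardCharsets.UTF_8), assuming the file is UTF-8. Replacing U+00A0 with an ordinary space, rather than deleting it, leaves the rest of each line untouched.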


          People

          • Assignee: Adrien Grand
          • Reporter: Iksnalybok
          • Votes: 0
          • Watchers: 2

            Dates

            • Created:
              Updated:
              Resolved:

              Time Tracking

              Estimated: 10m
              Remaining: 10m
              Logged: Not Specified
