Lucene - Core: LUCENE-1965

Lazy Atomic Loading Stopwords in SmartCN

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Trivial
    • Resolution: Fixed
    • Affects Version/s: 2.9
    • Fix Version/s: 3.0
    • Component/s: modules/analysis
    • Labels:
      None
    • Lucene Fields:
      New, Patch Available

      Description

      The default constructor in SmartChineseAnalyzer loads the default (jar-embedded) stopwords every time the constructor is invoked.
      Instead, they should be loaded lazily and atomically, only once, into an unmodifiable set that all instances share.
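A common Java way to get "load once, lazily, thread-safely" is the initialization-on-demand holder idiom. The sketch below is a self-contained illustration under that assumption, not the committed LUCENE-1965 patch: `LazyStopwordsSketch`, `DefaultSetHolder`, `loadDefaultStopwords()` and its hard-coded word list are hypothetical stand-ins for SmartChineseAnalyzer and its jar-embedded stopwords file. A real loader would read a classpath resource and would have to deal with `IOException` inside the static initializer.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.atomic.AtomicInteger;

public class LazyStopwordsSketch {

    // Counts how often the (simulated) expensive load runs; for illustration only.
    static final AtomicInteger LOAD_COUNT = new AtomicInteger();

    // Hypothetical stand-in for reading the jar-embedded stopwords file.
    static Set<String> loadDefaultStopwords() {
        LOAD_COUNT.incrementAndGet();
        Set<String> words = new HashSet<>(Arrays.asList(",", ".", "`", "-", "_"));
        // Wrap once so no caller can mutate the shared default set.
        return Collections.unmodifiableSet(words);
    }

    // Initialization-on-demand holder: the JVM initializes this nested class
    // at most once, on first access, and class initialization is thread-safe.
    private static final class DefaultSetHolder {
        static final Set<String> DEFAULT_STOP_SET = loadDefaultStopwords();
    }

    private final Set<String> stopWords;

    // Plays the role of the default constructor: every instance shares one set.
    public LazyStopwordsSketch() {
        this.stopWords = DefaultSetHolder.DEFAULT_STOP_SET;
    }

    public Set<String> getStopWords() {
        return stopWords;
    }

    public static void main(String[] args) {
        LazyStopwordsSketch a = new LazyStopwordsSketch();
        LazyStopwordsSketch b = new LazyStopwordsSketch();
        System.out.println(a.getStopWords() == b.getStopWords()); // true
        System.out.println(LOAD_COUNT.get());                     // 1
    }
}
```

Because the JVM runs a class's static initializer at most once and under its own lock, this needs no explicit synchronization or volatile field, and the load only happens if the default constructor is actually used.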

      1. LUCENE-1965.patch
        6 kB
        Simon Willnauer
      2. LUCENE-1965.patch
        6 kB
        Simon Willnauer

        Activity

        Simon Willnauer added a comment -

        Committed in r823285.

        Thanks, Robert, for reviewing.

        Robert Muir added a comment -

        Simon, cool. I like it now; I think it's a good improvement, the same as with Persian and Arabic. Thanks.

        Simon Willnauer added a comment -

        Thanks, Robert, good catch! I was adding a test that passes null to the constructor, but apparently I never finished it.
        I merged it into testChineseStopWordsOff().

        Patch attached.
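The quoted test below labels the second case "sets stopwords to empty set", which suggests the convention that passing null for the stopwords means "no stopword filtering". The thread does not show the actual SmartChineseAnalyzer constructor, so the following is only a hypothetical sketch of that convention: `NullStopwordsSketch` and `isStopWord` are illustrative names, not Lucene API.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

public class NullStopwordsSketch {

    private final Set<String> stopWords;

    // Hypothetical constructor: a null argument disables stopword filtering
    // by falling back to an empty, unmodifiable set instead of storing null.
    public NullStopwordsSketch(Set<String> stopWords) {
        this.stopWords = stopWords == null
            ? Collections.<String>emptySet()
            : Collections.unmodifiableSet(new HashSet<>(stopWords));
    }

    public boolean isStopWord(String token) {
        return stopWords.contains(token);
    }

    public static void main(String[] args) {
        NullStopwordsSketch off = new NullStopwordsSketch(null);
        NullStopwordsSketch on = new NullStopwordsSketch(new HashSet<>(Arrays.asList(",")));
        System.out.println(off.isStopWord(",")); // false
        System.out.println(on.isStopWord(","));  // true
    }
}
```

Normalizing null to an empty set in the constructor keeps the rest of the class free of null checks, and copying the caller's set means later changes to it cannot leak into the analyzer.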

        Robert Muir added a comment -

        Simon, everything is OK, but I have one comment:

        The new test, testChineseStopWordsNull, looks like a duplicate of the one above. Here is the context:

          /*
           * Punctuation is handled in a strange way if you disable stopwords.
           * In this example the IDEOGRAPHIC FULL STOP is converted into a comma.
           * If you don't pass true to the constructor, or use a different stopwords list,
           * then punctuation is indexed.
           */
          public void testChineseStopWordsOff() throws Exception {
            Analyzer ca = new SmartChineseAnalyzer(false); /* doesn't load stopwords */
            String sentence = "我购买了道具和服装。";
            String result[] = { "我", "购买", "了", "道具", "和", "服装", "," };
            assertAnalyzesTo(ca, sentence, result);
          }
          
          public void testChineseStopWordsNull() throws IOException{
            Analyzer ca = new SmartChineseAnalyzer(false); /* sets stopwords to empty set */
            String sentence = "我购买了道具和服装。";
            String result[] = { "我", "购买", "了", "道具", "和", "服装", "," };
            assertAnalyzesTo(ca, sentence, result);
            assertAnalyzesToReuse(ca, sentence, result);
          }
        
        Simon Willnauer added a comment -

        Patch attached.


          People

          • Assignee:
            Simon Willnauer
            Reporter:
            Simon Willnauer
          • Votes:
            0
            Watchers:
            0
