Lucene.Net / LUCENENET-414

The definition of CharArraySet is dangerously confusing and leads to bugs when used.

Details

    Description

      Right now, CharArraySet derives from System.Collections.Hashtable but does not actually use this base type to store its elements.
      However, StandardAnalyzer.STOP_WORDS_SET is exposed as a System.Collections.Hashtable. The obvious way to build your own stop word set is to start from StandardAnalyzer.STOP_WORDS_SET and add your own stop words, like this:

      CharArraySet myStopWords = new CharArraySet(StandardAnalyzer.STOP_WORDS_SET, ignoreCase: false);
      foreach (string domainSpecificStopWord in DomainSpecificStopWords)
          myStopWords.Add(domainSpecificStopWord);

      ... will fail because the CharArraySet constructor accepts an ICollection and is passed STOP_WORDS_SET typed as a Hashtable. The constructor therefore enumerates the base Hashtable, which holds no elements, so the resulting myStopWords contains only the DomainSpecificStopWords and none of the words from STOP_WORDS_SET.

      One workaround is to build the set like this instead:

      CharArraySet myStopWords = new CharArraySet(StandardAnalyzer.STOP_WORDS_SET.Count + DomainSpecificStopWords.Length, ignoreCase: false);
      // The cast selects CharArraySet.GetEnumerator() instead of the Hashtable one.
      foreach (string stopWord in (CharArraySet)StandardAnalyzer.STOP_WORDS_SET)
          myStopWords.Add(stopWord);
      foreach (string domainSpecificStopWord in DomainSpecificStopWords)
          myStopWords.Add(domainSpecificStopWord);

      ... but this relies on an implementation detail: STOP_WORDS_SET is really an UnmodifiableCharArraySet, which is itself a CharArraySet. It works because the cast forces the foreach to use the correct CharArraySet.GetEnumerator(), which is declared as a "new" method that hides the Hashtable one (a code smell in itself).
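
      The hiding behaviour described above can be reproduced without Lucene.Net. The following is only a minimal sketch: HidingSet is a hypothetical stand-in for CharArraySet (it is not the real class), and AddWord, CountVia and CountDirect are names invented for the illustration. It assumes, as the description states, that the elements live outside the base Hashtable and that GetEnumerator() is declared with "new".

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

// Hypothetical stand-in for CharArraySet: derives from Hashtable but keeps
// its elements in a private list, so the base Hashtable stays empty.
public class HidingSet : Hashtable
{
    private readonly List<string> items = new List<string>();

    public void AddWord(string word) { items.Add(word); }

    // Hides (does not override) Hashtable.GetEnumerator().
    public new IEnumerator GetEnumerator() { return items.GetEnumerator(); }
}

public static class HidingDemo
{
    // Enumerates through the ICollection interface, as a constructor that
    // accepts an ICollection would. This reaches Hashtable's enumerator.
    public static int CountVia(ICollection c)
    {
        int n = 0;
        foreach (object o in c) n++;
        return n;
    }

    // Enumerates through the concrete type, which is what the cast in the
    // workaround achieves. This reaches the "new" GetEnumerator().
    public static int CountDirect(HidingSet s)
    {
        int n = 0;
        foreach (object o in s) n++;
        return n;
    }
}
```

      After two calls to AddWord, CountVia sees no elements while CountDirect sees both, which mirrors the difference between passing STOP_WORDS_SET as an ICollection and casting it to CharArraySet first.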

      At least two possibilities exist to solve this problem:

      • Make CharArraySet use the Hashtable instance and a custom comparer, instead of its own implementation.
      • Make CharArraySet use HashSet<char[]>, available since .NET 3.5.
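
      As a rough illustration of the second possibility, here is a sketch of a HashSet<char[]> driven by a content-based comparer. CharArrayComparer and StopWordSets.Build are hypothetical names, not part of Lucene.Net, and the case folding via char.ToLowerInvariant is an assumption about how ignoreCase might be honoured. A comparer is needed because arrays otherwise compare by reference.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical comparer: treats char[] keys as equal when their contents
// match, optionally ignoring case.
public sealed class CharArrayComparer : IEqualityComparer<char[]>
{
    private readonly bool ignoreCase;

    public CharArrayComparer(bool ignoreCase) { this.ignoreCase = ignoreCase; }

    private char Fold(char c) { return ignoreCase ? char.ToLowerInvariant(c) : c; }

    public bool Equals(char[] x, char[] y)
    {
        if (ReferenceEquals(x, y)) return true;
        if (x == null || y == null || x.Length != y.Length) return false;
        for (int i = 0; i < x.Length; i++)
            if (Fold(x[i]) != Fold(y[i])) return false;
        return true;
    }

    public int GetHashCode(char[] obj)
    {
        if (obj == null) return 0;
        int hash = 17;
        foreach (char c in obj)
            hash = unchecked(hash * 31 + Fold(c));
        return hash;
    }
}

public static class StopWordSets
{
    // Merges any number of word sources into one set. The set does not
    // derive from Hashtable, so there is no hidden enumerator to bypass.
    public static HashSet<char[]> Build(bool ignoreCase, params IEnumerable<string>[] sources)
    {
        var set = new HashSet<char[]>(new CharArrayComparer(ignoreCase));
        foreach (IEnumerable<string> source in sources)
            foreach (string word in source)
                set.Add(word.ToCharArray());
        return set;
    }
}
```

      With this shape, combining a default list with domain-specific words is a single call such as StopWordSets.Build(false, defaults, domainSpecific), and enumeration behaves the same whether the set is seen through an interface or through its concrete type.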

          People

            Assignee: Unassigned
            Reporter: Vincent Van Den Berghe (vvdb)