Solr / SOLR-1850

KeepWordFilter can be slow at query time if wordlist is large

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.4
    • Fix Version/s: 3.1, 4.0-ALPHA
    • Component/s: Schema and Analysis
    • Labels: None

      Description

      When the "Set<String> words" argument is large, constructing a KeepWordFilter at query time is very costly, because the constructor copies the set:

      this.words = new CharArraySet(words, ignoreCase);

      This call does an addAll on the set and runs for every query, repeating the same work each time.

      Suggestion: overload the constructor and expose the CharArraySet, e.g.:

      public KeepWordFilter(TokenStream in, CharArraySet words) {
        super(in);
        this.words = words;
        this.termAtt = (TermAttribute) addAttribute(TermAttribute.class);
      }

      This allows the CharArraySet to be constructed once, statically, for the application instead of at query time.
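
The difference between copying the word set per construction and sharing one prebuilt set can be sketched without Lucene. This is a minimal stand-in, not the committed patch: KeepWordSketch, KeepFilter, KEEP_WORDS, copying and wordSet are hypothetical names, and java.util.Set takes the place of CharArraySet and TokenStream so the example runs on its own.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Stand-in sketch: plain java.util types replace Lucene's TokenStream and
// CharArraySet so the example runs without Lucene on the classpath.
public class KeepWordSketch {

    // Hypothetical application-wide word set, built once at class load.
    static final Set<String> KEEP_WORDS =
            Collections.unmodifiableSet(new HashSet<>(List.of("solr", "lucene")));

    static final class KeepFilter {
        private final Set<String> words;

        // Shape of the proposed overload: store the reference, no copy.
        KeepFilter(Set<String> prebuilt) {
            this.words = prebuilt;
        }

        // Shape of the original constructor: copies every word on each call.
        static KeepFilter copying(Set<String> raw) {
            return new KeepFilter(new HashSet<>(raw));
        }

        Set<String> wordSet() {
            return words;
        }

        // Keeps only the tokens present in the word set, preserving order.
        List<String> filter(List<String> tokens) {
            List<String> kept = new ArrayList<>();
            for (String token : tokens) {
                if (words.contains(token)) {
                    kept.add(token);
                }
            }
            return kept;
        }
    }

    public static void main(String[] args) {
        // Per-query construction is now a constant-time field assignment.
        KeepFilter first = new KeepFilter(KEEP_WORDS);
        KeepFilter second = new KeepFilter(KEEP_WORDS);
        System.out.println(first.filter(List.of("solr", "foo", "lucene")));
        // Both filters refer to the same set instance.
        System.out.println(first.wordSet() == second.wordSet());
    }
}
```

Sharing one instance is only safe if nothing mutates the set after it is built; the sketch wraps it in an unmodifiable view for that reason.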

        Activity

        Yonik Seeley added a comment -

        Thanks for catching this, John. Copying the whole set each time is bad enough that I'd be tempted to classify it as a bug.

        John Wang added a comment -

        Hi Yonik:

        No problem! Do you think overloading the constructor is the right thing to do here?

        -John

        Yonik Seeley added a comment -

        Yes, that's definitely the way to go.

        Yonik Seeley added a comment -

        Thanks John, I've committed this suggestion along with a testcase fix.

        Hoss Man added a comment -

        Correcting Fix Version based on CHANGES.txt, see this thread for more details...

        http://mail-archives.apache.org/mod_mbox/lucene-dev/201005.mbox/%3Calpine.DEB.1.10.1005251052040.24672@radix.cryptio.net%3E

        Grant Ingersoll added a comment -

        Bulk close for 3.1.0 release


          People

          • Assignee: Unassigned
          • Reporter: John Wang