  1. Lucene - Core
  2. LUCENE-494

Analyzer for preventing overload of search service by queries with common terms in large indexes

Details

    • Type: New Feature
    • Status: Reopened
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 2.4
    • Fix Version/s: 2.4
    • Component/s: modules/analysis
    • Labels: None

    Description

      An analyzer, used primarily at query time, that wraps another analyzer and adds a layer of protection
      by preventing very common words from being passed into queries. For very large indexes, the cost
      of reading TermDocs for a very common word can be high. This analyzer was created after experience with
      a 38-million-document index in which one term occurred in around 50% of the documents, causing TermQueries for
      that term to take 2 seconds.

      Use the various "addStopWords" methods in this class to automate the identification and addition of
      stop words found in an already existing index.
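
      The idea described above can be sketched without Lucene at all. The following is a minimal,
      self-contained illustration, not the attached QueryAutoStopWordAnalyzer: the class name
      AutoStopWordSketch, the methods findStopWords and filterQueryTerms, and the in-memory list of
      tokenized documents standing in for an index are all assumptions made for this example. It
      computes each term's document frequency, treats any term that occurs in more than a given
      fraction of documents as a stop word, and drops those terms from a query.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch of frequency-based query stop words; not Lucene code. */
class AutoStopWordSketch {

    /**
     * Returns the terms that occur in more than maxPercentDocs (0.0 to 1.0)
     * of the given documents. Each document is a list of already-tokenized terms.
     */
    static Set<String> findStopWords(List<List<String>> docs, double maxPercentDocs) {
        Map<String, Integer> docFreq = new HashMap<>();
        for (List<String> doc : docs) {
            // Count each term once per document: document frequency, not term frequency.
            for (String term : new HashSet<>(doc)) {
                docFreq.merge(term, 1, Integer::sum);
            }
        }
        Set<String> stopWords = new HashSet<>();
        for (Map.Entry<String, Integer> entry : docFreq.entrySet()) {
            if (entry.getValue() > maxPercentDocs * docs.size()) {
                stopWords.add(entry.getKey());
            }
        }
        return stopWords;
    }

    /** Drops stop words from the query terms, preserving the order of the rest. */
    static List<String> filterQueryTerms(List<String> queryTerms, Set<String> stopWords) {
        List<String> kept = new ArrayList<>();
        for (String term : queryTerms) {
            if (!stopWords.contains(term)) {
                kept.add(term);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // A toy "index" of four tokenized documents (made-up data).
        List<List<String>> docs = List.of(
                List.of("the", "quick", "fox"),
                List.of("the", "lazy", "dog"),
                List.of("the", "fox", "sleeps"),
                List.of("a", "dog", "barks"));

        // "the" occurs in 3 of 4 documents, which exceeds the 50% threshold.
        Set<String> stopWords = findStopWords(docs, 0.5);

        System.out.println(filterQueryTerms(List.of("the", "fox"), stopWords)); // prints [fox]
    }
}
```

      In this sketch the threshold is a fraction of the document count, so the stop-word set scales
      with index size. A real implementation would presumably read document frequencies from the
      index's term dictionary rather than rescanning documents.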

      Attachments

        1. QueryAutoStopWordAnalyzer.java
          8 kB
          Mark Harwood
        2. QueryAutoStopWordAnalyzerTest.java
          6 kB
          Mark Harwood

          People

            Assignee: Grant Ingersoll (gsingers)
            Reporter: Mark Harwood (mharwood)
            Votes: 1
            Watchers: 0

            Dates

              Created:
              Updated:
              Resolved: