Lucene - Core / LUCENE-494

Analyzer for preventing overload of search service by queries with common terms in large indexes


    Details

    • Type: New Feature
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 2.4
    • Fix Version/s: 2.4
    • Component/s: modules/analysis
    • Labels:
      None

      Description

      This analyzer is used primarily at query time: it wraps another analyzer and adds a layer of protection
      that prevents very common words from being passed into queries. For very large indexes, the cost
      of reading TermDocs for a very common word can be high. The analyzer was created after experience with
      a 38 million document index in which one term appeared in around 50% of documents, causing TermQueries
      for that term to take 2 seconds.

      Use the various "addStopWords" methods in this class to automatically identify stop words in an
      existing index and add them to the analyzer.
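
      The idea can be illustrated with a minimal, self-contained sketch. It is not the attached
      QueryAutoStopWordAnalyzer and does not use Lucene: the class AutoStopWordSketch, its methods
      findStopWords and filterQuery, and the maxPercentDocs parameter are hypothetical names chosen for
      illustration, and the "index" is assumed to be a small in-memory list of tokenized documents.
      Terms whose document frequency exceeds a given fraction of all documents are treated as stop words
      and dropped from the query terms.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration only; this is not the attached QueryAutoStopWordAnalyzer.
final class AutoStopWordSketch {

    // Returns every term that occurs in more than maxPercentDocs of the documents.
    static Set<String> findStopWords(List<List<String>> docs, float maxPercentDocs) {
        Map<String, Integer> docFreq = new HashMap<>();
        for (List<String> doc : docs) {
            // Count each term once per document (document frequency, not term frequency).
            for (String term : new HashSet<>(doc)) {
                docFreq.merge(term, 1, Integer::sum);
            }
        }
        float maxDocs = maxPercentDocs * docs.size();
        Set<String> stopWords = new HashSet<>();
        for (Map.Entry<String, Integer> entry : docFreq.entrySet()) {
            if (entry.getValue() > maxDocs) {
                stopWords.add(entry.getKey());
            }
        }
        return stopWords;
    }

    // Drops stop words from the query terms, preserving the order of the rest.
    static List<String> filterQuery(List<String> queryTerms, Set<String> stopWords) {
        List<String> kept = new ArrayList<>();
        for (String term : queryTerms) {
            if (!stopWords.contains(term)) {
                kept.add(term);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Assumed toy corpus standing in for an existing index.
        List<List<String>> docs = List.of(
            List.of("the", "quick", "fox"),
            List.of("the", "lazy", "dog"),
            List.of("the", "fox", "sleeps"),
            List.of("a", "dog", "barks"));
        Set<String> stopWords = findStopWords(docs, 0.5f);
        System.out.println(filterQuery(List.of("the", "lazy", "fox"), stopWords));
    }
}
```

      In this toy corpus only "the" appears in more than half of the documents, so the query terms
      "the lazy fox" are reduced to "lazy" and "fox". A real implementation would presumably read
      document frequencies from the index rather than recount them.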

        Attachments

        1. QueryAutoStopWordAnalyzerTest.java
          6 kB
          Mark Harwood
        2. QueryAutoStopWordAnalyzer.java
          8 kB
          Mark Harwood

            People

            • Assignee:
              gsingers Grant Ingersoll
              Reporter:
              markh Mark Harwood
            • Votes:
              1
              Watchers:
              0
