Lucene - Core
LUCENE-2939

Highlighter should try and use maxDocCharsToAnalyze in WeightedSpanTermExtractor when adding a new field to MemoryIndex as well as when using CachingTokenStream

Details

    • Type: Bug
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 3.2, 4.0-ALPHA
    • Component/s: modules/highlighter
    • Labels: None

    Description

      Highlighting huge documents can be drastically slower than necessary because the entire field is added to the MemoryIndex. This cost can be greatly reduced in many cases by respecting maxDocCharsToAnalyze when adding the field.

      Things can be improved even further by respecting this setting in CachingTokenStream as well.
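      The idea can be sketched as follows. This is an illustrative, self-contained analogue, not Lucene's actual classes: the method and class names below are hypothetical, standing in for the truncation done before a field reaches the in-memory index and for an offset-limited token stream.

      ```java
      import java.util.ArrayList;
      import java.util.List;

      // Hypothetical sketch of the maxDocCharsToAnalyze idea; not Lucene API.
      public class CharLimitSketch {

          // Truncate the raw field text before analysis, so the cost of
          // adding it to an in-memory index is bounded by the char limit.
          static String truncateForAnalysis(String fieldText, int maxDocCharsToAnalyze) {
              if (fieldText.length() <= maxDocCharsToAnalyze) {
                  return fieldText;
              }
              return fieldText.substring(0, maxDocCharsToAnalyze);
          }

          // Analogue of an offset-limited token stream: stop emitting
          // tokens once a token starts past the character limit.
          static List<String> limitedTokens(String text, int maxChars) {
              List<String> tokens = new ArrayList<>();
              int pos = 0;
              for (String tok : text.split("\\s+")) {
                  int start = text.indexOf(tok, pos);
                  if (start >= maxChars) {
                      break; // token begins past the limit: stop analyzing
                  }
                  tokens.add(tok);
                  pos = start + tok.length();
              }
              return tokens;
          }

          public static void main(String[] args) {
              String doc = "alpha beta gamma delta";
              System.out.println(truncateForAnalysis(doc, 10)); // alpha beta
              System.out.println(limitedTokens(doc, 10));       // [alpha, beta]
          }
      }
      ```

      Either cut-off bounds analysis work by document prefix length rather than full document size, which is what makes highlighting huge documents cheap.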

      Attachments

        1. LUCENE-2939.patch
          11 kB
          Mark Miller
        2. LUCENE-2939.patch
          10 kB
          Mark Miller
        3. LUCENE-2939.patch
          9 kB
          Mark Miller
        4. LUCENE-2939.patch
          7 kB
          Mark Miller


            People

              Assignee: markrmiller@gmail.com Mark Miller
              Reporter: markrmiller@gmail.com Mark Miller
              Votes: 0
              Watchers: 2

              Dates

                Created:
                Updated:
                Resolved: