Lucene - Core
LUCENE-2939

Highlighter should try and use maxDocCharsToAnalyze in WeightedSpanTermExtractor when adding a new field to MemoryIndex as well as when using CachingTokenStream

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 3.2, 4.0-ALPHA
    • Component/s: modules/highlighter
    • Labels:
      None

      Description

      Huge documents can be drastically slower to highlight than they need to be, because the entire field is added to the MemoryIndex.
      This cost can be greatly reduced in many cases if we try to respect maxDocCharsToAnalyze when adding the field.

      Things can be improved even further by respecting the same setting with CachingTokenStream.
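The idea above can be sketched in plain Java, without Lucene dependencies. This is an illustration of the technique, not the attached patch: `limitChars` and `limitTokens` are hypothetical helper names, and tokens are modeled as whitespace-separated words rather than a real TokenStream.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: respect a maxDocCharsToAnalyze budget in two places, assuming
// the semantics described in this issue (names here are illustrative).
public class CharLimitSketch {

    // Truncate the field text before it is added to an in-memory index,
    // so a huge field is not fully analyzed just for highlighting.
    static String limitChars(String fieldText, int maxDocCharsToAnalyze) {
        return fieldText.length() <= maxDocCharsToAnalyze
                ? fieldText
                : fieldText.substring(0, maxDocCharsToAnalyze);
    }

    // The same idea applied to a cached token stream: stop retaining
    // tokens once their end offset passes the budget.
    static List<String> limitTokens(String fieldText, int maxDocCharsToAnalyze) {
        List<String> kept = new ArrayList<>();
        int pos = 0;
        for (String token : fieldText.split("\\s+")) {
            int start = fieldText.indexOf(token, pos);
            int end = start + token.length();
            if (end > maxDocCharsToAnalyze) {
                break; // budget exhausted; skip the rest of the field
            }
            kept.add(token);
            pos = end;
        }
        return kept;
    }
}
```

Either way, the analysis cost becomes proportional to the budget rather than to the full field length, which is the speedup this issue is after.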

        Attachments

        1. LUCENE-2939.patch
          11 kB
          Mark Miller
        2. LUCENE-2939.patch
          10 kB
          Mark Miller
        3. LUCENE-2939.patch
          9 kB
          Mark Miller
        4. LUCENE-2939.patch
          7 kB
          Mark Miller


              People

              • Assignee:
                markrmiller@gmail.com Mark Miller
              • Reporter:
                markrmiller@gmail.com Mark Miller
              • Votes:
                0
              • Watchers:
                3

                Dates

                • Created:
                • Updated:
                • Resolved: