Lucene - Core / LUCENE-7391

MemoryIndexReader.fields() performance regression

Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 6.0
    • Fix Version/s: 6.2
    • Component/s: None
    • Labels: None
    • Lucene Fields: New, Patch Available

    Description

      While upgrading our codebase from Lucene 4 to Lucene 6, we found a significant performance regression: a 5x slowdown.

      On profiling the code, the method MemoryIndexReader.fields() shows up as one of the hottest methods.

      Looking at the method, it just creates a copy of the inner fields Map before passing it to MemoryFields. It does this so that it can filter out fields with numTokens <= 0.
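As a rough illustration of the copy-and-filter pattern described above, here is a minimal sketch. It is not the actual Lucene source: the class EagerFieldsCopySketch, the Info record, and the filteredCopy method are hypothetical stand-ins, and only the numTokens <= 0 filter is taken from the description.

```java
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

// Hypothetical stand-in for the reader described above; not the actual Lucene source.
class EagerFieldsCopySketch {

    // Hypothetical per-field record; only numTokens matters for this sketch.
    static final class Info {
        final int numTokens;

        Info(int numTokens) {
            this.numTokens = numTokens;
        }
    }

    // Builds a fresh, filtered copy on every call, so the cost of each call
    // grows with the number of fields even when nothing has changed.
    static SortedMap<String, Info> filteredCopy(SortedMap<String, Info> fields) {
        SortedMap<String, Info> filtered = new TreeMap<>();
        for (Map.Entry<String, Info> e : fields.entrySet()) {
            if (e.getValue().numTokens > 0) { // drop fields with numTokens <= 0
                filtered.put(e.getKey(), e.getValue());
            }
        }
        return filtered;
    }

    public static void main(String[] args) {
        SortedMap<String, Info> fields = new TreeMap<>();
        fields.put("body", new Info(12));
        fields.put("empty", new Info(0));
        // Only fields that actually hold tokens survive the copy.
        System.out.println(filteredCopy(fields).keySet());
    }
}
```

If a method like this sits on a hot path and is called once per query, the repeated allocation and iteration can plausibly dominate a profile in the way described here.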

      The simplest "fix" would be to remove the copying of the map completely and pass fields directly to MemoryFields. That is simple and removes any slowdown caused by this method. It does potentially change behaviour, though none of the unit tests seem to test that behaviour, so I wonder whether the filtering is necessary. (I looked at the original ticket, LUCENE-7091, that introduced this code, but I can't find much in the way of an explanation.) I'm going to attach a patch to this effect anyway, and we can take things from there.
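To make the possible behaviour change concrete, the sketch below hands the backing map to callers without copying it. It is an illustration under assumptions, not the attached patch: DirectFieldsSketch, fieldsDirect, and exposes are hypothetical names, and a plain map from field name to token count stands in for the real field data.

```java
import java.util.SortedMap;
import java.util.TreeMap;

// Hypothetical illustration only; names and types are not taken from Lucene.
class DirectFieldsSketch {

    // Hands back the backing map itself: no per-call copy, but no filtering either.
    static SortedMap<String, Integer> fieldsDirect(SortedMap<String, Integer> numTokensByField) {
        return numTokensByField;
    }

    // True if a caller looking at the returned map would see the given field.
    static boolean exposes(SortedMap<String, Integer> view, String field) {
        return view.containsKey(field);
    }

    public static void main(String[] args) {
        SortedMap<String, Integer> numTokensByField = new TreeMap<>();
        numTokensByField.put("body", 12);
        numTokensByField.put("empty", 0); // a field that received no tokens

        SortedMap<String, Integer> view = fieldsDirect(numTokensByField);
        // Without the filter, the zero-token field is visible to callers.
        System.out.println(exposes(view, "empty"));
    }
}
```

The trade-off is the one raised above: the call becomes cheap, but any caller that relied on zero-token fields being hidden would now see them.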

      Attachments

        1. LUCENE-7391.patch
          1 kB
          Steve Mason
        2. LUCENE-7391-test.patch
          2 kB
          Steve Mason
        3. LUCENE-7391.patch
          3 kB
          Steve Mason

          People

            Assignee: David Smiley (dsmiley)
            Reporter: Steve Mason (spmason)
            Votes: 0
            Watchers: 6
