  1. Lucene - Core
  2. LUCENE-8542

Provide the LeafSlice to CollectorManager.newCollector to save memory on small index slices


    Details

    • Type: Improvement
    • Status: Patch Available
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: core/search
    • Labels:
      None
    • Lucene Fields:
      New

      Description

      I have an index consisting of 44 million documents spread across 60 segments. When I run a query against this index with a huge number of results requested (e.g. 5 million), the query uses more than 5 GB of heap if the IndexSearcher is configured to use an ExecutorService.

      (I know this kind of query is fairly unusual and it would be better to use paging and searchAfter, but our architecture does not allow this at the moment.)

      The reason for the huge memory requirement is that the search will create a TopScoreDocCollector for each segment, each one with numHits = 5 million. This is fine for the large segments, but many of those segments are fairly small and only contain several thousand documents. This wastes a huge amount of memory for queries with large values of numHits on indices with many segments.
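      To illustrate the scale of the waste, the following sketch counts how many result slots get reserved when every collector is sized to numHits, compared with sizing each one to its own slice. The slice sizes (4 large, 56 small) are made up for illustration and are not taken from the index described above; the class and method names are hypothetical.

```java
import java.util.Arrays;

final class SlotEstimate {
    // Slots reserved when every slice gets a queue sized to numHits.
    static long uncappedSlots(int[] sliceDocCounts, int numHits) {
        return (long) sliceDocCounts.length * numHits;
    }

    // Slots reserved when each queue is capped at its slice's document count.
    static long cappedSlots(int[] sliceDocCounts, int numHits) {
        long total = 0;
        for (int docs : sliceDocCounts) {
            total += Math.min(docs, numHits);
        }
        return total;
    }

    public static void main(String[] args) {
        // Assumed layout: 56 slices of 5,000 documents and 4 slices of 10 million.
        int[] sliceDocCounts = new int[60];
        Arrays.fill(sliceDocCounts, 5_000);
        for (int i = 0; i < 4; i++) {
            sliceDocCounts[i] = 10_000_000;
        }
        int numHits = 5_000_000;

        System.out.println("uncapped: " + uncappedSlots(sliceDocCounts, numHits)); // 300000000
        System.out.println("capped:   " + cappedSlots(sliceDocCounts, numHits));   // 20280000
    }
}
```

      Under these assumed sizes, capping reduces the reserved slots by roughly a factor of 15. The real saving depends on how the segments of an index are actually distributed.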

      Therefore, I propose changing the CollectorManager interface in the following way:

      • Change the method newCollector to accept a LeafSlice parameter that can be used to determine the total number of documents in the LeafSlice.
      • To remain backwards compatible, it may be possible to introduce this as a new method with a default implementation that calls the old method. Otherwise, it probably has to wait for Lucene 8?
      • This can then be used to cap numHits for each TopScoreDocCollector at the number of documents in its LeafSlice.
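
      The shape of this proposal can be sketched as follows. This is a minimal illustration and not the attached patch: Slice, TopHitsCollector, SliceAwareManager and CappedTopHitsManager are hypothetical stand-ins that only model the slice's document count and the collector's queue size, so the example runs without Lucene on the classpath.

```java
final class SliceAwareSketch {
    // Hypothetical stand-in for a LeafSlice; only the document count matters here.
    static final class Slice {
        final int docCount;
        Slice(int docCount) { this.docCount = docCount; }
    }

    // Hypothetical stand-in for a top-hits collector; only its queue size is modelled.
    static final class TopHitsCollector {
        final int queueSize;
        TopHitsCollector(int queueSize) { this.queueSize = queueSize; }
    }

    // Simplified manager interface illustrating the proposed change.
    interface SliceAwareManager<C> {
        // Existing factory method, unchanged.
        C newCollector();

        // Proposed overload: falls back to the old method so that
        // existing implementations keep compiling and behave as before.
        default C newCollector(Slice slice) {
            return newCollector();
        }
    }

    static final class CappedTopHitsManager implements SliceAwareManager<TopHitsCollector> {
        private final int numHits;
        CappedTopHitsManager(int numHits) { this.numHits = numHits; }

        @Override
        public TopHitsCollector newCollector() {
            return new TopHitsCollector(numHits);
        }

        @Override
        public TopHitsCollector newCollector(Slice slice) {
            // A slice cannot contribute more hits than it has documents.
            // The lower bound of 1 is an assumption, for collectors that
            // reject a zero-sized queue.
            return new TopHitsCollector(Math.max(1, Math.min(numHits, slice.docCount)));
        }
    }

    public static void main(String[] args) {
        CappedTopHitsManager manager = new CappedTopHitsManager(5_000_000);
        System.out.println(manager.newCollector(new Slice(5_000)).queueSize);      // 5000
        System.out.println(manager.newCollector(new Slice(10_000_000)).queueSize); // 5000000

        // An implementation that only knows the old method still works via the default.
        SliceAwareManager<TopHitsCollector> legacy = () -> new TopHitsCollector(100);
        System.out.println(legacy.newCollector(new Slice(5)).queueSize);           // 100
    }
}
```

      The default method keeps the change source-compatible: only managers that want the memory saving need to override the new overload.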

      If this makes sense to you, I can try to provide a patch.

        Attachments

        1. LUCENE-8542.patch
          11 kB
          Christoph Kaser

            People

            • Assignee:
              Unassigned
              Reporter:
              christophk Christoph Kaser