  1. Lucene - Core
  2. LUCENE-9003

Should FilterDirectoryReader compute numDocs lazily?

Details

    • Type: Improvement
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 8.4
    • Component/s: None
    • Labels: None
    • Lucene Fields: New

    Description

      FilterDirectoryReader extends BaseCompositeReader, which computes both maxDoc and numDocs eagerly in its constructor by summing up these values across all sub leaves.

      This is problematic for readers that hide additional documents. Computing numDocs on such leaf readers usually requires iterating over all live documents to count them. This makes creating a FilterDirectoryReader on top of them run in linear time, which has caused us several performance bugs over time. This is especially frustrating given that numDocs is a rarely used index statistic.

      I think computing numDocs lazily would be less surprising?
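
A minimal sketch of the lazy approach suggested above, written against a hypothetical `Leaf` interface rather than Lucene's actual classes. The names `LazyNumDocsSketch` and `Leaf`, and the `-1` sentinel, are assumptions made for illustration and may not match how the fix was implemented. The idea is that `maxDoc` is still summed in the constructor because it is cheap, while `numDocs` is only summed on first request and then cached.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a composite reader; not Lucene's actual class. */
final class LazyNumDocsSketch {

  /** Hypothetical leaf: maxDoc() is assumed cheap, numDocs() may be linear-time. */
  interface Leaf {
    int maxDoc();
    int numDocs();
  }

  private final List<Leaf> leaves;
  private final int maxDoc;
  // -1 means "not computed yet"; volatile so a cached value is visible across threads.
  private volatile int numDocs = -1;

  LazyNumDocsSketch(List<Leaf> leaves) {
    this.leaves = new ArrayList<>(leaves);
    int sum = 0;
    for (Leaf leaf : this.leaves) {
      sum += leaf.maxDoc(); // cheap, so computing it eagerly is fine
    }
    this.maxDoc = sum;
    // Note: no call to leaf.numDocs() here, so construction stays cheap.
  }

  int maxDoc() {
    return maxDoc;
  }

  int numDocs() {
    int n = numDocs;
    if (n == -1) {
      // First request: pay the (possibly linear) cost once, then cache the result.
      n = 0;
      for (Leaf leaf : leaves) {
        n += leaf.numDocs();
      }
      numDocs = n;
    }
    return n;
  }
}
```

Two threads may race and both compute the sum on first access, but they would store the same value, so the sketch skips locking. Callers that never ask for `numDocs` never pay for counting live documents.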

            People

              Assignee: Unassigned
              Reporter: Adrien Grand (jpountz)
              Votes: 1
              Watchers: 4

                Time Tracking

                  Estimated: Not Specified
                  Remaining: 0h
                  Logged: 1h 20m