Lucene - Core / LUCENE-9406

Make it simpler to track IndexWriter's events

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: main (9.0)
    • Component/s: core/index
    • Labels:
      None
    • Lucene Fields:
      New

      Description

      This is the second spinoff from a controversial PR that adds a new index-time feature to Lucene: merging small segments during commit. That change can substantially reduce the number of small index segments to search.

      That PR proposed a new interface, IndexWriterEvents, which gave the application a chance to track when IndexWriter kicked off merges during commit, how many it kicked off, how long it waited for them, how often it gave up waiting, etc.

      Such telemetry from production usage is really helpful when tuning settings such as which merges to attempt on commit (e.g. by a size threshold) and how long to wait for them during commit.

      I am splitting out this issue to explore possible approaches. E.g. Simon Willnauer proposed using a statistics class instead, but if I understood that correctly, it would put the role of aggregation inside IndexWriter, which is not ideal.
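
To make the trade-off concrete, here is a minimal sketch of the event-style approach, where IndexWriter would only report raw events and the application does the aggregation itself. Everything in it is an assumption made for illustration: `CommitMergeListener`, its two methods, and `CommitMergeStats` are hypothetical names, not the interface from the PR and not part of Lucene's API.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical callback interface (illustrative only, NOT Lucene's API).
// The writer would call these methods; it keeps no statistics of its own.
interface CommitMergeListener {
  // Called once per commit that kicks off merges.
  void onMergesStarted(int mergeCount);

  // Called when the commit stops waiting, either because the merges
  // finished or because the wait timed out.
  void onWaitFinished(long waitedMillis, boolean timedOut);
}

// Application-side aggregation: the application decides what to count.
final class CommitMergeStats implements CommitMergeListener {
  private final AtomicLong commits = new AtomicLong();
  private final AtomicLong merges = new AtomicLong();
  private final AtomicLong waitedMillis = new AtomicLong();
  private final AtomicLong timeouts = new AtomicLong();

  @Override
  public void onMergesStarted(int mergeCount) {
    commits.incrementAndGet();
    merges.addAndGet(mergeCount);
  }

  @Override
  public void onWaitFinished(long millis, boolean timedOut) {
    waitedMillis.addAndGet(millis);
    if (timedOut) {
      timeouts.incrementAndGet();
    }
  }

  long totalMerges() {
    return merges.get();
  }

  long totalWaitMillis() {
    return waitedMillis.get();
  }

  // Fraction of merge-on-commit attempts where the writer gave up waiting.
  double timeoutRate() {
    long c = commits.get();
    return c == 0 ? 0.0 : (double) timeouts.get() / c;
  }
}

class CommitMergeStatsDemo {
  public static void main(String[] args) {
    CommitMergeStats stats = new CommitMergeStats();
    // Simulate two commits; in a real setup the writer would make these calls.
    stats.onMergesStarted(3);
    stats.onWaitFinished(120, false);
    stats.onMergesStarted(2);
    stats.onWaitFinished(500, true);
    System.out.println("merges=" + stats.totalMerges()
        + " waitMs=" + stats.totalWaitMillis()
        + " timeoutRate=" + stats.timeoutRate());
  }
}
```

With a statistics class owned by IndexWriter, the counters and the choice of what to aggregate would live inside the writer; with a listener like the one sketched here, the application can feed the raw events into whatever metrics system it already uses.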

      Many interesting events (e.g. how many merges are being requested, how large they are, how long they took to complete or fail) can be gleaned by wrapping expert Lucene classes like MergePolicy and MergeScheduler. But for events that cannot be observed this way (e.g. IndexWriter stopped waiting for merges during commit), it would be very helpful to have some simple way to track them so that applications can tune better.
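
The wrapping approach mentioned above is a plain decorator: delegate the real work and record size, duration, and success or failure around the call. The sketch below shows the pattern only. `MergeRunner` is a hypothetical stand-in for a pluggable component such as a merge scheduler; it is not Lucene's MergeScheduler or MergePolicy API, and a real wrapper would have to delegate every method of the class it wraps.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a pluggable merge component (NOT Lucene's API).
interface MergeRunner {
  void runMerge(String mergeName, long estimatedBytes);
}

// One recorded merge: what ran, how big it was, how long it took, and
// whether it ended with an exception.
final class MergeEvent {
  final String name;
  final long bytes;
  final long nanos;
  final boolean failed;

  MergeEvent(String name, long bytes, long nanos, boolean failed) {
    this.name = name;
    this.bytes = bytes;
    this.nanos = nanos;
    this.failed = failed;
  }
}

// Decorator that times each merge and records the outcome, then lets any
// exception from the delegate propagate unchanged.
final class TimingMergeRunner implements MergeRunner {
  private final MergeRunner delegate;
  private final List<MergeEvent> events = new ArrayList<>();

  TimingMergeRunner(MergeRunner delegate) {
    this.delegate = delegate;
  }

  @Override
  public void runMerge(String mergeName, long estimatedBytes) {
    long start = System.nanoTime();
    boolean failed = true;
    try {
      delegate.runMerge(mergeName, estimatedBytes);
      failed = false;
    } finally {
      long elapsed = System.nanoTime() - start;
      synchronized (events) {
        events.add(new MergeEvent(mergeName, estimatedBytes, elapsed, failed));
      }
    }
  }

  // Snapshot of everything recorded so far.
  List<MergeEvent> events() {
    synchronized (events) {
      return new ArrayList<>(events);
    }
  }
}
```

A wrapper like this only sees what passes through the wrapped component, which is why an event such as "IndexWriter stopped waiting for merges during commit" cannot be captured this way.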

      It is also possible to subclass IndexWriter and override key methods, but I think that is inherently risky as IndexWriter's protected methods are not considered to be a stable API, and the synchronization used by IndexWriter is confusing.

              People

              • Assignee: Unassigned
              • Reporter: mikemccand (Michael McCandless)
              • Votes: 0
              • Watchers: 7

                Dates

                • Created:
                  Updated:
                  Resolved:

                  Time Tracking

                  • Estimated: Not Specified
                  • Remaining: 0h
                  • Logged: 2.5h