  1. Lucene - Core
  2. LUCENE-9996

Can we improve DWPT's initial memory footprint?

Details

    • Type: Improvement
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 8.10
    • Component/s: None
    • Labels: None
    • Lucene Fields: New

    Description

      Say you are indexing only keyword fields that are both indexed and have doc values. The first document added to a DWPT will increase memory usage by about 80kB per field. This is mostly due to:

      • the BytesRefHash for the inverted index, which allocates a 32kB page
      • the BytesRefHash for the doc values terms dict, which allocates another 32kB page
      • the SortedDocValuesWriter#pending buffer that allocates a long[1024]: 8kB

      So if you have 10 actively indexing indices with 100 fields each and 24 indexing threads, this gives a total of 10*100*24*80kB ≈ 1.8GB. If you give less than 1.8GB to your indexing buffers overall, Lucene will likely do very small flushes that contain only a few documents, which in turn will make indexing rather slow.
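      The estimate above can be reproduced with a short calculation. The sketch below is illustrative only: the class and method names are hypothetical, and the sizes are the ones quoted in this description. Note that the three itemized allocations add up to 72kB; the rest of the "about 80kB" figure is not itemized here.

```java
// Back-of-the-envelope estimate of the initial DWPT footprint.
// Hypothetical helper, not part of Lucene; sizes come from the description above.
final class DwptFootprintEstimate {
    static final long KB = 1024L;

    // Allocations itemized in the description.
    static final long INVERTED_INDEX_HASH_PAGE = 32 * KB; // BytesRefHash page for the inverted index
    static final long DOC_VALUES_HASH_PAGE = 32 * KB;     // BytesRefHash page for the doc values terms dict
    static final long PENDING_BUFFER = 1024L * Long.BYTES; // long[1024] = 8kB

    /** Sum of the three itemized allocations, in bytes. */
    static long itemizedPerFieldBytes() {
        return INVERTED_INDEX_HASH_PAGE + DOC_VALUES_HASH_PAGE + PENDING_BUFFER;
    }

    /** Total initial footprint across all indices, fields and indexing threads, in bytes. */
    static long totalBytes(int indices, int fieldsPerIndex, int threads, long perFieldBytes) {
        return (long) indices * fieldsPerIndex * threads * perFieldBytes;
    }

    public static void main(String[] args) {
        long perField = 80 * KB; // the "about 80kB" figure from the description
        long total = totalBytes(10, 100, 24, perField);
        System.out.printf("itemized per field: %d kB%n", itemizedPerFieldBytes() / KB);
        System.out.printf("total: %.2f GiB%n", total / (double) (1L << 30));
    }
}
```

      With these numbers, every DWPT has paid the full per-field cost as soon as it has seen a single document, so the total scales with the number of threads rather than with the amount of data indexed.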

      Could we improve DWPT so that it more progressively reserves memory as more documents get added?
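
      As a rough illustration of what "more progressively" could mean, the sketch below shows a buffer that starts small and doubles its backing array only when it fills up. This is a generic pattern and not Lucene code: the class name, the initial capacity, and the growth factor are all assumptions, and the change actually made for this issue may look quite different.

```java
import java.util.Arrays;

// Hypothetical illustration of progressive allocation; not taken from Lucene.
final class GrowableLongBuffer {
    private long[] values;
    private int size;

    GrowableLongBuffer(int initialCapacity) {
        // Start with a small array instead of a fixed long[1024].
        values = new long[Math.max(1, initialCapacity)];
    }

    /** Appends a value, doubling the backing array only when it is full. */
    void add(long value) {
        if (size == values.length) {
            values = Arrays.copyOf(values, values.length * 2);
        }
        values[size++] = value;
    }

    long get(int index) {
        if (index < 0 || index >= size) {
            throw new IndexOutOfBoundsException("index: " + index + ", size: " + size);
        }
        return values[index];
    }

    int size() {
        return size;
    }

    /** Bytes currently reserved by the backing array (ignoring object headers). */
    long allocatedBytes() {
        return (long) values.length * Long.BYTES;
    }
}
```

      Under these assumptions, a buffer created with an initial capacity of 16 reserves 128 bytes for the first document rather than 8kB, and only reaches 8kB once more than 512 values have been added.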

            People

              Assignee: Unassigned
              Reporter: Adrien Grand (jpountz)
              Votes: 0
              Watchers: 3

                Time Tracking

                  Estimated: Not Specified
                  Remaining: 0h
                  Logged: 40m