Details
Type: Improvement
Status: Closed
Priority: Minor
Resolution: Fixed
Description
Say you are indexing only keyword fields that are both indexed and have doc values. The first document added to a DWPT (DocumentsWriterPerThread) increases memory usage by about 80kB per field. This is mostly due to:
- the BytesRefHash for the inverted index, which allocates a 32kB page
- the BytesRefHash for the doc values terms dict, which allocates another 32kB page
- the SortedDocValuesWriter#pending buffer, which allocates a long[1024] (8kB)
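A quick tally of the three allocations listed above, for one keyword field that is both indexed and has doc values. The sizes come from the list; the class and constant names are hypothetical and exist only for illustration. The itemized buffers add up to 72kB, so the rest of the ~80kB figure presumably comes from smaller per-field structures that the issue does not itemize.

```java
// Illustrative tally of the up-front per-field allocations described above.
// Names are hypothetical; only the sizes come from the issue text.
class PerFieldAllocation {
    static final int INVERTED_INDEX_HASH_PAGE = 32 * 1024; // BytesRefHash page for the inverted index
    static final int DOC_VALUES_HASH_PAGE = 32 * 1024;     // BytesRefHash page for the doc values terms dict
    static final int PENDING_LONGS = 1024;                 // SortedDocValuesWriter#pending: long[1024]

    // Sum of the three itemized allocations, in bytes.
    static long itemizedBytes() {
        return (long) INVERTED_INDEX_HASH_PAGE
                + DOC_VALUES_HASH_PAGE
                + (long) PENDING_LONGS * Long.BYTES;
    }

    public static void main(String[] args) {
        long bytes = itemizedBytes();
        System.out.println(bytes + " bytes = " + bytes / 1024 + " kB"); // 73728 bytes = 72 kB
    }
}
```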
So if you have 10 actively indexing indices with 100 fields each and 24 indexing threads, the total is 10*100*24*80kB ≈ 1.8GB. If the combined indexing buffer is smaller than 1.8GB, Lucene will likely do very small flushes of only a few documents each, which in turn makes indexing rather slow.
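The worst-case estimate above, written out as a small Java sketch. The class and method names are hypothetical and the inputs are the example numbers from the issue. Treating kB as 1024 bytes gives 1,966,080,000 bytes, about 1.83 GiB, consistent with the ~1.8GB figure.

```java
import java.util.Locale;

// Hypothetical helper reproducing the back-of-the-envelope estimate:
// indices * fields per index * indexing threads * bytes per field.
class WorstCaseFootprint {
    static long bytes(long indices, long fieldsPerIndex, long threads, long bytesPerField) {
        return indices * fieldsPerIndex * threads * bytesPerField;
    }

    public static void main(String[] args) {
        long total = bytes(10, 100, 24, 80L * 1024); // 80kB per field, kB = 1024 bytes
        // Locale.ROOT keeps '.' as the decimal separator regardless of the JVM locale.
        System.out.printf(Locale.ROOT, "%d bytes = %.2f GiB%n", total, total / (double) (1L << 30));
        // prints "1966080000 bytes = 1.83 GiB"
    }
}
```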
Could we improve DWPT so that it reserves memory more progressively as documents get added?
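One possible shape for "progressive" reservation, sketched for the pending buffer only: start with a small array and double it on demand, so memory tracks the number of values added instead of jumping to 8kB on the first document. This is a hypothetical illustration under that assumption (initial capacity of 8, growth factor of 2), not Lucene's API and not necessarily how the issue was resolved.

```java
import java.util.Arrays;

// Hypothetical pending-values buffer that grows with usage rather than
// allocating long[1024] up front. Not a Lucene class.
class ProgressiveLongBuffer {
    private static final int INITIAL_CAPACITY = 8; // assumption: small first allocation
    private long[] values = new long[INITIAL_CAPACITY];
    private int size;

    void add(long value) {
        if (size == values.length) {
            // Double the capacity so that the amortized cost of add() stays O(1).
            values = Arrays.copyOf(values, values.length * 2);
        }
        values[size++] = value;
    }

    int size() {
        return size;
    }

    long get(int index) {
        if (index < 0 || index >= size) {
            throw new IndexOutOfBoundsException("index " + index + ", size " + size);
        }
        return values[index];
    }

    // Bytes held by the backing array only (object headers are ignored).
    long ramBytesUsed() {
        return (long) values.length * Long.BYTES;
    }

    public static void main(String[] args) {
        ProgressiveLongBuffer buf = new ProgressiveLongBuffer();
        buf.add(42);
        System.out.println("after 1 value: " + buf.ramBytesUsed() + " bytes");     // 64 bytes
        for (int i = 1; i < 1000; i++) {
            buf.add(i);
        }
        System.out.println("after 1000 values: " + buf.ramBytesUsed() + " bytes"); // 8192 bytes
    }
}
```

With this policy the first document costs 64 bytes instead of 8kB, and the buffer only reaches 8kB after roughly 1000 values. Doubling trades up to 2x slack for cheap appends.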