LUCENE-10627

Using ByteBuffersDataInput to reduce memory copies when compressing data


Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 9.4
    • Component/s: core/codecs, core/store
    • Labels: None
    • Lucene Fields: New, Patch Available

    Description

      Code: https://github.com/apache/lucene/pull/987

      I see that when Lucene flushes and merges stored fields, it needs many memory copies:

      "Lucene Merge Thread #25940" #906546 daemon prio=5 os_prio=0 cpu=20503.95ms elapsed=68.76s tid=0x00007ee990002c50 nid=0x3aac54 runnable  [0x00007f17718db000]
         java.lang.Thread.State: RUNNABLE
          at org.apache.lucene.store.ByteBuffersDataOutput.toArrayCopy(ByteBuffersDataOutput.java:271)
          at org.apache.lucene.codecs.compressing.CompressingStoredFieldsWriter.flush(CompressingStoredFieldsWriter.java:239)
          at org.apache.lucene.codecs.compressing.CompressingStoredFieldsWriter.finishDocument(CompressingStoredFieldsWriter.java:169)
          at org.apache.lucene.codecs.compressing.CompressingStoredFieldsWriter.merge(CompressingStoredFieldsWriter.java:654)
          at org.apache.lucene.index.SegmentMerger.mergeFields(SegmentMerger.java:228)
          at org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:105)
          at org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:4760)
          at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:4364)
          at org.apache.lucene.index.IndexWriter$IndexWriterMergeSource.merge(IndexWriter.java:5923)
          at org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:624)
          at org.elasticsearch.index.engine.ElasticsearchConcurrentMergeScheduler.doMerge(ElasticsearchConcurrentMergeScheduler.java:100)
          at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:682) 

      When Lucene's CompressingStoredFieldsWriter flushes documents, it needs many memory copies (sketched after the two lists below):

      With Lucene90 using LZ4WithPresetDictCompressionMode:

      1. bufferedDocs.toArrayCopy copies the buffered blocks into one contiguous array for chunk compression
      2. the compressor copies the dict and the data into one block buffer
      3. the data is compressed
      4. the compressed data is copied out

      With Lucene90 using DeflateWithPresetDictCompressionMode:

      1. bufferedDocs.toArrayCopy copies the buffered blocks into one contiguous array for chunk compression
      2. the data is compressed
      3. the compressed data is copied out
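
      Roughly, the pre-change flush path looks like this (a minimal sketch with simplified names, not the actual Lucene source):

        byte[] content = bufferedDocs.toArrayCopy();            // copy: buffered blocks -> one contiguous array
        compressor.compress(content, 0, content.length, out);   // LZ4WithPresetDict additionally copies dict + data
                                                                //   into one block buffer before compressing; the
                                                                //   compressed bytes are then copied out to the output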

       

      I think we can use CompositeByteBuf to reduce temporary memory copies:

      1. we do not have to call bufferedDocs.toArrayCopy when we just need contiguous content for chunk compression (sketched below)
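
      The same idea, sketched with Lucene's own ByteBuffersDataInput view (which the update below ends up using) rather than Netty's CompositeByteBuf; the compress signature taking a ByteBuffersDataInput is an assumption here and is discussed in the update:

        ByteBuffersDataInput content = bufferedDocs.toDataInput(); // no copy: a read-only view over the existing blocks
        compressor.compress(content, fieldsStream);                 // assumed signature; each compression mode decides
                                                                    //   how (or whether) to make the bytes contiguous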

       

      I wrote a simple mini benchmark in test code (link):
      LZ4WithPresetDict run, capacity 41943040 bytes, 10 iterations: original elapsed 5391 ms, new elapsed 5297 ms
      DeflateWithPresetDict run, capacity 41943040 bytes, 10 iterations: original elapsed 115 ms, new elapsed 12 ms
       
      And I ran runStoredFieldsBenchmark with doc_limit=-1, which shows:

      Msec to index    BEST_SPEED    BEST_COMPRESSION
      Baseline         318877.00     606288.00
      Candidate        314442.00     604719.00

       

      --------- UPDATE ---------

       

      I tried to reuse ByteBuffersDataInput to reduce memory copies because it can be obtained from ByteBuffersDataOutput.toDataInput, and it could reduce this complexity (PR

      BUT I am not sure whether we can change the Compressor interface's compress input param from byte[] to ByteBuffersDataInput. Changing this interface increases the backport effort; however, if we change the interface to take ByteBuffersDataInput, we can optimize memory copies inside each compression algorithm's code. A sketch of the signature change follows.
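
      A minimal sketch of the signature change under discussion (the current signature is kept as a comment for comparison; whether this exact shape is acceptable is the open question above):

        import java.io.Closeable;
        import java.io.IOException;
        import org.apache.lucene.store.ByteBuffersDataInput;
        import org.apache.lucene.store.DataOutput;

        public abstract class Compressor implements Closeable {
          // current: callers must first materialize one contiguous byte[]
          // public abstract void compress(byte[] bytes, int off, int len, DataOutput out) throws IOException;

          // proposed: callers pass the buffered blocks directly, and each
          // compression implementation decides how to consume them
          public abstract void compress(ByteBuffersDataInput buffersInput, DataOutput out) throws IOException;
        }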

      Also, I found we can reduce more memory copies in CompressingStoredFieldsWriter.copyOneDoc (like) and CompressingTermVectorsWriter.flush (like); a rough illustration for copyOneDoc follows.
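
      As a rough illustration only (not the PR's actual code, which works on ByteBuffer): the per-document temporary array in a copy loop like copyOneDoc can be avoided by streaming through DataOutput.copyBytes, which reuses an internal buffer; the method and parameter names here are hypothetical:

        import java.io.IOException;
        import org.apache.lucene.store.DataInput;
        import org.apache.lucene.store.DataOutput;

        // hypothetical helper; docIn, fieldsStream and docLength are assumed names
        void copyOneDoc(DataInput docIn, DataOutput fieldsStream, int docLength) throws IOException {
          // before: read into a temporary byte[docLength], then write it out again
          // after: no per-doc allocation; copyBytes streams through a reused internal buffer
          fieldsStream.copyBytes(docIn, docLength);
        }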
       

      I think this commit just reduces memory copies, so we should look not only at the benchmark time metric but also at JVM GC time to see the improvement. So I tried to add a StatisticsHelper into StoredFieldsBenchmark (code).
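
      A minimal sketch of the kind of GC bookkeeping meant here, using the standard JMX beans (an assumption, not necessarily luceneutil's actual StatisticsHelper):

        import java.lang.management.GarbageCollectorMXBean;
        import java.lang.management.ManagementFactory;

        // snapshot total GC count and time; call before and after indexing and report the deltas
        static long[] gcSnapshot() {
          long count = 0, timeMs = 0;
          for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            count += gc.getCollectionCount();
            timeMs += gc.getCollectionTime();
          }
          return new long[] {count, timeMs};
        }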

      So the latest commit:

      1. uses ByteBuffersDataInput to reduce memory copies in the CompressingStoredFieldsWriter flush
      2. uses ByteBuffersDataInput to reduce memory copies in the CompressingTermVectorsWriter flush
      3. uses ByteBuffer to reduce memory copies in CompressingStoredFieldsWriter.copyOneDoc
      4. replaces the Compressor interface's compress param from byte[] to ByteBuffersDataInput

       

      I ran runStoredFieldsBenchmark with the JVM StatisticsHelper; it shows the following:

      Msec to index    BEST_SPEED    BEST_SPEED YGC              BEST_COMPRESSION    BEST_COMPRESSION YGC
      Baseline         317973        1176 ms (258 collections)   605492              1476 ms (264 collections)
      Candidate        314765        1012 ms (238 collections)   601253              1175 ms (234 collections)

            People

              Assignee: Unassigned
              Reporter: LuYunCheng


                Time Tracking

                  Estimated: Not Specified
                  Remaining: 0h
                  Logged: 4.5h