LUCENE-10627

Using ByteBuffersDataInput to reduce memory copies when compressing data


Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 9.4
    • Component/s: core/codecs, core/store
    • Labels: None
    • Lucene Fields: New, Patch Available

    Description

      Code: https://github.com/apache/lucene/pull/987

      I see that when Lucene flushes and merges stored fields, it needs many memory copies:

      "Lucene Merge Thread #25940" #906546 daemon prio=5 os_prio=0 cpu=20503.95ms elapsed=68.76s tid=0x00007ee990002c50 nid=0x3aac54 runnable  [0x00007f17718db000]
         java.lang.Thread.State: RUNNABLE
          at org.apache.lucene.store.ByteBuffersDataOutput.toArrayCopy(ByteBuffersDataOutput.java:271)
          at org.apache.lucene.codecs.compressing.CompressingStoredFieldsWriter.flush(CompressingStoredFieldsWriter.java:239)
          at org.apache.lucene.codecs.compressing.CompressingStoredFieldsWriter.finishDocument(CompressingStoredFieldsWriter.java:169)
          at org.apache.lucene.codecs.compressing.CompressingStoredFieldsWriter.merge(CompressingStoredFieldsWriter.java:654)
          at org.apache.lucene.index.SegmentMerger.mergeFields(SegmentMerger.java:228)
          at org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:105)
          at org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:4760)
          at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:4364)
          at org.apache.lucene.index.IndexWriter$IndexWriterMergeSource.merge(IndexWriter.java:5923)
          at org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:624)
          at org.elasticsearch.index.engine.ElasticsearchConcurrentMergeScheduler.doMerge(ElasticsearchConcurrentMergeScheduler.java:100)
          at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:682) 

      When Lucene's CompressingStoredFieldsWriter flushes documents, it needs many memory copies (sketched after the two lists below):

      With Lucene90 using LZ4WithPresetDictCompressionMode:

      1. bufferedDocs.toArrayCopy copies the buffered blocks into one contiguous array for chunk compression
      2. the compressor copies the dict and the data into one block buffer
      3. the data is compressed
      4. the compressed data is copied out

      With Lucene90 using DeflateWithPresetDictCompressionMode:

      1. bufferedDocs.toArrayCopy copies the buffered blocks into one contiguous array for chunk compression
      2. the data is compressed
      3. the compressed data is copied out
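
      Roughly, the pre-change flush path looks like this (a minimal sketch with simplified names, not the actual Lucene source):

        byte[] content = bufferedDocs.toArrayCopy();            // copy: buffered blocks -> one contiguous array
        compressor.compress(content, 0, content.length, out);   // LZ4WithPresetDict additionally copies dict + data
                                                                //   into one block buffer before compressing; the
                                                                //   compressed bytes are then copied out to the output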

       

      I think we can use CompositeByteBuf to reduce temporary memory copies:

      1. we do not have to call bufferedDocs.toArrayCopy when we just need contiguous content for chunk compression (sketched below)
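
      The same idea, sketched with Lucene's own ByteBuffersDataInput view (which the update below ends up using) rather than Netty's CompositeByteBuf; the compress signature taking a ByteBuffersDataInput is an assumption here and is discussed in the update:

        ByteBuffersDataInput content = bufferedDocs.toDataInput(); // no copy: a read-only view over the existing blocks
        compressor.compress(content, fieldsStream);                 // assumed signature; each compression mode decides
                                                                    //   how (or whether) to make the bytes contiguous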

       

      I wrote a simple mini benchmark in test code (link):
      LZ4WithPresetDict run, capacity 41943040 bytes, 10 iterations: original elapsed 5391 ms, new elapsed 5297 ms
      DeflateWithPresetDict run, capacity 41943040 bytes, 10 iterations: original elapsed 115 ms, new elapsed 12 ms
       
      And I ran runStoredFieldsBenchmark with doc_limit=-1, which shows:

      Msec to index    BEST_SPEED    BEST_COMPRESSION
      Baseline         318877.00     606288.00
      Candidate        314442.00     604719.00

       

      --------- UPDATE ---------

       

      I tried to reuse ByteBuffersDataInput to reduce memory copies because it can be obtained from ByteBuffersDataOutput.toDataInput, and it could reduce this complexity (PR

      BUT I am not sure whether we can change the Compressor interface's compress input param from byte[] to ByteBuffersDataInput. Changing this interface increases the backport effort; however, if we change the interface to take ByteBuffersDataInput, we can optimize memory copies inside each compression algorithm's code. A sketch of the signature change follows.
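
      A minimal sketch of the signature change under discussion (the current signature is kept as a comment for comparison; whether this exact shape is acceptable is the open question above):

        import java.io.Closeable;
        import java.io.IOException;
        import org.apache.lucene.store.ByteBuffersDataInput;
        import org.apache.lucene.store.DataOutput;

        public abstract class Compressor implements Closeable {
          // current: callers must first materialize one contiguous byte[]
          // public abstract void compress(byte[] bytes, int off, int len, DataOutput out) throws IOException;

          // proposed: callers pass the buffered blocks directly, and each
          // compression implementation decides how to consume them
          public abstract void compress(ByteBuffersDataInput buffersInput, DataOutput out) throws IOException;
        }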

      Also, I found we can reduce more memory copies in CompressingStoredFieldsWriter.copyOneDoc (like) and CompressingTermVectorsWriter.flush (like); a rough illustration for copyOneDoc follows.
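
      As a rough illustration only (not the PR's actual code, which works on ByteBuffer): the per-document temporary array in a copy loop like copyOneDoc can be avoided by streaming through DataOutput.copyBytes, which reuses an internal buffer; the method and parameter names here are hypothetical:

        import java.io.IOException;
        import org.apache.lucene.store.DataInput;
        import org.apache.lucene.store.DataOutput;

        // hypothetical helper; docIn, fieldsStream and docLength are assumed names
        void copyOneDoc(DataInput docIn, DataOutput fieldsStream, int docLength) throws IOException {
          // before: read into a temporary byte[docLength], then write it out again
          // after: no per-doc allocation; copyBytes streams through a reused internal buffer
          fieldsStream.copyBytes(docIn, docLength);
        }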
       

      I think this commit just reduces memory copies, so we should look not only at the benchmark time metric but also at JVM GC time to see the improvement. So I tried to add a StatisticsHelper into StoredFieldsBenchmark (code).
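
      A minimal sketch of the kind of GC bookkeeping meant here, using the standard JMX beans (an assumption, not necessarily luceneutil's actual StatisticsHelper):

        import java.lang.management.GarbageCollectorMXBean;
        import java.lang.management.ManagementFactory;

        // snapshot total GC count and time; call before and after indexing and report the deltas
        static long[] gcSnapshot() {
          long count = 0, timeMs = 0;
          for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            count += gc.getCollectionCount();
            timeMs += gc.getCollectionTime();
          }
          return new long[] {count, timeMs};
        }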

      So the latest commit:

      1. uses ByteBuffersDataInput to reduce memory copies in the CompressingStoredFieldsWriter flush
      2. uses ByteBuffersDataInput to reduce memory copies in the CompressingTermVectorsWriter flush
      3. uses ByteBuffer to reduce memory copies in CompressingStoredFieldsWriter.copyOneDoc
      4. replaces the Compressor interface's compress param from byte[] to ByteBuffersDataInput

       

      I ran runStoredFieldsBenchmark with the JVM StatisticsHelper; it shows the following:

      Msec to index    BEST_SPEED    BEST_SPEED YGC              BEST_COMPRESSION    BEST_COMPRESSION YGC
      Baseline         317973        1176 ms (258 collections)   605492              1476 ms (264 collections)
      Candidate        314765        1012 ms (238 collections)   601253              1175 ms (234 collections)

            People

              Assignee: Unassigned
              Reporter: LuYunCheng


                Time Tracking

                  Estimated: Not Specified
                  Remaining: 0h
                  Logged: 4.5h