Lucene - Core / LUCENE-7868

Use multiple threads to apply deletes and DV updates

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 7.0
    • Component/s: None
    • Labels:
      None
    • Lucene Fields:
      New

      Description

      Today, when users delete documents or apply doc values updates, IndexWriter buffers them up into frozen packets and then eventually uses a single thread (BufferedUpdatesStream.applyDeletesAndUpdates) to resolve delete/update terms to docids. This thread also holds IW's monitor lock, so it also blocks refresh, merges starting/finishing, commits, etc.

      We have heavily optimized this part of Lucene over time, e.g. LUCENE-6161, LUCENE-2897, LUCENE-2680, LUCENE-3342, but still, it's a single thread so it can't use multiple CPU cores commonly available now.

      This doesn't affect append-only usage, but for update-heavy users (me!) this can be a big bottleneck, and causes long stop-the-world hangs during indexing.

      I have an initial exploratory patch to make these lookups concurrent, without holding IW's lock: when a new packet of deletes is pushed, which happens when we flush a new segment, we immediately use that same indexing thread to resolve the deletions.

      This is analogous to when we made segment flushing concurrent (LUCENE-3023), just for deletes and DV updates as well.
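
The idea described above can be illustrated with a toy model. This is only a sketch under stated assumptions, not the code in the patch: `ToySegment`, `ConcurrentDeleteSketch`, and `resolveAll` are hypothetical names, a segment is modeled as a plain term-to-docids map, and a thread pool stands in for the indexing threads. The point it shows is that each packet of delete terms is resolved by its own thread, and the only lock taken is a per-segment one rather than a single writer-wide monitor.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical toy segment: a term -> docids map plus a bitset of deleted docs. */
final class ToySegment {
  private final Map<String, int[]> postings; // never modified after construction
  private final BitSet deleted = new BitSet();

  ToySegment(Map<String, int[]> postings) {
    this.postings = postings;
  }

  /** Resolves each term to docids and marks them deleted, locking only this segment. */
  int applyDeletes(List<String> terms) {
    int newlyDeleted = 0;
    for (String term : terms) {
      int[] docIds = postings.get(term); // read-only lookup, no lock needed
      if (docIds == null) {
        continue;
      }
      synchronized (deleted) {
        for (int docId : docIds) {
          if (!deleted.get(docId)) {
            deleted.set(docId);
            newlyDeleted++;
          }
        }
      }
    }
    return newlyDeleted;
  }

  int deletedCount() {
    synchronized (deleted) {
      return deleted.cardinality();
    }
  }
}

/** Hypothetical driver: one task per packet, no global lock held while resolving. */
final class ConcurrentDeleteSketch {
  /** Resolves every packet against every segment; returns the number of docs newly deleted. */
  static int resolveAll(List<ToySegment> segments, List<List<String>> packets, int threads) {
    ExecutorService pool = Executors.newFixedThreadPool(threads);
    try {
      List<Future<Integer>> futures = new ArrayList<>();
      for (List<String> packet : packets) {
        futures.add(pool.submit(() -> {
          int count = 0;
          for (ToySegment segment : segments) {
            count += segment.applyDeletes(packet);
          }
          return count;
        }));
      }
      int total = 0;
      for (Future<Integer> future : futures) {
        total += future.get();
      }
      return total;
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException("interrupted while resolving deletes", e);
    } catch (ExecutionException e) {
      throw new IllegalStateException("delete resolution failed", e.getCause());
    } finally {
      pool.shutdown();
    }
  }
}
```

Because a doc is counted only the first time its bit is set, the total is the same regardless of which packet's thread reaches a segment first. The real IndexWriter also has to coordinate with merges, refresh, and commit, which this sketch ignores.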

      1. cpu-after.png
        94 kB
        Michael McCandless
      2. cpu-before.png
        85 kB
        Michael McCandless
      3. LUCENE-7868.patch
        336 kB
        Michael McCandless
      4. LUCENE-7868.patch
        336 kB
        Michael McCandless
      5. LUCENE-7868.patch
        333 kB
        Michael McCandless
      6. LUCENE-7868.patch
        295 kB
        Michael McCandless
      7. LUCENE-7868.patch
        279 kB
        Michael McCandless
      8. LUCENE-7868.patch
        190 kB
        Michael McCandless

        Activity

        mikemccand Michael McCandless added a comment -

        Current WIP patch; core tests mostly pass but I still have plenty of nocommits.

        mikemccand Michael McCandless added a comment -

        CPU usage charts.

        mikemccand Michael McCandless added a comment -

        I ran a quick indexing performance test on an internal corpus using an older version of the patch, comparing CPU usage before (attached cpu-before.png) to CPU usage with the patch (attached cpu-after.png).

        I don't have the exact numbers, and I need to re-run on the latest patch, but I think it was a ~50% indexing throughput improvement overall. This is on a 64-core box with 480 GB RAM (an i3.16xlarge EC2 instance).

        The before chart doesn't drop to 100% (one CPU) while applying deletes because there are concurrent merges running.

        (Those little spiky drops down to near 0 CPU usage are from GC; I was using the default parallel collector, I think.)

        mikemccand Michael McCandless added a comment -

        Another iteration; tests seem to be passing consistently now, but I still have 27 nocommits.

        I re-tested the speedup with this change, re-indexing nearly a billion documents into an index that already had all of those documents indexed once, and indexing is ~53% faster overall with the patch (26.8 K docs/sec vs 17.5 K docs/sec).

        I still need to test doc-values updates performance.

        Besides the performance gains from concurrent deletes/updates, the patch has two user-visible changes:

        • Reader pooling is now enabled by default in IndexWriter; previously it was only turned on the first time you pulled an NRT reader. You can still disable this, but it will hurt indexing perf since all deletes/updates will be eagerly written through to the filesystem.
        • Removed IndexWriterConfig.set/getMaxBufferedDeleteTerms; this setting is no longer possible because Lucene eagerly resolves the deletes and updates.
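
For code that configures IndexWriter, the two changes above might look like the following. This is a minimal sketch, assuming Lucene 7.0 on the classpath; `setReaderPooling` and `setRAMBufferSizeMB` are existing `IndexWriterConfig` methods, while the class name, the index path argument, and the 128 MB value are illustrative assumptions only.

```java
import java.nio.file.Paths;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;

public class ReaderPoolingConfigSketch {
  public static void main(String[] args) throws Exception {
    IndexWriterConfig iwc = new IndexWriterConfig(new StandardAnalyzer());

    // Illustrative value; flushing is now driven by RAM usage (or doc count),
    // not by a buffered delete term count.
    iwc.setRAMBufferSizeMB(128);

    // Reader pooling is on by default with this change. Turning it off is
    // still possible, but per the comment above it is expected to hurt
    // indexing performance.
    iwc.setReaderPooling(false);

    // iwc.setMaxBufferedDeleteTerms(1000);  // removed by this patch; delete this call

    // args[0] is a hypothetical index directory path.
    try (Directory dir = FSDirectory.open(Paths.get(args[0]));
         IndexWriter writer = new IndexWriter(dir, iwc)) {
      writer.commit();
    }
  }
}
```

Most users would simply drop the `setMaxBufferedDeleteTerms` call and leave reader pooling at its new default.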
        dsmiley David Smiley added a comment -

        Very exciting performance improvement Mike! How many concurrent threads are you using in your benchmarks?

        mikemccand Michael McCandless added a comment -

        > Very exciting performance improvement Mike!

        Thank you David Smiley.

        > How many concurrent threads are you using in your benchmarks?

        I used 40 indexing threads, with 2 GB IW RAM buffer, on a 64 core box ... but I think this didn't buy much gain over 32 threads. I haven't tried to optimize for indexing thread count much.

        mikemccand Michael McCandless added a comment -

        Another iteration, with feedback from Robert Muir (thank you!) and still many nocommits.

        I added a new test case testing index sorting with DV updates and this uncovered a pre-existing bug, I think introduced with LUCENE-6766, when you try to update recently indexed documents ... I'll open a separate issue later to fix this for 6.6.x.

        I've also run some DV update performance tests and this uncovered problems with the patch ... still iterating.

        mikemccand Michael McCandless added a comment -

        Another iteration, I think it's ready!

        All nocommits are gone, and all tests and "ant precommit" pass. I'll beast all tests some more before pushing.

        I improved how we compress the frozen packet of DV updates for better RAM efficiency: each frozen packet is ~8.3% of the original size of the un-frozen packet in my benchmark.

        I also tested DV updates performance, updating the price field in my internal corpus. With no refresh (just writing DV updates when RAM buffer is full) trunk updates at 8.0 K docs/sec, and the patch 58.0 K docs/sec (7.25X faster). With refresh every 60 seconds, trunk gets 7.4 K docs/sec and the patch gets 63.7 K docs/sec (8.6X faster). This is with 12 threads, 128 MB IW buffer.
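
The speedup factors quoted in this thread are plain throughput ratios. A small sketch for checking them, using only the docs/sec figures reported above (the class and method names are hypothetical):

```java
/** Hypothetical helper for checking the throughput ratios quoted in this issue. */
final class SpeedupMath {
  /** Returns patched / baseline throughput; 2.0 means twice as fast. */
  static double speedup(double baselineDocsPerSec, double patchedDocsPerSec) {
    if (baselineDocsPerSec <= 0) {
      throw new IllegalArgumentException("baseline throughput must be positive");
    }
    return patchedDocsPerSec / baselineDocsPerSec;
  }

  public static void main(String[] args) {
    // DV updates, no refresh: trunk 8.0 K docs/sec vs patch 58.0 K docs/sec
    System.out.println("DV updates, no refresh: " + speedup(8.0, 58.0));
    // DV updates, refresh every 60 seconds: trunk 7.4 K vs patch 63.7 K
    System.out.println("DV updates, 60s refresh: " + speedup(7.4, 63.7));
    // Re-indexing with deletes: trunk 17.5 K vs patch 26.8 K
    System.out.println("Re-indexing: " + speedup(17.5, 26.8));
  }
}
```

A ratio of about 1.53 for the re-indexing run corresponds to the "~53% faster" figure reported earlier.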

        mikemccand Michael McCandless added a comment -

        I posted the last patch in Review Board: https://reviews.apache.org/r/60154/

        simonw Simon Willnauer added a comment -

        Michael McCandless I did a first pass at the patch... good stuff, but I think we need to clean up some of the big ass loops

        mikemccand Michael McCandless added a comment -

        Thanks Simon Willnauer! I'll update the patch with your feedback.

        mikemccand Michael McCandless added a comment -

        New patch, folding in Simon Willnauer's feedback. I also updated the diff on Review Board.

        simonw Simon Willnauer added a comment -

        Michael McCandless I did a second pass.. I think we are close

        mikemccand Michael McCandless added a comment -

        Thank you Simon Willnauer, I'll look!

        mikemccand Michael McCandless added a comment -

        Another iteration, folding in Simon Willnauer's last feedback. Tests and ant precommit -Dtests.nightly=true pass!

        simonw Simon Willnauer added a comment -

        Michael McCandless can you update reviewboard as well, it's way easier to review there.

        mikemccand Michael McCandless added a comment -

        Ugh, sorry, I thought I had done that, but I must have forgotten to click the "Publish Changes" button. Try now?

        I also realize I failed to click Publish Changes on my replies to your first review, sheesh!! So I clicked that now and I guess you now got an email with my comments from the first iteration!

        simonw Simon Willnauer added a comment -

        LGTM thanks for all the iterations

        mikemccand Michael McCandless added a comment -

        Thanks Simon Willnauer; I'll run tests and push soon.

        jira-bot ASF subversion and git services added a comment -

        Commit 58105a203a19d18a56e09cf69dc0083c1b890315 in lucene-solr's branch refs/heads/master from Mike McCandless
        [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=58105a2 ]

        LUCENE-7868: use multiple threads to concurrently resolve deletes and DV udpates

        mikemccand Michael McCandless added a comment -

        I was unable to make writing of live docs files and new doc values files concurrent here: IW's concurrency is just too messy.

        steve_rowe Steve Rowe added a comment -

        For some reason these commit logs didn't get posted here:

        Repository: lucene-solr
        Updated Branches:
         refs/heads/master f0cc3769b -> 7c704d525
        
        
        LUCENE-7868: fix race condition when reader pooling is disabled
        
        
        Project: http://git-wip-us.apache.org/repos/asf/lucene-solr/repo
        Commit: http://git-wip-us.apache.org/repos/asf/lucene-solr/commit/7c704d52
        Tree: http://git-wip-us.apache.org/repos/asf/lucene-solr/tree/7c704d52
        Diff: http://git-wip-us.apache.org/repos/asf/lucene-solr/diff/7c704d52
        
        Branch: refs/heads/master
        Commit: 7c704d5258b3be8c383ccb96bf4a30be441f091c
        Parents: f0cc376
        Author: Mike McCandless <mikemccand@apache.org>
        Authored: Wed Jul 5 16:53:05 2017 -0400
        Committer: Mike McCandless <mikemccand@apache.org>
        Committed: Wed Jul 5 16:53:05 2017 -0400
        
        Repository: lucene-solr
        Updated Branches:
         refs/heads/branch_7x 454950aae -> 40dd3efb8
        
        
        LUCENE-7868: fix race condition when reader pooling is disabled
        
        
        Project: http://git-wip-us.apache.org/repos/asf/lucene-solr/repo
        Commit: http://git-wip-us.apache.org/repos/asf/lucene-solr/commit/40dd3efb
        Tree: http://git-wip-us.apache.org/repos/asf/lucene-solr/tree/40dd3efb
        Diff: http://git-wip-us.apache.org/repos/asf/lucene-solr/diff/40dd3efb
        
        Branch: refs/heads/branch_7x
        Commit: 40dd3efb8fb6b33a3e010e8c3d391d1165bd51e6
        Parents: 454950a
        Author: Mike McCandless <mikemccand@apache.org>
        Authored: Wed Jul 5 16:53:05 2017 -0400
        Committer: Mike McCandless <mikemccand@apache.org>
        Committed: Wed Jul 5 16:53:33 2017 -0400
        
        Repository: lucene-solr
        Updated Branches:
         refs/heads/branch_7_0 ec306dce2 -> 9ec400c4f
        
        
        LUCENE-7868: fix race condition when reader pooling is disabled
        
        
        Project: http://git-wip-us.apache.org/repos/asf/lucene-solr/repo
        Commit: http://git-wip-us.apache.org/repos/asf/lucene-solr/commit/9ec400c4
        Tree: http://git-wip-us.apache.org/repos/asf/lucene-solr/tree/9ec400c4
        Diff: http://git-wip-us.apache.org/repos/asf/lucene-solr/diff/9ec400c4
        
        Branch: refs/heads/branch_7_0
        Commit: 9ec400c4f69432773edd3678e21c4c08590cddf6
        Parents: ec306dc
        Author: Mike McCandless <mikemccand@apache.org>
        Authored: Wed Jul 5 16:53:05 2017 -0400
        Committer: Mike McCandless <mikemccand@apache.org>
        Committed: Wed Jul 5 16:54:12 2017 -0400
        

          People

          • Assignee:
            mikemccand Michael McCandless
            Reporter:
            mikemccand Michael McCandless