Uploaded image for project: 'Solr'
  1. Solr
  2. SOLR-9712

Saner default for maxWarmingSearchers

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 6.4, 7.0
    • Component/s: search
    • Security Level: Public (Default Security Level. Issues are Public)
    • Labels:
      None

      Description

      As noted in SOLR-9710, the default for maxWarmingSearchers is Integer.MAX_VALUE which is just crazy. Let's have a saner default. Today we log a performance warning when the number of on deck searchers goes over 1. What if we had the default as 1 that expert users can increase if needed?

      1. SOLR-9712.patch
        22 kB
        Yonik Seeley
      2. SOLR-9712.patch
        5 kB
        Yonik Seeley
      3. SOLR-9712.patch
        4 kB
        Yonik Seeley

        Issue Links

          Activity

          Hide
          mkhludnev Mikhail Khludnev added a comment -

          What if we had the default as 1 that expert users can increase if needed?

          update log will leak SOLR-7115

          Show
          mkhludnev Mikhail Khludnev added a comment - What if we had the default as 1 that expert users can increase if needed? update log will leak SOLR-7115
          Hide
          erickerickson Erick Erickson added a comment -

          Also, I think there's some kind weird issue where opening a searcher or loading a core or something seems to open two searchers so a default of 1 would generate the log warning messages every time which is disconcerting to the users. I don't recall the exact Solr JIRA or whether it's been fixed so FYI, so that's something to check.

          Show
          erickerickson Erick Erickson added a comment - Also, I think there's some kind weird issue where opening a searcher or loading a core or something seems to open two searchers so a default of 1 would generate the log warning messages every time which is disconcerting to the users. I don't recall the exact Solr JIRA or whether it's been fixed so FYI, so that's something to check.
          Hide
          yseeley@gmail.com Yonik Seeley added a comment -

          What we really need is additional functionality
          A high maxWarmingSearchers is bad from a resource perspective, but good from an API perspective.
          Conversely, maxWarmingSearchers=1 can cause spurious failures, requiring all clients that care if something is committed (presumably that's why they called commit) to implement a retry loop.

          One needs a reliable "update this document and make it visible, and don't return until it is visible" in conjunction with "let's not warm more than one searcher at a time".

          Show
          yseeley@gmail.com Yonik Seeley added a comment - What we really need is additional functionality A high maxWarmingSearchers is bad from a resource perspective, but good from an API perspective. Conversely, maxWarmingSearchers=1 can cause spurious failures, requiring all clients that care if something is committed (presumably that's why they called commit) to implement a retry loop. One needs a reliable "update this document and make it visible, and don't return until it is visible" in conjunction with "let's not warm more than one searcher at a time".
          Hide
          dsmiley David Smiley added a comment -

          +1 Yonik. I've long felt that a commit should block instead of execute concurrently, possibly failing. It's kinda crazy this hasn't been done yet. It's a real gotcha for users who don't know better.

          Show
          dsmiley David Smiley added a comment - +1 Yonik. I've long felt that a commit should block instead of execute concurrently, possibly failing. It's kinda crazy this hasn't been done yet. It's a real gotcha for users who don't know better.
          Hide
          shalinmangar Shalin Shekhar Mangar added a comment -

          One needs a reliable "update this document and make it visible, and don't return until it is visible" in conjunction with "let's not warm more than one searcher at a time".

          I thought the first part was the reason behind waitSearcher=true? Agreed that we need to fix the second part. As David said, the commits should block by default instead of happening concurrently. Otherwise we cannot easily reason about memory consumption when multiple clients call commit.

          Show
          shalinmangar Shalin Shekhar Mangar added a comment - One needs a reliable "update this document and make it visible, and don't return until it is visible" in conjunction with "let's not warm more than one searcher at a time". I thought the first part was the reason behind waitSearcher=true? Agreed that we need to fix the second part. As David said, the commits should block by default instead of happening concurrently. Otherwise we cannot easily reason about memory consumption when multiple clients call commit.
          Hide
          yseeley@gmail.com Yonik Seeley added a comment -

          I thought the first part was the reason behind waitSearcher=true?

          Right, unless maxWarningSearchers is exceeded, in which case I think an exception is thrown?

          Show
          yseeley@gmail.com Yonik Seeley added a comment - I thought the first part was the reason behind waitSearcher=true? Right, unless maxWarningSearchers is exceeded, in which case I think an exception is thrown?
          Hide
          yseeley@gmail.com Yonik Seeley added a comment -

          Here's a completely untested patch.

          It's much smaller than it looks (change in indentation obscures actual changes.)

          This really just adds a loop around the open logic and then replaces the code that throws an exception when maxWarmingSearchers is exceeded with a wait. Seems like the lowest risk change since it doesn't change anything else.

                  } else if (onDeckSearchers > maxWarmingSearchers) {
                    onDeckSearchers--;
                    try {
                      searcherLock.wait();
                    } catch (InterruptedException e) {
                      log.info(SolrException.toStr(e));
                      // TODO: should we throw an exception in this case?
                    }
                    continue;  // go back to the top of the loop and retry
          

          That code just logs an InterruptedException, but perhaps we should throw an exception in that case (i.e. abort trying to open a reader)? Things like this sometimes do have a real impact on our Chaos tests though...

          Show
          yseeley@gmail.com Yonik Seeley added a comment - Here's a completely untested patch. It's much smaller than it looks (change in indentation obscures actual changes.) This really just adds a loop around the open logic and then replaces the code that throws an exception when maxWarmingSearchers is exceeded with a wait. Seems like the lowest risk change since it doesn't change anything else. } else if (onDeckSearchers > maxWarmingSearchers) { onDeckSearchers--; try { searcherLock.wait(); } catch (InterruptedException e) { log.info(SolrException.toStr(e)); // TODO: should we throw an exception in this case ? } continue ; // go back to the top of the loop and retry That code just logs an InterruptedException, but perhaps we should throw an exception in that case (i.e. abort trying to open a reader)? Things like this sometimes do have a real impact on our Chaos tests though...
          Hide
          yseeley@gmail.com Yonik Seeley added a comment -

          I often wondered if this was causing subtle bugs... other parts of the system rely on being able to reliably do a commit and not having it spuriously fail. Probably should have fixed it a long time ago!

          Show
          yseeley@gmail.com Yonik Seeley added a comment - I often wondered if this was causing subtle bugs... other parts of the system rely on being able to reliably do a commit and not having it spuriously fail. Probably should have fixed it a long time ago!
          Hide
          yseeley@gmail.com Yonik Seeley added a comment -

          If we go this route (which I think we should) then we probably also should:

          • change the default maxWarmingSearchers to 1
          • remove maxWarmingSearchers from our starting configs (it becomes pretty advanced... most shouldn't need to worry about it)
          • remove maxWarmingSearchers from the majority of our test configs - we'll want the best test coverage on maxWarmingSearchers=1
          • add an upgrade note encouraging removal of maxWarmingSearchers from your config, unless you specifically know better
          • document the change in behavior of course...
          Show
          yseeley@gmail.com Yonik Seeley added a comment - If we go this route (which I think we should) then we probably also should: change the default maxWarmingSearchers to 1 remove maxWarmingSearchers from our starting configs (it becomes pretty advanced... most shouldn't need to worry about it) remove maxWarmingSearchers from the majority of our test configs - we'll want the best test coverage on maxWarmingSearchers=1 add an upgrade note encouraging removal of maxWarmingSearchers from your config, unless you specifically know better document the change in behavior of course...
          Hide
          shalinmangar Shalin Shekhar Mangar added a comment -

          I like this. It is a simple fix and I believe it takes care of SOLR-7115 as well.

          Show
          shalinmangar Shalin Shekhar Mangar added a comment - I like this. It is a simple fix and I believe it takes care of SOLR-7115 as well.
          Hide
          markrmiller@gmail.com Mark Miller added a comment - - edited

          That code just logs an InterruptedException, but perhaps we should throw an exception in that case

          I almost would prefer not letting this get interrupted, it's a nasty path to test and I worry about everything going smoothly in all cases. On the other hand, if this code is interrupted, there is a good chance you are going to run into closed channel issues anyway.

          Show
          markrmiller@gmail.com Mark Miller added a comment - - edited That code just logs an InterruptedException, but perhaps we should throw an exception in that case I almost would prefer not letting this get interrupted, it's a nasty path to test and I worry about everything going smoothly in all cases. On the other hand, if this code is interrupted, there is a good chance you are going to run into closed channel issues anyway.
          Hide
          yseeley@gmail.com Yonik Seeley added a comment -

          OK, I'm going to create a stress test for concurrent commits...

          Show
          yseeley@gmail.com Yonik Seeley added a comment - OK, I'm going to create a stress test for concurrent commits...
          Hide
          yseeley@gmail.com Yonik Seeley added a comment -

          Here's an updated patch that changes the default maxWarmingSearchers to 1.
          It looks like I didn't need to create any additional tests... by changing maxWarmingSearchers to 1 first, a bunch of the stress tests started failing since they also test concurrent commits. TestStressVersions, TestStressReorder, and TestRealTimeGet all failed (not an exhausted list tested by hand) w/o the blocking patch and succeeded with it.

          Unfortunately, running the full test suite results in some errors. Not sure what's going on there yet.

          Show
          yseeley@gmail.com Yonik Seeley added a comment - Here's an updated patch that changes the default maxWarmingSearchers to 1. It looks like I didn't need to create any additional tests... by changing maxWarmingSearchers to 1 first, a bunch of the stress tests started failing since they also test concurrent commits. TestStressVersions, TestStressReorder, and TestRealTimeGet all failed (not an exhausted list tested by hand) w/o the blocking patch and succeeded with it. Unfortunately, running the full test suite results in some errors. Not sure what's going on there yet.
          Hide
          yseeley@gmail.com Yonik Seeley added a comment -

          OK, I have clean runs now (with no further changes)... looks like it was a bad checkout.

          Show
          yseeley@gmail.com Yonik Seeley added a comment - OK, I have clean runs now (with no further changes)... looks like it was a bad checkout.
          Hide
          yseeley@gmail.com Yonik Seeley added a comment -

          OK, here's the final patch that I'll probably commit tomorrow. Most maxWarmingSearcher specifications have been removed from test configurations. This doesn't really reduce coverage much due to the fact that maxWarmingSearchers=1 (the default) is not special-cased so all code paths will still be exercised with concurrent commits.

          Show
          yseeley@gmail.com Yonik Seeley added a comment - OK, here's the final patch that I'll probably commit tomorrow. Most maxWarmingSearcher specifications have been removed from test configurations. This doesn't really reduce coverage much due to the fact that maxWarmingSearchers=1 (the default) is not special-cased so all code paths will still be exercised with concurrent commits.
          Hide
          dsmiley David Smiley added a comment -

          This is wonderful and a big deal; thanks Yonik! I didn't notice a CHANGES.txt in the patch, so naturally you'll do that when you commit (I do the same approach). If the CHANGES.txt is no more clear than the title of this issue then users won't realize what this is all about, I think. Here's my attempt to word it:

          SOLR-9712: maxWarmingSearchers now defaults to 1, and more importantly commits will now *block* if this limit is reached instead of throwing an error (a good thing). Consequently there is no longer a risk in overlapping commits. Nonetheless users should continue to avoid excessive committing.

          Show
          dsmiley David Smiley added a comment - This is wonderful and a big deal; thanks Yonik! I didn't notice a CHANGES.txt in the patch, so naturally you'll do that when you commit (I do the same approach). If the CHANGES.txt is no more clear than the title of this issue then users won't realize what this is all about, I think. Here's my attempt to word it: SOLR-9712 : maxWarmingSearchers now defaults to 1, and more importantly commits will now *block* if this limit is reached instead of throwing an error (a good thing). Consequently there is no longer a risk in overlapping commits. Nonetheless users should continue to avoid excessive committing.
          Hide
          yseeley@gmail.com Yonik Seeley added a comment -

          If the CHANGES.txt is no more clear than the title of this issue then users won't realize what this is all about

          Heh... yep. I often ignore the original JIRA title for the commit message + CHANGES entry and try to pick something that means the most to developers for the former and users for the latter. This change should probably go under the "change of behavior" section.

          Show
          yseeley@gmail.com Yonik Seeley added a comment - If the CHANGES.txt is no more clear than the title of this issue then users won't realize what this is all about Heh... yep. I often ignore the original JIRA title for the commit message + CHANGES entry and try to pick something that means the most to developers for the former and users for the latter. This change should probably go under the "change of behavior" section.
          Hide
          yseeley@gmail.com Yonik Seeley added a comment -

          OK, how about this?

          maxWarmingSearchers now defaults to 1, and more importantly commits will now block if this limit is exceeded instead of throwing an exception (a good thing). Consequently there is no longer a risk in overlapping commits. Nonetheless users should continue to avoid excessive committing. Users are advised to remove any pre-existing maxWarmingSearchers entries from their solrconfig.xml files.

          Show
          yseeley@gmail.com Yonik Seeley added a comment - OK, how about this? maxWarmingSearchers now defaults to 1, and more importantly commits will now block if this limit is exceeded instead of throwing an exception (a good thing). Consequently there is no longer a risk in overlapping commits. Nonetheless users should continue to avoid excessive committing. Users are advised to remove any pre-existing maxWarmingSearchers entries from their solrconfig.xml files.
          Hide
          dsmiley David Smiley added a comment -

          +1 good advice

          Show
          dsmiley David Smiley added a comment - +1 good advice
          Hide
          jira-bot ASF subversion and git services added a comment -

          Commit c9522a393661c8878d488ad4475ac7e2cbb9c25c in lucene-solr's branch refs/heads/master from Yonik Seeley
          [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=c9522a3 ]

          SOLR-9712: block when maxWarmingSearchers is exceeded instead of throwing exception, default to 1, remove from most configs

          Show
          jira-bot ASF subversion and git services added a comment - Commit c9522a393661c8878d488ad4475ac7e2cbb9c25c in lucene-solr's branch refs/heads/master from Yonik Seeley [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=c9522a3 ] SOLR-9712 : block when maxWarmingSearchers is exceeded instead of throwing exception, default to 1, remove from most configs
          Hide
          jira-bot ASF subversion and git services added a comment -

          Commit 0f4c5f0a732cb0df3a213d05dca8b7c477728154 in lucene-solr's branch refs/heads/branch_6x from Yonik Seeley
          [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=0f4c5f0 ]

          SOLR-9712: block when maxWarmingSearchers is exceeded instead of throwing exception, default to 1, remove from most configs

          Show
          jira-bot ASF subversion and git services added a comment - Commit 0f4c5f0a732cb0df3a213d05dca8b7c477728154 in lucene-solr's branch refs/heads/branch_6x from Yonik Seeley [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=0f4c5f0 ] SOLR-9712 : block when maxWarmingSearchers is exceeded instead of throwing exception, default to 1, remove from most configs
          Hide
          mkhludnev Mikhail Khludnev added a comment -

          @yonik have you tried the test from SOLR-7115 with it?

          Show
          mkhludnev Mikhail Khludnev added a comment - @yonik have you tried the test from SOLR-7115 with it?
          Hide
          yseeley@gmail.com Yonik Seeley added a comment -

          Nope... I hadn't realized we had a good reproducible test for that.
          Looks like that test expects to hit an exception though, so it would need to be tweaked to pass now.

          Show
          yseeley@gmail.com Yonik Seeley added a comment - Nope... I hadn't realized we had a good reproducible test for that. Looks like that test expects to hit an exception though, so it would need to be tweaked to pass now.
          Hide
          jira-bot ASF subversion and git services added a comment -

          Commit c9522a393661c8878d488ad4475ac7e2cbb9c25c in lucene-solr's branch refs/heads/feature/metrics from Yonik Seeley
          [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=c9522a3 ]

          SOLR-9712: block when maxWarmingSearchers is exceeded instead of throwing exception, default to 1, remove from most configs

          Show
          jira-bot ASF subversion and git services added a comment - Commit c9522a393661c8878d488ad4475ac7e2cbb9c25c in lucene-solr's branch refs/heads/feature/metrics from Yonik Seeley [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=c9522a3 ] SOLR-9712 : block when maxWarmingSearchers is exceeded instead of throwing exception, default to 1, remove from most configs
          Hide
          markrmiller@gmail.com Mark Miller added a comment -

          Related issue here: SOLR-10110

          Show
          markrmiller@gmail.com Mark Miller added a comment - Related issue here: SOLR-10110

            People

            • Assignee:
              yseeley@gmail.com Yonik Seeley
              Reporter:
              shalinmangar Shalin Shekhar Mangar
            • Votes:
              0 Vote for this issue
              Watchers:
              7 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development