Uploaded image for project: 'Solr'
  1. Solr
  2. SOLR-9045

make RecoveryStrategy settings configurable

    Details

    • Type: New Feature
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 6.5, 7.0
    • Component/s: None
    • Labels:
      None

      Description

      objectives:

      • to allow users to change RecoveryStrategy settings such as maxRetries and startingRecoveryDelay
      • to support configuration of a custom recovery strategy e.g. SOLR-9044

      patch summary:

      • support for optional <recoveryStrategy> solrconfig.xml element added (if element is present then its class attribute is optional)
      • RecoveryStrategy settings now have getters/setters
      • RecoveryStrategy.Builder added (and RecoveryStrategy constructor made non-public in favour of RecoveryStrategy.Builder.create)
      • protected RecoveryStrategy.getReplicateLeaderUrl method factored out (ConfigureRecoveryStrategyTest$CustomRecoveryStrategyBuilder test illustrates how SOLR-9044 might override the method)
      • ConfigureRecoveryStrategyTest.java using solrconfig-configurerecoverystrategy.xml or solrconfig-customrecoverystrategy.xml

      illustrative solrconfig.xml snippets:

      • change a RecoveryStrategy setting
          <recoveryStrategy>
            <int name="maxRetries">250</int>
          </recoveryStrategy>
        
      • configure a custom class
          <recoveryStrategy class="org.apache.solr.core.ConfigureRecoveryStrategyTest$CustomRecoveryStrategyBuilder">
            <str name="alternativeBaseUrlProp">recovery_base_url</str>
          </recoveryStrategy>
        
      1. SOLR-9045.patch
        26 kB
        Christine Poerschke

        Issue Links

          Activity

          Hide
          cpoerschke Christine Poerschke added a comment -

          jira/solr-9045 working branch created. Comments and input on the idea itself and/or code reviews and/or collaboration, questions, etc. welcome as usual. Thank you.

          Show
          cpoerschke Christine Poerschke added a comment - jira/solr-9045 working branch created. Comments and input on the idea itself and/or code reviews and/or collaboration, questions, etc. welcome as usual. Thank you.
          Hide
          markrmiller@gmail.com Mark Miller added a comment -

          a custom recovery strategy

          Configuration makes sense to me, but I don't think we should ever officially support a custom user impl. Even unofficially, it seems dangerous to even let users think they can customize this. We could perhaps do something like UpdateHandler, where you can do it, but you're crazy to do it unless you check your impl every release and only have minor changes to behavior (and even then it's crazy).

          Show
          markrmiller@gmail.com Mark Miller added a comment - a custom recovery strategy Configuration makes sense to me, but I don't think we should ever officially support a custom user impl. Even unofficially, it seems dangerous to even let users think they can customize this. We could perhaps do something like UpdateHandler, where you can do it, but you're crazy to do it unless you check your impl every release and only have minor changes to behavior (and even then it's crazy).
          Hide
          dsmiley David Smiley added a comment -

          It'd be nice if this could be more clear as to what a "RecoveryStrategy" even is; maybe a longer name to clarify what it is that is being recovered? I don't know what it is.

          Show
          dsmiley David Smiley added a comment - It'd be nice if this could be more clear as to what a "RecoveryStrategy" even is; maybe a longer name to clarify what it is that is being recovered? I don't know what it is.
          Hide
          cpoerschke Christine Poerschke added a comment -

          Configuration makes sense to me, but I don't think we should ever officially support a custom user impl. ...

          Okay, that rules out a RecoveryStrategyFactory then, configuration of settings can be done without a factory.

          It'd be nice if this could be more clear as to what a "RecoveryStrategy" even is; ...

          I agree that the name(s) of any new configuration element(s) need not necessarily match the underlying class names.

          RecoveryStrategy currently has three settings which taken literally could translate into configuration something like this:

          <recoveryStrategy> <!-- class="solr.RecoveryStrategy" attribute conspicuous by its absence -->
            <int name="waitForUpdatesWithStaleStatePause">7000</int>
            <int name="maxRetries">500</int>
            <int name="startingRecoveryDelay">5000</int>
          </recoveryStrategy>
          

          How are the settings used?

          ... We could perhaps do something like UpdateHandler, where you can do it, but you're crazy to do it unless you check your impl every release and only have minor changes to behavior (and even then it's crazy).

          I agree that configuring something other than the defaults is probably a bit of a niche use case. Replication via an alternative network interface (SOLR-9044) e.g. to separate out replication versus regular live within-cloud traffic would perhaps also be relatively niche (though technically only a minor change to behaviour) and opt-in to that could be via an additional configuration rather than a class derived from the existing RecoveryStrategy class.

          So, it seems there's three alternatives.

          • solrconfig.xml element called <recoveryStrategy> similar to the <updateHandler> element and as illustrated above
          • solrconfig.xml element(s) called something else (similar to the <updateHandler> element?)
          • two additional system properties "solr.cloud.max-retries" and "solr.cloud.starting-recovery-delay" ("solr.cloud.wait-for-updates-with-stale-state-pause" already exists)

          What do you think?

          Show
          cpoerschke Christine Poerschke added a comment - Configuration makes sense to me, but I don't think we should ever officially support a custom user impl. ... Okay, that rules out a RecoveryStrategyFactory then, configuration of settings can be done without a factory. It'd be nice if this could be more clear as to what a "RecoveryStrategy" even is; ... I agree that the name(s) of any new configuration element(s) need not necessarily match the underlying class names. RecoveryStrategy currently has three settings which taken literally could translate into configuration something like this: <recoveryStrategy> <!-- class= "solr.RecoveryStrategy" attribute conspicuous by its absence --> < int name= "waitForUpdatesWithStaleStatePause" >7000</ int > < int name= "maxRetries" >500</ int > < int name= "startingRecoveryDelay" >5000</ int > </recoveryStrategy> How are the settings used? waitForUpdatesWithStaleStatePause 's code mentions SOLR-7141 for discussion around current value. startingRecoveryDelay is used to wait for an (exponential) interval between retries. maxRetries determines how soon "Recovery failed - I give up." happens. ... We could perhaps do something like UpdateHandler, where you can do it, but you're crazy to do it unless you check your impl every release and only have minor changes to behavior (and even then it's crazy). I agree that configuring something other than the defaults is probably a bit of a niche use case. Replication via an alternative network interface ( SOLR-9044 ) e.g. to separate out replication versus regular live within-cloud traffic would perhaps also be relatively niche (though technically only a minor change to behaviour) and opt-in to that could be via an additional configuration rather than a class derived from the existing RecoveryStrategy class. So, it seems there's three alternatives. solrconfig.xml element called <recoveryStrategy> similar to the <updateHandler> element and as illustrated above solrconfig.xml element(s) called something else (similar to the <updateHandler> element?) two additional system properties "solr.cloud.max-retries" and "solr.cloud.starting-recovery-delay" ("solr.cloud.wait-for-updates-with-stale-state-pause" already exists) What do you think?
          Hide
          markrmiller@gmail.com Mark Miller added a comment -

          I think doing it like UpdateHandler is probably fine, but that we should be explicit in comments / doc that custom impls are not really supported between versions in terms of API or behavior back compat.

          Show
          markrmiller@gmail.com Mark Miller added a comment - I think doing it like UpdateHandler is probably fine, but that we should be explicit in comments / doc that custom impls are not really supported between versions in terms of API or behavior back compat.
          Hide
          markrmiller@gmail.com Mark Miller added a comment -

          The name is really another issue. The history of the current name is that it was the first thing I thought of when prototyping. At the time, I probably imagined some other strategies. I no longer do. A rename should probably be done first if we are going to do it.

          Show
          markrmiller@gmail.com Mark Miller added a comment - The name is really another issue. The history of the current name is that it was the first thing I thought of when prototyping. At the time, I probably imagined some other strategies. I no longer do. A rename should probably be done first if we are going to do it.
          Hide
          cpoerschke Christine Poerschke added a comment -

          ... we should be explicit in comments / doc that custom impls are not really supported between versions ...

          ... A rename should probably be done first if we are going to do it.

          I agree that any rename should be done first and that comments/docs should be very clear re: custom implementations being custom and there being no guarantees w.r.t. backwards compatibility or lack thereof.

          How about renaming RecoveryStrategy to RecoveryImplementation (or something like it)? To indicate in name as well as comments/docs that the class is implementation and thus no assumptions should be made w.r.t. changes between versions.

          Show
          cpoerschke Christine Poerschke added a comment - ... we should be explicit in comments / doc that custom impls are not really supported between versions ... ... A rename should probably be done first if we are going to do it. I agree that any rename should be done first and that comments/docs should be very clear re: custom implementations being custom and there being no guarantees w.r.t. backwards compatibility or lack thereof. How about renaming RecoveryStrategy to RecoveryImplementation (or something like it)? To indicate in name as well as comments/docs that the class is implementation and thus no assumptions should be made w.r.t. changes between versions.
          Hide
          dsmiley David Smiley added a comment -

          How about renaming RecoveryStrategy to RecoveryImplementation (or something like it)

          Might it have what is being recovered? Or say something like SolrCloudRecoveryImpl to at least give a sense of scope that it has to do with SolrCloud stuff? Just a thought. But then it's in the "cloud" package so perhaps nevermind. My very first reaction was wondering if it was some sort of connection recovery thing.

          Show
          dsmiley David Smiley added a comment - How about renaming RecoveryStrategy to RecoveryImplementation (or something like it) Might it have what is being recovered? Or say something like SolrCloudRecoveryImpl to at least give a sense of scope that it has to do with SolrCloud stuff? Just a thought. But then it's in the "cloud" package so perhaps nevermind. My very first reaction was wondering if it was some sort of connection recovery thing.
          Hide
          markrmiller@gmail.com Mark Miller added a comment -

          How about renaming RecoveryStrategy to RecoveryImplementation (or something like it)?

          Trying to think of what this actually is, and currently I'm around DataRecoveryRunner or something.

          Show
          markrmiller@gmail.com Mark Miller added a comment - How about renaming RecoveryStrategy to RecoveryImplementation (or something like it)? Trying to think of what this actually is, and currently I'm around DataRecoveryRunner or something.
          Hide
          markrmiller@gmail.com Mark Miller added a comment -

          Could be DataSynchronizationRunner or something too, but given it's the prime driver of the recovery status, I kind of like recovery in the name. Of course, tlog replay should probably be under recovery too, and that is very distinct from this class, so perhaps it's better it doesn't try and synergize with the recovery key word, as it really just may be a piece of the recovery puzzle longer term.

          Show
          markrmiller@gmail.com Mark Miller added a comment - Could be DataSynchronizationRunner or something too, but given it's the prime driver of the recovery status, I kind of like recovery in the name. Of course, tlog replay should probably be under recovery too, and that is very distinct from this class, so perhaps it's better it doesn't try and synergize with the recovery key word, as it really just may be a piece of the recovery puzzle longer term.
          Hide
          markrmiller@gmail.com Mark Miller added a comment -

          What is the motivation for making this stuff configurable by the way? I really think we should make this mostly just work well in all situations rather than expose internal config that may change or should not really be necessary to tweak.

          Show
          markrmiller@gmail.com Mark Miller added a comment - What is the motivation for making this stuff configurable by the way? I really think we should make this mostly just work well in all situations rather than expose internal config that may change or should not really be necessary to tweak.
          Hide
          cpoerschke Christine Poerschke added a comment -

          Essentially SOLR-9044 is the motivation i.e. to support (somehow) configuration to use an alternative network interface (and the motivation for that being to separate copy-all-data recovery replication traffic from regular live within-cloud traffic).

          I would imagine SOLR-9044 would be done by optionally augmenting the cluster state with extra urls. The RecoveryStrategy configuration option would then be to opt-in to the use of the extra urls (i.e. the mere presence of the extra urls would not change RecoveryStrategy behaviour and if the recovery strategy opted-in to use the special urls then the absence of such urls should generate errors or warnings).

          Show
          cpoerschke Christine Poerschke added a comment - Essentially SOLR-9044 is the motivation i.e. to support (somehow) configuration to use an alternative network interface (and the motivation for that being to separate copy-all-data recovery replication traffic from regular live within-cloud traffic). I would imagine SOLR-9044 would be done by optionally augmenting the cluster state with extra urls. The RecoveryStrategy configuration option would then be to opt-in to the use of the extra urls (i.e. the mere presence of the extra urls would not change RecoveryStrategy behaviour and if the recovery strategy opted-in to use the special urls then the absence of such urls should generate errors or warnings).
          Hide
          cpoerschke Christine Poerschke added a comment -

          jira/solr-9045 working branch updated.

          Updated as yet without renaming of RecoveryStrategy but if renamed happened the rename would happen first.

          As far as making it really clear in the comments and docs that custom implementations are not really supported between versions, that is also yet to be done, I'm not quite sure where best to add it (assuming the optional <recoveryStrategy> solrconfig.xml element would not be added to the example configs).

          What do you think?

          Show
          cpoerschke Christine Poerschke added a comment - jira/solr-9045 working branch updated. Updated as yet without renaming of RecoveryStrategy but if renamed happened the rename would happen first. As far as making it really clear in the comments and docs that custom implementations are not really supported between versions, that is also yet to be done, I'm not quite sure where best to add it (assuming the optional <recoveryStrategy> solrconfig.xml element would not be added to the example configs). What do you think?
          Hide
          cpoerschke Christine Poerschke added a comment -

          Working branch revived and 'custom implementations not officially supported' style comments added; the motivation here remains unchanged i.e. to support customisation/configuration of alternative network interfaces so that copy-all-data replication traffic can be separated from regular live within-cloud traffic.

          https://github.com/apache/lucene-solr/compare/jira/solr-9045 is the updated working branch, additional comments, reviews, etc. welcome as usual. Hoping to commit the changes sometime next week.

          (Have refrained from renaming RecoveryStrategy to something else since it wasn't obvious what the new name should be.)

          Show
          cpoerschke Christine Poerschke added a comment - Working branch revived and 'custom implementations not officially supported' style comments added; the motivation here remains unchanged i.e. to support customisation/configuration of alternative network interfaces so that copy-all-data replication traffic can be separated from regular live within-cloud traffic. https://github.com/apache/lucene-solr/compare/jira/solr-9045 is the updated working branch, additional comments, reviews, etc. welcome as usual. Hoping to commit the changes sometime next week. (Have refrained from renaming RecoveryStrategy to something else since it wasn't obvious what the new name should be.)
          Hide
          cpoerschke Christine Poerschke added a comment -

          Merged latest master into working branch, attaching patch based on that i.e. diff(master, working-branch).

          Show
          cpoerschke Christine Poerschke added a comment - Merged latest master into working branch, attaching patch based on that i.e. diff(master, working-branch).
          Hide
          jira-bot ASF subversion and git services added a comment -

          Commit c8bad8c10ac52d89318932636b1e1401c314b5e4 in lucene-solr's branch refs/heads/master from Christine Poerschke
          [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=c8bad8c ]

          SOLR-9045: Make RecoveryStrategy settings configurable.

          Show
          jira-bot ASF subversion and git services added a comment - Commit c8bad8c10ac52d89318932636b1e1401c314b5e4 in lucene-solr's branch refs/heads/master from Christine Poerschke [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=c8bad8c ] SOLR-9045 : Make RecoveryStrategy settings configurable.
          Hide
          cpoerschke Christine Poerschke added a comment -

          Above commit is for master branch, intending to backport (tomorrow) to branch_6x for the upcoming 6.5 release.

          Show
          cpoerschke Christine Poerschke added a comment - Above commit is for master branch, intending to backport (tomorrow) to branch_6x for the upcoming 6.5 release.
          Hide
          jira-bot ASF subversion and git services added a comment -

          Commit a7e9dbc565b15b1516a923eeef9517e2343f89c0 in lucene-solr's branch refs/heads/branch_6x from Christine Poerschke
          [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=a7e9dbc ]

          SOLR-9045: Make RecoveryStrategy settings configurable.

          Show
          jira-bot ASF subversion and git services added a comment - Commit a7e9dbc565b15b1516a923eeef9517e2343f89c0 in lucene-solr's branch refs/heads/branch_6x from Christine Poerschke [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=a7e9dbc ] SOLR-9045 : Make RecoveryStrategy settings configurable.
          Hide
          jira-bot ASF subversion and git services added a comment -

          Commit 5aafea84948e9838dd2cd333c3e873bea176c9e8 in lucene-solr's branch refs/heads/master from Christine Poerschke
          [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=5aafea8 ]

          SOLR-9045: exclude static methods from ConfigureRecoveryStrategyTest.testAlmostAllMethodsAreFinal

          Show
          jira-bot ASF subversion and git services added a comment - Commit 5aafea84948e9838dd2cd333c3e873bea176c9e8 in lucene-solr's branch refs/heads/master from Christine Poerschke [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=5aafea8 ] SOLR-9045 : exclude static methods from ConfigureRecoveryStrategyTest.testAlmostAllMethodsAreFinal
          Hide
          jira-bot ASF subversion and git services added a comment -

          Commit 92bf7fcf459dd2c2408501f854c7c88c33b6894e in lucene-solr's branch refs/heads/branch_6x from Christine Poerschke
          [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=92bf7fc ]

          SOLR-9045: exclude static methods from ConfigureRecoveryStrategyTest.testAlmostAllMethodsAreFinal

          Show
          jira-bot ASF subversion and git services added a comment - Commit 92bf7fcf459dd2c2408501f854c7c88c33b6894e in lucene-solr's branch refs/heads/branch_6x from Christine Poerschke [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=92bf7fc ] SOLR-9045 : exclude static methods from ConfigureRecoveryStrategyTest.testAlmostAllMethodsAreFinal
          Hide
          tomasflobbe Tomás Fernández Löbbe added a comment -

          I'm a bit too late to this discussion (being released right now as part of 6.5), but as part of the work in SOLR-10233, I was planning to make RecoveryStrategy an interface, with the current implementation being the default, but also add a new ReplicateOnlyRecoveryStrategy that skips peer sync, and jumps directly to replication particularly for "Passive Replicas" (or whatever name they end up having). Just FYI, feel free to comment in SOLR-10233

          Show
          tomasflobbe Tomás Fernández Löbbe added a comment - I'm a bit too late to this discussion (being released right now as part of 6.5), but as part of the work in SOLR-10233 , I was planning to make RecoveryStrategy an interface, with the current implementation being the default, but also add a new ReplicateOnlyRecoveryStrategy that skips peer sync, and jumps directly to replication particularly for "Passive Replicas" (or whatever name they end up having). Just FYI, feel free to comment in SOLR-10233

            People

            • Assignee:
              cpoerschke Christine Poerschke
              Reporter:
              cpoerschke Christine Poerschke
            • Votes:
              0 Vote for this issue
              Watchers:
              6 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development