Uploaded image for project: 'Solr'
  1. Solr
  2. SOLR-6761

Ability to ignore commit and optimize requests from clients when running in SolrCloud mode.

    Details

    • Type: New Feature
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 5.0, 6.0
    • Component/s: SolrCloud, SolrJ
    • Labels:
      None

      Description

      In most SolrCloud environments, it's advisable to only rely on auto-commits (soft and hard) configured in solrconfig.xml and not send explicit commit requests from client applications. In fact, I've seen cases where improperly coded client applications can send commit requests too frequently, which can lead to harming the cluster's health.

      As a system administrator, I'd like the ability to disallow commit requests from client applications. Ideally, I could configure the updateHandler to ignore the requests and return an HTTP response code of my choosing as I may not want to break existing client applications by returning an error. In other words, I may want to just return 200 vs. 405. The same goes for optimize requests.

      1. SOLR-6761.patch
        12 kB
        Timothy Potter
      2. SOLR-6761.patch
        5 kB
        Timothy Potter

        Activity

        Hide
        yseeley@gmail.com Yonik Seeley added a comment -

        How about even more general: a minimum commitWithin and the ability to downgrade an immediate commit or softCommit to a soft commitWithin.
        Perhaps a special value of -1 could mean disallow / "don't actually do it" .

        So minCommitWithin=5000
        would convert an incoming commit to commitWithin=5000
        and would convert commitWithin=10 to commitWithin=5000

        Show
        yseeley@gmail.com Yonik Seeley added a comment - How about even more general: a minimum commitWithin and the ability to downgrade an immediate commit or softCommit to a soft commitWithin. Perhaps a special value of -1 could mean disallow / "don't actually do it" . So minCommitWithin=5000 would convert an incoming commit to commitWithin=5000 and would convert commitWithin=10 to commitWithin=5000
        Hide
        andyetitmoves Ramkumar Aiyengar added a comment -

        I like the idea, with the minor exception that it sounds wrong to return 200 instead of a 4xx. The client is doing some effort to add the commit request and should know that it's not been respected. If it breaks them, so be it, they are doing something the system is not configured to do. They might actually even rely on the assumption that once the commit is done it's immediately available for search..

        Show
        andyetitmoves Ramkumar Aiyengar added a comment - I like the idea, with the minor exception that it sounds wrong to return 200 instead of a 4xx. The client is doing some effort to add the commit request and should know that it's not been respected. If it breaks them, so be it, they are doing something the system is not configured to do. They might actually even rely on the assumption that once the commit is done it's immediately available for search..
        Hide
        markrmiller@gmail.com Mark Miller added a comment -

        +1, good idea Tim. I think it can make sense to use commitWithin from the client exclusively with SolrCloud, but only when a knowledgeable/expert person/team owns the service. That is very often not the case due to a variety of reasons in my experience. Solr is often deployed in situations where an administrator needs to protect the service from a variety of users with varying expertise.

        I agree with Ram though - I think it makes more sense to make sure the client knows it cannot call commit and adjusts behavior. We just need a useful error message.

        Show
        markrmiller@gmail.com Mark Miller added a comment - +1, good idea Tim. I think it can make sense to use commitWithin from the client exclusively with SolrCloud, but only when a knowledgeable/expert person/team owns the service. That is very often not the case due to a variety of reasons in my experience. Solr is often deployed in situations where an administrator needs to protect the service from a variety of users with varying expertise. I agree with Ram though - I think it makes more sense to make sure the client knows it cannot call commit and adjusts behavior. We just need a useful error message.
        Hide
        markrmiller@gmail.com Mark Miller added a comment -

        I don't see why silent fail couldn't be a config option though. There probably are Solr administrators that would like to try and address this and not break all it's clients. It's fairly dangerous if any clients where counting on that behavior though. I think it should come with a big fat warning at least.

        Show
        markrmiller@gmail.com Mark Miller added a comment - I don't see why silent fail couldn't be a config option though. There probably are Solr administrators that would like to try and address this and not break all it's clients. It's fairly dangerous if any clients where counting on that behavior though. I think it should come with a big fat warning at least.
        Hide
        hossman Hoss Man added a comment -

        This would be fairly easy to implement as an UpdateProcessor, which would also give you an easy way to enable/configure it (I thought we already had an open issue for that, but i may just be thinking of of the issue about killing Optimize)

        Show
        hossman Hoss Man added a comment - This would be fairly easy to implement as an UpdateProcessor, which would also give you an easy way to enable/configure it (I thought we already had an open issue for that, but i may just be thinking of of the issue about killing Optimize)
        Hide
        thelabdude Timothy Potter added a comment -

        Here's a patch that shows the custom UpdateRequestProcessor approach suggested by Hoss Man. The only concern I have is that it needs to be wired into solrconfig.xml. Going with the idea that this feature is for a system administrator, it might make more sense to set this at a more global level, esp. if admins give the ability for other groups to upload their own custom configs and create their own collections. So I'm thinking maybe just a system property (or solr.xml level property) that can be set that affects the DistributedUpateRequestProcessor?

        Show
        thelabdude Timothy Potter added a comment - Here's a patch that shows the custom UpdateRequestProcessor approach suggested by Hoss Man . The only concern I have is that it needs to be wired into solrconfig.xml. Going with the idea that this feature is for a system administrator, it might make more sense to set this at a more global level, esp. if admins give the ability for other groups to upload their own custom configs and create their own collections. So I'm thinking maybe just a system property (or solr.xml level property) that can be set that affects the DistributedUpateRequestProcessor?
        Hide
        hossman Hoss Man added a comment -

        i don't see how it's really any differnet then anything else we expect "solr admins" to edit in their solrconfig.xml (like enableStreaming, whether the /update should have any defaults on it, etc...)

        but if you really want it to be sysprop driven, that can still be done via the enable property...

        <processor class="solrIgnoreCommitUpdateProcessorFactory" enable="${solr.ignore.explicit.commit:false}">
          ...
        </processor>
        
        Show
        hossman Hoss Man added a comment - i don't see how it's really any differnet then anything else we expect "solr admins" to edit in their solrconfig.xml (like enableStreaming, whether the /update should have any defaults on it, etc...) but if you really want it to be sysprop driven, that can still be done via the enable property... <processor class= "solrIgnoreCommitUpdateProcessorFactory" enable= "${solr.ignore.explicit.commit: false }" > ... </processor>
        Hide
        erickerickson Erick Erickson added a comment - - edited

        Does Noble's work with an API to alter solrconfig.xml apply at all here?

        Show
        erickerickson Erick Erickson added a comment - - edited Does Noble's work with an API to alter solrconfig.xml apply at all here?
        Hide
        thelabdude Timothy Potter added a comment -

        Moving forward with the updateRequestProcessor approach. Latest patch includes a unit test. To activate this request processor you'll need to add something like the following to your solrconfig.xml:

          <updateRequestProcessorChain name="ignore-commit-from-client" default="true">
            <processor class="solr.IgnoreCommitOptimizeUpdateProcessorFactory">
              <int name="statusCode">200</int>
            </processor>
            <processor class="solr.LogUpdateProcessorFactory" />
            <processor class="solr.DistributedUpdateProcessorFactory" />
            <processor class="solr.RunUpdateProcessorFactory" />
          </updateRequestProcessorChain>
        

        As shown in the example above, the processor will return 200 to the client but will ignore the commit / optimize request. Notice that you need to wire-in the implicit processors needed by SolrCloud as well since this custom chain is taking the place of the default chain.

        In the following example, the processor will raise an exception with a 403 code with a customized error message:

          <updateRequestProcessorChain name="ignore-commit-from-client" default="true">
            <processor class="solr.IgnoreCommitOptimizeUpdateProcessorFactory">
              <int name="statusCode">403</int>
              <str name="responseMessage">Thou shall not issue a commit!</str>
            </processor>
            <processor class="solr.LogUpdateProcessorFactory" />
            <processor class="solr.DistributedUpdateProcessorFactory" />
            <processor class="solr.RunUpdateProcessorFactory" />
          </updateRequestProcessorChain>
        

        Lastly, you can also configure it to just ignore optimize and let commits pass thru by doing:

          <updateRequestProcessorChain name="ignore-optimize-only-from-client-403">
            <processor class="solr.IgnoreCommitOptimizeUpdateProcessorFactory">
              <str name="responseMessage">Thou shall not issue an optimize, but commits are OK!</str>
              <bool name="ignoreOptimizeOnly">true</bool>
            </processor>
            <processor class="solr.RunUpdateProcessorFactory" />
          </updateRequestProcessorChain>
        

        One idea I had for making this easier to turn on globally would be to wire it into the implicit chain definition (in SolrCore). The patch doesn't do this yet, but in SolrCore, when the implicit chain is setup, we could enable this if the node is in SolrCloud mode and a system property (solr.ignoreCommitOptimizeFromClients=both|optimize) is set.

        Show
        thelabdude Timothy Potter added a comment - Moving forward with the updateRequestProcessor approach. Latest patch includes a unit test. To activate this request processor you'll need to add something like the following to your solrconfig.xml: <updateRequestProcessorChain name= "ignore-commit-from-client" default = " true " > <processor class= "solr.IgnoreCommitOptimizeUpdateProcessorFactory" > < int name= "statusCode" >200</ int > </processor> <processor class= "solr.LogUpdateProcessorFactory" /> <processor class= "solr.DistributedUpdateProcessorFactory" /> <processor class= "solr.RunUpdateProcessorFactory" /> </updateRequestProcessorChain> As shown in the example above, the processor will return 200 to the client but will ignore the commit / optimize request. Notice that you need to wire-in the implicit processors needed by SolrCloud as well since this custom chain is taking the place of the default chain. In the following example, the processor will raise an exception with a 403 code with a customized error message: <updateRequestProcessorChain name= "ignore-commit-from-client" default = " true " > <processor class= "solr.IgnoreCommitOptimizeUpdateProcessorFactory" > < int name= "statusCode" >403</ int > <str name= "responseMessage" >Thou shall not issue a commit!</str> </processor> <processor class= "solr.LogUpdateProcessorFactory" /> <processor class= "solr.DistributedUpdateProcessorFactory" /> <processor class= "solr.RunUpdateProcessorFactory" /> </updateRequestProcessorChain> Lastly, you can also configure it to just ignore optimize and let commits pass thru by doing: <updateRequestProcessorChain name= "ignore-optimize-only-from-client-403" > <processor class= "solr.IgnoreCommitOptimizeUpdateProcessorFactory" > <str name= "responseMessage" >Thou shall not issue an optimize, but commits are OK!</str> <bool name= "ignoreOptimizeOnly" > true </bool> </processor> <processor class= "solr.RunUpdateProcessorFactory" /> </updateRequestProcessorChain> One idea I had for making this easier to turn on globally would be to wire it into the implicit chain definition (in SolrCore). The patch doesn't do this yet, but in SolrCore, when the implicit chain is setup, we could enable this if the node is in SolrCloud mode and a system property (solr.ignoreCommitOptimizeFromClients=both|optimize) is set.
        Hide
        jira-bot ASF subversion and git services added a comment -

        Commit 1648775 from Timothy Potter in branch 'dev/trunk'
        [ https://svn.apache.org/r1648775 ]

        SOLR-6761: Ability to ignore commit and optimize requests from clients when running in SolrCloud mode.

        Show
        jira-bot ASF subversion and git services added a comment - Commit 1648775 from Timothy Potter in branch 'dev/trunk' [ https://svn.apache.org/r1648775 ] SOLR-6761 : Ability to ignore commit and optimize requests from clients when running in SolrCloud mode.
        Hide
        jira-bot ASF subversion and git services added a comment -

        Commit 1650097 from Timothy Potter in branch 'dev/branches/branch_5x'
        [ https://svn.apache.org/r1650097 ]

        SOLR-6761: Ability to ignore commit and optimize requests from clients when running in SolrCloud mode.

        Show
        jira-bot ASF subversion and git services added a comment - Commit 1650097 from Timothy Potter in branch 'dev/branches/branch_5x' [ https://svn.apache.org/r1650097 ] SOLR-6761 : Ability to ignore commit and optimize requests from clients when running in SolrCloud mode.
        Hide
        arafalov Alexandre Rafalovitch added a comment -

        Just to clarify, the implementation itself does not care whether this is Cloud mode or not. You are leaving that for the sysadmin to set with the enable property, right?

        So, one could wire it up in a standalone mode, if they wanted to. Nothing prevents them. If so, maybe the description (in Readme) should say that it allows rejecting commits/optimize and something like "primarily useful for SolrCloud mode".

        Show
        arafalov Alexandre Rafalovitch added a comment - Just to clarify, the implementation itself does not care whether this is Cloud mode or not. You are leaving that for the sysadmin to set with the enable property, right? So, one could wire it up in a standalone mode, if they wanted to. Nothing prevents them. If so, maybe the description (in Readme) should say that it allows rejecting commits/optimize and something like "primarily useful for SolrCloud mode".
        Hide
        anshumg Anshum Gupta added a comment -

        Bulk close after 5.0 release.

        Show
        anshumg Anshum Gupta added a comment - Bulk close after 5.0 release.
        Hide
        azitabh Azitabh added a comment -

        Using IgnoreCommitOptimizeUpdateProcessorFactory mandatorily blocks optimize. However there is no option to periodically schedule an optimize operation, like autoCommit and autoSoftCommit. When does solr does auto optimize? Is it OK to block the nightly optimize that I have as of now in the favor of blocking clients from sending commits?

        Show
        azitabh Azitabh added a comment - Using IgnoreCommitOptimizeUpdateProcessorFactory mandatorily blocks optimize. However there is no option to periodically schedule an optimize operation, like autoCommit and autoSoftCommit. When does solr does auto optimize? Is it OK to block the nightly optimize that I have as of now in the favor of blocking clients from sending commits?
        Hide
        azitabh Azitabh added a comment -

        Timothy Potter Please refer to the above comment.

        Show
        azitabh Azitabh added a comment - Timothy Potter Please refer to the above comment.
        Hide
        karney karney luo added a comment -

        谢谢您的来信,罗昆仲已经收到 

        Show
        karney karney luo added a comment - 谢谢您的来信,罗昆仲已经收到 
        Hide
        elyograg Shawn Heisey added a comment -

        When does solr does auto optimize?

        Optimize never happens automatically, and it is highly unlikely that it ever will. Although it does sometimes have value, in typical situations, the recommendation is to never use the optimize feature.

        On large indexes, the optimize (forceMerge) operation can take a VERY long time, and if any deleteByQuery calls are used, the optimize will cause the delete (and any subsequent indexing requests) to block until it is finished. If deleteByQuery is not used, then indexing requests are safe to do while the optimize is underway, as of version 4.0.

        Please create a new issue to add a configuration option that allows optimizes while blocking commits, or maybe options to only allow an optimize under certain conditions. I can understand why Timothy Potter wrote the feature like it's written. The basic point of the update processor is to take away the ability for Solr users to carry out expensive and unnecessary operations. Very few operations are as expensive as optimize.

        Overall it is better to change the behavior of your clients rather than ignore some of their requests. Educate the people that write that code about how your Solr install should be used.

        Show
        elyograg Shawn Heisey added a comment - When does solr does auto optimize? Optimize never happens automatically, and it is highly unlikely that it ever will. Although it does sometimes have value, in typical situations, the recommendation is to never use the optimize feature. On large indexes, the optimize (forceMerge) operation can take a VERY long time, and if any deleteByQuery calls are used, the optimize will cause the delete (and any subsequent indexing requests) to block until it is finished. If deleteByQuery is not used, then indexing requests are safe to do while the optimize is underway, as of version 4.0. Please create a new issue to add a configuration option that allows optimizes while blocking commits, or maybe options to only allow an optimize under certain conditions. I can understand why Timothy Potter wrote the feature like it's written. The basic point of the update processor is to take away the ability for Solr users to carry out expensive and unnecessary operations. Very few operations are as expensive as optimize. Overall it is better to change the behavior of your clients rather than ignore some of their requests. Educate the people that write that code about how your Solr install should be used.
        Hide
        karney karney luo added a comment -

        谢谢您的来信,罗昆仲已经收到 

        Show
        karney karney luo added a comment - 谢谢您的来信,罗昆仲已经收到 
        Hide
        elyograg Shawn Heisey added a comment -

        I see an email notification from Jira saying Yago Riveiro logged a comment expressing displeasure at the notion of not allowing optimizes. I don't see Yago's comment in the issue even when it is refreshed, so perhaps Jira is having difficulties. I will respond to that comment even though I do not see it here:

        None of what has been discussed is default Solr behavior, and it never will be. To get this behavior, the config must be adjusted to add an update processor (IgnoreCommitOptimizeUpdateProcessorFactory) to the update chain. It does precisely what its name suggests – ignores commit and optimize requests. What I've been talking about is configuration of that update processor. Currently the update processor always blocks optimizes when it is used. If it's not used, then these operations are permitted. There is a configuration option for the processor (ignoreOptimizeOnly) that allows manual commits ... but there isn't a way to block commits and allow optimizes.

        I view this update processor as a temporary solution, a way to keep Solr from self-destructing until indexing clients can be fixed so they don't send problematic requests. Others might view it as a permanent part of their Solr install.

        Show
        elyograg Shawn Heisey added a comment - I see an email notification from Jira saying Yago Riveiro logged a comment expressing displeasure at the notion of not allowing optimizes. I don't see Yago's comment in the issue even when it is refreshed, so perhaps Jira is having difficulties. I will respond to that comment even though I do not see it here: None of what has been discussed is default Solr behavior, and it never will be. To get this behavior, the config must be adjusted to add an update processor (IgnoreCommitOptimizeUpdateProcessorFactory) to the update chain. It does precisely what its name suggests – ignores commit and optimize requests. What I've been talking about is configuration of that update processor. Currently the update processor always blocks optimizes when it is used. If it's not used, then these operations are permitted. There is a configuration option for the processor (ignoreOptimizeOnly) that allows manual commits ... but there isn't a way to block commits and allow optimizes. I view this update processor as a temporary solution, a way to keep Solr from self-destructing until indexing clients can be fixed so they don't send problematic requests. Others might view it as a permanent part of their Solr install.
        Hide
        yriveiro Yago Riveiro added a comment -

        If it's configurable for me ok

        Show
        yriveiro Yago Riveiro added a comment - If it's configurable for me ok
        Hide
        azitabh Azitabh added a comment -

        Shawn Heisey : when you say this:

        "Optimize never happens automatically, and it is highly unlikely that it ever will. Although it does sometimes have value, in typical situations, the recommendation is to never use the optimize feature"

        In use cases where delete is used often, the size of index on disk will keep on growing unless optimize is called periodically, isn't it? Even if we ignore the growing disk footprint for a while, it should also result in increased number of segments(increased query time?) over a period. So why is it that using optimize is never recommended?

        As of now if I opt to restrict clients from using forced commits using IgnoreCommitOptimizeUpdateProcessorFactory, I lose the ability to do periodic(and rare) optimizes.

        Show
        azitabh Azitabh added a comment - Shawn Heisey : when you say this: "Optimize never happens automatically, and it is highly unlikely that it ever will. Although it does sometimes have value, in typical situations, the recommendation is to never use the optimize feature" In use cases where delete is used often, the size of index on disk will keep on growing unless optimize is called periodically, isn't it? Even if we ignore the growing disk footprint for a while, it should also result in increased number of segments(increased query time?) over a period. So why is it that using optimize is never recommended? As of now if I opt to restrict clients from using forced commits using IgnoreCommitOptimizeUpdateProcessorFactory, I lose the ability to do periodic(and rare) optimizes.
        Hide
        erickerickson Erick Erickson added a comment -

        Azitabh:

        bq: In use cases where delete is used often, the size of index on disk will keep on growing unless optimize is called periodically

        This is not true. As the index changes, segments are merged. During the merge process, the space used by deleted documents is recovered. In fact, an "optimize" is really forcing the merge of all the segments (under the covers it's been renamed in fact to "forceMerge".

        If all you're doing is deleting, segments files can hang around for a while even if they're empty, but as you start adding docs again, they're merged according to your merge policy and space is reclaimed. Generally you'll have 10-15% of your index occupied by deleted docs, although that number can vary.

        For an excellent video, see: http://blog.mikemccandless.com/2011/02/visualizing-lucenes-segment-merges.html. The third animation (TieredMergePolicy) is the default.

        And optimizing to a single segment is a very expensive operation as it rewrites the entire index. Worse, the single segment will accumulate deleted documents and since TieredMergePolicy tries to merge "like sized segments", the docs you're deleting after that will accumulate to an even greater percentage of your index than you'd have otherwise.

        So generally the recommendation is to not optimize unless you have a static index. By that I mean say you index say once a day during off hours. Go ahead and optimize, you'll get some performance gain. But for an index that's changing all the time it's counter-productive.

        If you absolutely must purge the deleted data, use expungeDeletes. That'll merge segments that have deleted documents into multiple segments. Unless you have evidence that you have outrageous numbers of deleted docs, though, I wouldn't even to that. And this is easily monitored on the admin UI by looking at the deletedDocs number. Note again, though, that if you exclusively do deletes that number will spike for a while.

        The take away here is that data for deleted documents accumulates for a while, but Lucene will take care of reclaiming that space in the normal course of events. Your concern that they will accumulate forever and segments will only be added is inaccurate.

        Show
        erickerickson Erick Erickson added a comment - Azitabh: bq: In use cases where delete is used often, the size of index on disk will keep on growing unless optimize is called periodically This is not true. As the index changes, segments are merged. During the merge process, the space used by deleted documents is recovered. In fact, an "optimize" is really forcing the merge of all the segments (under the covers it's been renamed in fact to "forceMerge". If all you're doing is deleting, segments files can hang around for a while even if they're empty, but as you start adding docs again, they're merged according to your merge policy and space is reclaimed. Generally you'll have 10-15% of your index occupied by deleted docs, although that number can vary. For an excellent video, see: http://blog.mikemccandless.com/2011/02/visualizing-lucenes-segment-merges.html . The third animation (TieredMergePolicy) is the default. And optimizing to a single segment is a very expensive operation as it rewrites the entire index. Worse, the single segment will accumulate deleted documents and since TieredMergePolicy tries to merge "like sized segments", the docs you're deleting after that will accumulate to an even greater percentage of your index than you'd have otherwise. So generally the recommendation is to not optimize unless you have a static index. By that I mean say you index say once a day during off hours. Go ahead and optimize, you'll get some performance gain. But for an index that's changing all the time it's counter-productive. If you absolutely must purge the deleted data, use expungeDeletes. That'll merge segments that have deleted documents into multiple segments. Unless you have evidence that you have outrageous numbers of deleted docs, though, I wouldn't even to that. And this is easily monitored on the admin UI by looking at the deletedDocs number. Note again, though, that if you exclusively do deletes that number will spike for a while. The take away here is that data for deleted documents accumulates for a while, but Lucene will take care of reclaiming that space in the normal course of events. Your concern that they will accumulate forever and segments will only be added is inaccurate.

          People

          • Assignee:
            thelabdude Timothy Potter
            Reporter:
            thelabdude Timothy Potter
          • Votes:
            1 Vote for this issue
            Watchers:
            12 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development