Solr
  1. Solr
  2. SOLR-6324

Set finite default timeouts for select and update

    Details

    • Type: Improvement Improvement
    • Status: Closed
    • Priority: Minor Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 5.0, 6.0
    • Component/s: search, update
    • Labels:
      None

      Description

      Currently HttpShardHandlerFactory and UpdateShardHandler default to infinite timeouts for socket connection and read. This can lead to undesirable behaviour, for example, if a machine crashes, then searches in progress will wait forever for a result to come back and end up using threads which will only get terminated at shutdown.

      We should have some finite default, however conservative it might be. These parameters are already configurable, so for expert uses, they can be increased if necessary anyway.

      Will attach a patch to set connection timeout to 60s and read timeout to 600s, but I am not too concerned about the actual value as long as there is one.

        Issue Links

          Activity

          Hide
          ASF GitHub Bot added a comment -

          GitHub user andyetitmoves opened a pull request:

          https://github.com/apache/lucene-solr/pull/79

          Set default connTimeout to 60s, soTimeout to 600s

          Patch for SOLR-6324.

          You can merge this pull request into a Git repository by running:

          $ git pull https://github.com/bloomberg/lucene-solr trunk-finite-timeouts

          Alternatively you can review and apply these changes as the patch at:

          https://github.com/apache/lucene-solr/pull/79.patch

          To close this pull request, make a commit to your master/trunk branch
          with (at least) the following in the commit message:

          This closes #79


          commit 3c8237f216d5e07edb2ab5a10d0d79f93f499605
          Author: Ramkumar Aiyengar <andyetitmoves@gmail.com>
          Date: 2014-08-05T18:17:41Z

          Set default connTimeout to 60s, soTimeout to 600s


          Show
          ASF GitHub Bot added a comment - GitHub user andyetitmoves opened a pull request: https://github.com/apache/lucene-solr/pull/79 Set default connTimeout to 60s, soTimeout to 600s Patch for SOLR-6324 . You can merge this pull request into a Git repository by running: $ git pull https://github.com/bloomberg/lucene-solr trunk-finite-timeouts Alternatively you can review and apply these changes as the patch at: https://github.com/apache/lucene-solr/pull/79.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #79 commit 3c8237f216d5e07edb2ab5a10d0d79f93f499605 Author: Ramkumar Aiyengar <andyetitmoves@gmail.com> Date: 2014-08-05T18:17:41Z Set default connTimeout to 60s, soTimeout to 600s
          Hide
          Mark Miller added a comment -

          Agreed - we need timeouts of some fine length.

          We may want to be even more conservative on the read timeout though - I'm on the fence. Really, we have to consider what we want for something like an optimize request that takes 3 hours to return. Do we care that the call will timeout? Probably something like that should be kicked off and the status pulled, but we should still think about any standard long calls.

          Show
          Mark Miller added a comment - Agreed - we need timeouts of some fine length. We may want to be even more conservative on the read timeout though - I'm on the fence. Really, we have to consider what we want for something like an optimize request that takes 3 hours to return. Do we care that the call will timeout? Probably something like that should be kicked off and the status pulled, but we should still think about any standard long calls.
          Hide
          Ramkumar Aiyengar added a comment -

          People shouldn't use optimize anyway, I wouldn't worry so much about breaking them and them having to change defaults if they really want to do it.

          My preference is for keeping this number as low as sensible, I would rather have them fail early than wait for the first time a machine crashes and they see through the consequences of a long read timeout – not many people test for that. But that said, if there are other things which genuinely take more than this time, we can certainly consider increasing this (and contemplate if some kind of an asynchronous API or optimization suits better than making a client wait at one time for more than 10 mins!)

          Show
          Ramkumar Aiyengar added a comment - People shouldn't use optimize anyway, I wouldn't worry so much about breaking them and them having to change defaults if they really want to do it. My preference is for keeping this number as low as sensible, I would rather have them fail early than wait for the first time a machine crashes and they see through the consequences of a long read timeout – not many people test for that. But that said, if there are other things which genuinely take more than this time, we can certainly consider increasing this (and contemplate if some kind of an asynchronous API or optimization suits better than making a client wait at one time for more than 10 mins!)
          Hide
          Mark Miller added a comment -

          People shouldn't use optimize anyway

          That's not really true. There are still valid reasons to use optimize. There was just a backlash against it because it was over used and under understood.

          Show
          Mark Miller added a comment - People shouldn't use optimize anyway That's not really true. There are still valid reasons to use optimize. There was just a backlash against it because it was over used and under understood.
          Hide
          Ramkumar Aiyengar added a comment -

          Sorry, wasn't aware of that, thought it was always discouraged. As far as this discussion goes, sure, if that's a common enough thing to require a larger timeout, that's fine, it's still better than no timeout really – that's the bit that really troubles me. Though if we really were to support optimize as a supported mechanism, and it can take 3 hours, we should really be making it a proper async call so that you don't expect a client to hold on to a connection for 3 hours to get a response. I understand that would require a bit more effort though..

          Show
          Ramkumar Aiyengar added a comment - Sorry, wasn't aware of that, thought it was always discouraged. As far as this discussion goes, sure, if that's a common enough thing to require a larger timeout, that's fine, it's still better than no timeout really – that's the bit that really troubles me. Though if we really were to support optimize as a supported mechanism, and it can take 3 hours, we should really be making it a proper async call so that you don't expect a client to hold on to a connection for 3 hours to get a response. I understand that would require a bit more effort though..
          Hide
          Shalin Shekhar Mangar added a comment -

          This patch doesn't set any timeouts for updates? I've found that many people don't know about the update shard handler timeouts because it's not mentioned in the example solr.xml by default in contrast to the search timeouts.

          Large timeouts are equally useless as no timeouts for heavy users but at least by having both update/search timeouts in solr.xml, people might pay more attention and actually set some sane values.

          +1 to the change. I swear I had opened an issue for the same thing a month back but I can't find it now.

          Show
          Shalin Shekhar Mangar added a comment - This patch doesn't set any timeouts for updates? I've found that many people don't know about the update shard handler timeouts because it's not mentioned in the example solr.xml by default in contrast to the search timeouts. Large timeouts are equally useless as no timeouts for heavy users but at least by having both update/search timeouts in solr.xml, people might pay more attention and actually set some sane values. +1 to the change. I swear I had opened an issue for the same thing a month back but I can't find it now.
          Hide
          Mark Miller added a comment -

          Im still not happy about breaking optimize - it's a terrible user experience. We need to change it to just kick off and not wait or something more sensible than just breaking its current behavior for many, many cases by default.

          Show
          Mark Miller added a comment - Im still not happy about breaking optimize - it's a terrible user experience. We need to change it to just kick off and not wait or something more sensible than just breaking its current behavior for many, many cases by default.
          Hide
          Mark Miller added a comment -

          if that's a common enough thing to require a larger timeout

          Any feature we support should not have a call that gives a timeout as a fairly normal course of action. It needs to be mitigated in some way.

          Show
          Mark Miller added a comment - if that's a common enough thing to require a larger timeout Any feature we support should not have a call that gives a timeout as a fairly normal course of action. It needs to be mitigated in some way.
          Hide
          Ramkumar Aiyengar added a comment -

          This patch doesn't set any timeouts for updates?

          That's the change in ConfigSolr.java in code. If you are referring to the xml change, I can probably add that as well.. I agree it's currently a bad experience though that select and update timeouts are in different places..

          Show
          Ramkumar Aiyengar added a comment - This patch doesn't set any timeouts for updates? That's the change in ConfigSolr.java in code. If you are referring to the xml change, I can probably add that as well.. I agree it's currently a bad experience though that select and update timeouts are in different places..
          Hide
          Ramkumar Aiyengar added a comment -

          I rebased the change to trunk and resolved conflicts, as well as added update timeouts to the solr.xml as a guide for anyone changing them..

          Show
          Ramkumar Aiyengar added a comment - I rebased the change to trunk and resolved conflicts, as well as added update timeouts to the solr.xml as a guide for anyone changing them..
          Hide
          Shalin Shekhar Mangar added a comment -

          If you are referring to the xml change, I can probably add that as well

          Yes, that's what I meant. Thanks for adding those.

          Show
          Shalin Shekhar Mangar added a comment - If you are referring to the xml change, I can probably add that as well Yes, that's what I meant. Thanks for adding those.
          Hide
          Mark Miller added a comment -

          Well I'm unhappy about optimize still. It becomes a pretty ugly call. However, it's not very high on my list at the moment, and it's looking like SOLR-4509 will effectively enforce client side idle timeouts anyway.

          Show
          Mark Miller added a comment - Well I'm unhappy about optimize still. It becomes a pretty ugly call. However, it's not very high on my list at the moment, and it's looking like SOLR-4509 will effectively enforce client side idle timeouts anyway.
          Hide
          Ramkumar Aiyengar added a comment -

          I did look at seeing what I can do about optimize, but couldn't come up with any approach short of rearchitecting optimize to be async which I didn't have time for. The other possibility here was that we apply a different timeout on a per request basis (which btw is generally useful and something i wished for), but that ran to a dead end as well – I don't recall why now..

          The stale check removal does force an idle timeout, but the connection timeouts still need to be made finite?

          Show
          Ramkumar Aiyengar added a comment - I did look at seeing what I can do about optimize, but couldn't come up with any approach short of rearchitecting optimize to be async which I didn't have time for. The other possibility here was that we apply a different timeout on a per request basis (which btw is generally useful and something i wished for), but that ran to a dead end as well – I don't recall why now.. The stale check removal does force an idle timeout, but the connection timeouts still need to be made finite?
          Hide
          ASF subversion and git services added a comment -

          Commit 1650768 from Mark Miller in branch 'dev/trunk'
          [ https://svn.apache.org/r1650768 ]

          SOLR-6324: Set finite default timeouts for select and update.

          Show
          ASF subversion and git services added a comment - Commit 1650768 from Mark Miller in branch 'dev/trunk' [ https://svn.apache.org/r1650768 ] SOLR-6324 : Set finite default timeouts for select and update.
          Hide
          ASF subversion and git services added a comment -

          Commit 1650769 from Mark Miller in branch 'dev/branches/branch_5x'
          [ https://svn.apache.org/r1650769 ]

          SOLR-6324: Set finite default timeouts for select and update.

          Show
          ASF subversion and git services added a comment - Commit 1650769 from Mark Miller in branch 'dev/branches/branch_5x' [ https://svn.apache.org/r1650769 ] SOLR-6324 : Set finite default timeouts for select and update.
          Hide
          Mark Miller added a comment -

          I keep forgetting to use the message tag to close the pull request. I'll try and remember that in the future.

          Show
          Mark Miller added a comment - I keep forgetting to use the message tag to close the pull request. I'll try and remember that in the future.
          Hide
          ASF GitHub Bot added a comment -

          Github user andyetitmoves closed the pull request at:

          https://github.com/apache/lucene-solr/pull/79

          Show
          ASF GitHub Bot added a comment - Github user andyetitmoves closed the pull request at: https://github.com/apache/lucene-solr/pull/79
          Hide
          Anshum Gupta added a comment -

          Bulk close after 5.0 release.

          Show
          Anshum Gupta added a comment - Bulk close after 5.0 release.

            People

            • Assignee:
              Mark Miller
              Reporter:
              Ramkumar Aiyengar
            • Votes:
              0 Vote for this issue
              Watchers:
              8 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development