Solr
  1. Solr
  2. SOLR-6496

LBHttpSolrClient should stop server retries after the timeAllowed threshold is met

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Critical Critical
    • Resolution: Fixed
    • Affects Version/s: 4.9
    • Fix Version/s: 5.0, 6.0
    • Component/s: None
    • Labels:
      None

      Description

      The LBHttpSolrServer will continue to perform retries for each server it was given without honoring the timeAllowed request parameter. Once the threshold has been met, you should no longer perform retries and allow the exception to bubble up and allow the request to either error out or return partial results per the shards.tolerant request parameter.

      For a little more context on how this is can be extremely problematic please see the comment here: https://issues.apache.org/jira/browse/SOLR-5986?focusedCommentId=14100991&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14100991 (#2)

      1. SOLR-6496.patch
        5 kB
        Steve Davids
      2. SOLR-6496.patch
        3 kB
        Anshum Gupta
      3. SOLR-6496.patch
        2 kB
        Anshum Gupta
      4. SOLR-6496.patch
        2 kB
        Steve Davids
      5. SOLR-6496.patch
        2 kB
        Steve Davids
      6. SOLR-6496.patch
        2 kB
        Steve Davids

        Issue Links

          Activity

          Hide
          Steve Davids added a comment -

          Initial patch that honors the timeAllowed request parameter. There aren't any tests included – is there any objections to perhaps using a mocking library, it sure would make it much easier to perform unit testing on these negative cases. Mockito is my personal preference and is currently being used in Morphlines, but it will need to be included in the SolrJ test dependencies.

          Show
          Steve Davids added a comment - Initial patch that honors the timeAllowed request parameter. There aren't any tests included – is there any objections to perhaps using a mocking library, it sure would make it much easier to perform unit testing on these negative cases. Mockito is my personal preference and is currently being used in Morphlines, but it will need to be included in the SolrJ test dependencies.
          Hide
          Steve Davids added a comment -

          Fixed patch for null safe SolrParams check.

          Show
          Steve Davids added a comment - Fixed patch for null safe SolrParams check.
          Hide
          Anshum Gupta added a comment -

          Thanks for creating this JIRA and providing a patch Steve Davids.

          Here are a few changes that I'd recommend:

          • Use System.nanoTime() for calculating elapsed time. SOLR-5734 was recently an attempt at standardizing the same.
          • Tests are always helpful for confirming that the change works as expected . I'm personally against mocking unless there's no other way out as it generally has led to code that is difficult to understand for newer developers when they want to change something.
          Show
          Anshum Gupta added a comment - Thanks for creating this JIRA and providing a patch Steve Davids . Here are a few changes that I'd recommend: Use System.nanoTime() for calculating elapsed time. SOLR-5734 was recently an attempt at standardizing the same. Tests are always helpful for confirming that the change works as expected . I'm personally against mocking unless there's no other way out as it generally has led to code that is difficult to understand for newer developers when they want to change something.
          Hide
          Steve Davids added a comment -

          Updated patch to use nano time. Still thinking of some potential tricks to not use mocks, are there any tests out there that may screw with the jetty server to make the socket connection arbitrarily long then somehow throw an exception and verify that the next request isn't made?

          On the mocking front I would do something like (note: redundant static Mockito.* accessors only showed for demonstrative purposes):

            @Test
            public void testNoRetryOnTimeout() throws Exception {
              LBHttpSolrServer testServer = Mockito.spy(new LBHttpSolrServer("http://test1", "http://test2"));
              Mockito.doAnswer(new Answer<Exception>() {
                @Override
                public Exception answer(InvocationOnMock invocation) throws Throwable {
                  Thread.sleep(1);
                  return new SolrServerException("Mock error.");
                }}).when(testServer).doRequest(Mockito.any(HttpSolrServer.class), Mockito.any(Req.class), Mockito.any(Rsp.class), Mockito.anyBoolean(), Mockito.anyBoolean(), Mockito.anyString());
              testServer.query(SolrTestCaseJ4.params(CommonParams.Q, "test", CommonParams.TIME_ALLOWED, "1"));
              Mockito.verify(testServer, Mockito.times(1)).doRequest(Mockito.any(HttpSolrServer.class), Mockito.any(Req.class), Mockito.any(Rsp.class), Mockito.anyBoolean(), Mockito.anyBoolean(), Mockito.anyString());
            }
          

          This test actually showed some strange behavior that there are multiple implemented methods trying to do the same thing. See LBHttpSolrServer's:

          1. public Rsp request(Req req) throws SolrServerException, IOException
          2. public NamedList<Object> request(final SolrRequest request) throws SolrServerException, IOException

          So, depending on if you are using the CloudSolrServer or the HttpShardHandlerFactory you are going to get different request behavior. We should probably try to refactor this code to provide consistent behavior perhaps in a different ticket.

          Show
          Steve Davids added a comment - Updated patch to use nano time. Still thinking of some potential tricks to not use mocks, are there any tests out there that may screw with the jetty server to make the socket connection arbitrarily long then somehow throw an exception and verify that the next request isn't made? On the mocking front I would do something like (note: redundant static Mockito.* accessors only showed for demonstrative purposes): @Test public void testNoRetryOnTimeout() throws Exception { LBHttpSolrServer testServer = Mockito.spy( new LBHttpSolrServer( "http: //test1" , "http://test2" )); Mockito.doAnswer( new Answer<Exception>() { @Override public Exception answer(InvocationOnMock invocation) throws Throwable { Thread .sleep(1); return new SolrServerException( "Mock error." ); }}).when(testServer).doRequest(Mockito.any(HttpSolrServer.class), Mockito.any(Req.class), Mockito.any(Rsp.class), Mockito.anyBoolean(), Mockito.anyBoolean(), Mockito.anyString()); testServer.query(SolrTestCaseJ4.params(CommonParams.Q, "test" , CommonParams.TIME_ALLOWED, "1" )); Mockito.verify(testServer, Mockito.times(1)).doRequest(Mockito.any(HttpSolrServer.class), Mockito.any(Req.class), Mockito.any(Rsp.class), Mockito.anyBoolean(), Mockito.anyBoolean(), Mockito.anyString()); } This test actually showed some strange behavior that there are multiple implemented methods trying to do the same thing. See LBHttpSolrServer's: public Rsp request(Req req) throws SolrServerException, IOException public NamedList<Object> request(final SolrRequest request) throws SolrServerException, IOException So, depending on if you are using the CloudSolrServer or the HttpShardHandlerFactory you are going to get different request behavior. We should probably try to refactor this code to provide consistent behavior perhaps in a different ticket.
          Hide
          Shalin Shekhar Mangar added a comment -

          are there any tests out there that may screw with the jetty server to make the socket connection arbitrarily long then somehow throw an exception and verify that the next request isn't made?

          There's a SocketProxy class which can be used to disallow connections to a jetty server but there's no rate limiting in there. See HttpPartitionTest for an example.

          Show
          Shalin Shekhar Mangar added a comment - are there any tests out there that may screw with the jetty server to make the socket connection arbitrarily long then somehow throw an exception and verify that the next request isn't made? There's a SocketProxy class which can be used to disallow connections to a jetty server but there's no rate limiting in there. See HttpPartitionTest for an example.
          Hide
          Mark Miller added a comment -

          This is an important one to get into 5 - we should get it committed even if we have to a make a new JIRA to work on testing in this area.

          Show
          Mark Miller added a comment - This is an important one to get into 5 - we should get it committed even if we have to a make a new JIRA to work on testing in this area.
          Hide
          Anshum Gupta added a comment -

          Patch updated to be in sync with LBHttpSolrServer -> LBHttpSolrClient changes. Will just run a few more rounds of tests and commit.
          Will also create another issue for adding tests.

          Show
          Anshum Gupta added a comment - Patch updated to be in sync with LBHttpSolrServer -> LBHttpSolrClient changes. Will just run a few more rounds of tests and commit. Will also create another issue for adding tests.
          Hide
          Anshum Gupta added a comment -

          The correct attachment.

          Show
          Anshum Gupta added a comment - The correct attachment.
          Hide
          Anshum Gupta added a comment -

          Working on fixing a failing test.

          Show
          Anshum Gupta added a comment - Working on fixing a failing test.
          Hide
          Anshum Gupta added a comment -

          Fixed a failing test.

          Show
          Anshum Gupta added a comment - Fixed a failing test.
          Hide
          Steve Davids added a comment -

          The LB Client has duplicate implementations defined in both:

          1. public Rsp request(Req req) throws SolrServerException, IOException
          2. public NamedList<Object> request(final SolrRequest request) throws SolrServerException, IOException

          The original patch was only dealing with one of the two, we need to either a) copy the same code into the other or b) refactor the methods to have a single implementation that both methods call. Option B is my personal preference, though we might want to just do that in a separate ticket and go with option A to get it in as soon as possible. I can work on either tonight after I get back from work if anyone has a route they would like to go.

          Show
          Steve Davids added a comment - The LB Client has duplicate implementations defined in both: 1. public Rsp request(Req req) throws SolrServerException, IOException 2. public NamedList<Object> request(final SolrRequest request) throws SolrServerException, IOException The original patch was only dealing with one of the two, we need to either a) copy the same code into the other or b) refactor the methods to have a single implementation that both methods call. Option B is my personal preference, though we might want to just do that in a separate ticket and go with option A to get it in as soon as possible. I can work on either tonight after I get back from work if anyone has a route they would like to go.
          Hide
          Anshum Gupta added a comment -

          You're right. I'll let you get to it as I'm not sure if I'd get time to work on it today.
          If you're unable to get to it, I'll try handling it over the weekend.

          About the approaches, ideally, it'd be good to refactor but that certainly is more invasive. With the rc a week away, I'd like it to be less invasive at this point and work on refactoring (and adding tests) separately with the next release.

          Show
          Anshum Gupta added a comment - You're right. I'll let you get to it as I'm not sure if I'd get time to work on it today. If you're unable to get to it, I'll try handling it over the weekend. About the approaches, ideally, it'd be good to refactor but that certainly is more invasive. With the rc a week away, I'd like it to be less invasive at this point and work on refactoring (and adding tests) separately with the next release.
          Hide
          Steve Davids added a comment -

          Updated the patch to provide the same exiting functionality in the duplicate request implementation. I created SOLR-6949 to capture the refactoring that should be done to consolidate the two implementations.

          Show
          Steve Davids added a comment - Updated the patch to provide the same exiting functionality in the duplicate request implementation. I created SOLR-6949 to capture the refactoring that should be done to consolidate the two implementations.
          Hide
          Anshum Gupta added a comment - - edited

          Thanks Steve. I'll commit this later today.
          I'll just change the logic to compute timeOutTime = System.nanoTime() + timeOutNano once and use it to compare and exit.

          Show
          Anshum Gupta added a comment - - edited Thanks Steve. I'll commit this later today. I'll just change the logic to compute timeOutTime = System.nanoTime() + timeOutNano once and use it to compare and exit.
          Hide
          ASF subversion and git services added a comment -

          Commit 1651236 from Anshum Gupta in branch 'dev/trunk'
          [ https://svn.apache.org/r1651236 ]

          SOLR-6496: LBHttpSolrClient stops retrying after the timeAllowed threshold is met

          Show
          ASF subversion and git services added a comment - Commit 1651236 from Anshum Gupta in branch 'dev/trunk' [ https://svn.apache.org/r1651236 ] SOLR-6496 : LBHttpSolrClient stops retrying after the timeAllowed threshold is met
          Hide
          ASF subversion and git services added a comment -

          Commit 1651237 from Anshum Gupta in branch 'dev/branches/branch_5x'
          [ https://svn.apache.org/r1651237 ]

          SOLR-6496: LBHttpSolrClient stops retrying after the timeAllowed threshold is met (merge from trunk)

          Show
          ASF subversion and git services added a comment - Commit 1651237 from Anshum Gupta in branch 'dev/branches/branch_5x' [ https://svn.apache.org/r1651237 ] SOLR-6496 : LBHttpSolrClient stops retrying after the timeAllowed threshold is met (merge from trunk)
          Hide
          Anshum Gupta added a comment -

          Bulk close after 5.0 release.

          Show
          Anshum Gupta added a comment - Bulk close after 5.0 release.

            People

            • Assignee:
              Anshum Gupta
              Reporter:
              Steve Davids
            • Votes:
              0 Vote for this issue
              Watchers:
              7 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development