Solr
  1. Solr
  2. SOLR-8450

Our HttpClient retry policy is too permissive.

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 5.5, 6.0
    • Component/s: clients - java, SolrJ
    • Labels:
      None
    1. SOLR-8450.patch
      7 kB
      Mark Miller
    2. SOLR-8450.patch
      11 kB
      Mark Miller
    3. SOLR-8450.patch
      11 kB
      Mark Miller

      Issue Links

        Activity

        Hide
        Mark Miller added a comment -

        I guess we should audit all the internal HttpClients created in Solr clients.

        We probably want what we normally use:

              // if the request is not fully sent, we retry
              // streaming updates are not a problem, because they are not retryable
              httpClient.setHttpRequestRetryHandler(new DefaultHttpRequestRetryHandler(){
                @Override
                protected boolean handleAsIdempotent(final HttpRequest request) {
                  return false; // we can't tell if a Solr request is idempotent
                }
              });
        

        Except it is not so simples. For read requests, we would like to retry and it's just update requests we want to limit things.

        Show
        Mark Miller added a comment - I guess we should audit all the internal HttpClients created in Solr clients. We probably want what we normally use: // if the request is not fully sent, we retry // streaming updates are not a problem, because they are not retryable httpClient.setHttpRequestRetryHandler( new DefaultHttpRequestRetryHandler(){ @Override protected boolean handleAsIdempotent( final HttpRequest request) { return false ; // we can't tell if a Solr request is idempotent } }); Except it is not so simples. For read requests, we would like to retry and it's just update requests we want to limit things.
        Hide
        Mark Miller added a comment -

        We probably want what we normally use:

        Hmmm...our internal client is already using that.

        What is the exact case it is retrying in?

        Show
        Mark Miller added a comment - We probably want what we normally use: Hmmm...our internal client is already using that. What is the exact case it is retrying in?
        Hide
        Shalin Shekhar Mangar added a comment -

        I saw this while looking at the failure in SolrExampleStreamingTest.testUpdateField where a connection reset causes the request to be retried. The connection reset itself happens because during that test, we deliberately send a bad version and because there is unconsumed input, Jetty closes the connection on the client.

        Show
        Shalin Shekhar Mangar added a comment - I saw this while looking at the failure in SolrExampleStreamingTest.testUpdateField where a connection reset causes the request to be retried. The connection reset itself happens because during that test, we deliberately send a bad version and because there is unconsumed input, Jetty closes the connection on the client.
        Hide
        Mark Miller added a comment -

        Okay, that seems to fit with our retry policy? If we don't fully send the request, we allow a retry.

        Does the following mean we did not fully send everything?

        we deliberately send a bad version and because there is unconsumed input

        Show
        Mark Miller added a comment - Okay, that seems to fit with our retry policy? If we don't fully send the request, we allow a retry. Does the following mean we did not fully send everything? we deliberately send a bad version and because there is unconsumed input
        Hide
        Shalin Shekhar Mangar added a comment -

        Does the following mean we did not fully send everything?

        Yeah, that is what it seems. Because Solr throws a bad version exception and completes processing the request, Jetty closes the connection before the client can send the rest of frames. See https://issues.apache.org/jira/browse/SOLR-7339?focusedCommentId=15036185&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15036185

        Show
        Shalin Shekhar Mangar added a comment - Does the following mean we did not fully send everything? Yeah, that is what it seems. Because Solr throws a bad version exception and completes processing the request, Jetty closes the connection before the client can send the rest of frames. See https://issues.apache.org/jira/browse/SOLR-7339?focusedCommentId=15036185&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15036185
        Hide
        Mark Miller added a comment -

        That should be okay as things stands then. We on the fact that if the full request was not sent, the update should not have happened, and a retry should be fine.

        Because Solr throws a bad version exception and completes processing the request, Jetty closes the connection before the client can send the rest of frames.

        Perhaps we can fix that then? Should we be cutting off the client like this? It seems like this case should follow a normal, client sends full request, server responds with full response flow?

        Show
        Mark Miller added a comment - That should be okay as things stands then. We on the fact that if the full request was not sent, the update should not have happened, and a retry should be fine. Because Solr throws a bad version exception and completes processing the request, Jetty closes the connection before the client can send the rest of frames. Perhaps we can fix that then? Should we be cutting off the client like this? It seems like this case should follow a normal, client sends full request, server responds with full response flow?
        Hide
        Mark Miller added a comment -

        Okay, I see. I got the wrong test in my head in the Jetty upgrade issue.

        This is the concurrent update solr server, so it's streaming. We count on those entities not being retryable and the retry failing.

        // if the request is not fully sent, we retry
        // streaming updates are not a problem, because they are not retryable
        httpClient.setHttpRequestRetryHandler(new DefaultHttpRequestRetryHandler(){
        @Override
        protected boolean handleAsIdempotent(final HttpRequest request)

        Unknown macro: { return false; // we can't tell if a Solr request is idempotent }

        });

        What that means is that we don't explicitly handle reties with streaming, but we don't need to because those entities can't be retried anyway.

        Show
        Mark Miller added a comment - Okay, I see. I got the wrong test in my head in the Jetty upgrade issue. This is the concurrent update solr server, so it's streaming. We count on those entities not being retryable and the retry failing. // if the request is not fully sent, we retry // streaming updates are not a problem, because they are not retryable httpClient.setHttpRequestRetryHandler(new DefaultHttpRequestRetryHandler(){ @Override protected boolean handleAsIdempotent(final HttpRequest request) Unknown macro: { return false; // we can't tell if a Solr request is idempotent } }); What that means is that we don't explicitly handle reties with streaming, but we don't need to because those entities can't be retried anyway.
        Hide
        Shalin Shekhar Mangar added a comment -

        But we shouldn't retry such updates in the first place, no? The following is logged while running SolrExampleStreamingTest.testUpdateField when server closes the connection:

        2092 INFO  (qtp382077844-22) [    x:collection1] o.a.s.u.p.LogUpdateProcessorFactory [collection1]  webapp=/solr path=/update params={wt=xml&version=2.2}{} 0 29
        2313 INFO  (concurrentUpdateScheduler-3-thread-1-processing-http:////127.0.0.1:54589//solr//collection1) [    ] o.a.h.i.c.SystemDefaultHttpClient I/O exception (java.net.SocketException) caught when processing request to {}->http://127.0.0.1:54589: Connection reset
        2313 INFO  (concurrentUpdateScheduler-3-thread-1-processing-http:////127.0.0.1:54589//solr//collection1) [    ] o.a.h.i.c.SystemDefaultHttpClient Retrying request to {}->http://127.0.0.1:54589
        2315 INFO  (qtp382077844-21) [    x:collection1] o.a.s.u.p.LogUpdateProcessorFactory [collection1]  webapp=/solr path=/update params={wt=xml&version=2.2}{} 0 1
        2564 INFO  (concurrentUpdateScheduler-3-thread-1-processing-http:////127.0.0.1:54589//solr//collection1) [    ] o.a.h.i.c.SystemDefaultHttpClient I/O exception (java.net.SocketException) caught when processing request to {}->http://127.0.0.1:54589: Connection reset
        2564 INFO  (concurrentUpdateScheduler-3-thread-1-processing-http:////127.0.0.1:54589//solr//collection1) [    ] o.a.h.i.c.SystemDefaultHttpClient Retrying request to {}->http://127.0.0.1:54589
        2566 INFO  (qtp382077844-17) [    x:collection1] o.a.s.u.p.LogUpdateProcessorFactory [collection1]  webapp=/solr path=/update params={wt=xml&version=2.2}{} 0 1
        2815 INFO  (concurrentUpdateScheduler-3-thread-1-processing-http:////127.0.0.1:54589//solr//collection1) [    ] o.a.h.i.c.SystemDefaultHttpClient I/O exception (java.net.SocketException) caught when processing request to {}->http://127.0.0.1:54589: Connection reset
        2815 INFO  (concurrentUpdateScheduler-3-thread-1-processing-http:////127.0.0.1:54589//solr//collection1) [    ] o.a.h.i.c.SystemDefaultHttpClient Retrying request to {}->http://127.0.0.1:54589
        2817 INFO  (qtp382077844-18) [    x:collection1] o.a.s.u.p.LogUpdateProcessorFactory [collection1]  webapp=/solr path=/update params={wt=xml&version=2.2}{} 0 0
        
        Show
        Shalin Shekhar Mangar added a comment - But we shouldn't retry such updates in the first place, no? The following is logged while running SolrExampleStreamingTest.testUpdateField when server closes the connection: 2092 INFO (qtp382077844-22) [ x:collection1] o.a.s.u.p.LogUpdateProcessorFactory [collection1] webapp=/solr path=/update params={wt=xml&version=2.2}{} 0 29 2313 INFO (concurrentUpdateScheduler-3-thread-1-processing-http: ////127.0.0.1:54589//solr//collection1) [ ] o.a.h.i.c.SystemDefaultHttpClient I/O exception (java.net.SocketException) caught when processing request to {}->http://127.0.0.1:54589: Connection reset 2313 INFO (concurrentUpdateScheduler-3-thread-1-processing-http: ////127.0.0.1:54589//solr//collection1) [ ] o.a.h.i.c.SystemDefaultHttpClient Retrying request to {}->http://127.0.0.1:54589 2315 INFO (qtp382077844-21) [ x:collection1] o.a.s.u.p.LogUpdateProcessorFactory [collection1] webapp=/solr path=/update params={wt=xml&version=2.2}{} 0 1 2564 INFO (concurrentUpdateScheduler-3-thread-1-processing-http: ////127.0.0.1:54589//solr//collection1) [ ] o.a.h.i.c.SystemDefaultHttpClient I/O exception (java.net.SocketException) caught when processing request to {}->http://127.0.0.1:54589: Connection reset 2564 INFO (concurrentUpdateScheduler-3-thread-1-processing-http: ////127.0.0.1:54589//solr//collection1) [ ] o.a.h.i.c.SystemDefaultHttpClient Retrying request to {}->http://127.0.0.1:54589 2566 INFO (qtp382077844-17) [ x:collection1] o.a.s.u.p.LogUpdateProcessorFactory [collection1] webapp=/solr path=/update params={wt=xml&version=2.2}{} 0 1 2815 INFO (concurrentUpdateScheduler-3-thread-1-processing-http: ////127.0.0.1:54589//solr//collection1) [ ] o.a.h.i.c.SystemDefaultHttpClient I/O exception (java.net.SocketException) caught when processing request to {}->http://127.0.0.1:54589: Connection reset 2815 INFO (concurrentUpdateScheduler-3-thread-1-processing-http: ////127.0.0.1:54589//solr//collection1) [ ] o.a.h.i.c.SystemDefaultHttpClient Retrying request to {}->http://127.0.0.1:54589 2817 INFO (qtp382077844-18) [ x:collection1] o.a.s.u.p.LogUpdateProcessorFactory [collection1] webapp=/solr path=/update params={wt=xml&version=2.2}{} 0 0
        Hide
        Mark Miller added a comment -

        And now that I have dug more into the correct fail, you are right, this does seem a little concerning. This is very strange. It seems like we are doing a real retry when using a streaming client. That is not expected.

        Show
        Mark Miller added a comment - And now that I have dug more into the correct fail, you are right, this does seem a little concerning. This is very strange. It seems like we are doing a real retry when using a streaming client. That is not expected.
        Hide
        Mark Miller added a comment -

        This was causing my new connection reuse test in SOLR-8451 to fail on trunk (only with the jetty upgrade).

        It seems that we were retrying on ConcurrentUpdateSolrClient requests. I had expected those retries to fail as non retriable.

        Here is a patch with a subset of changes from SOLR-8451. We can use chunked encoding to detect streaming if we start using the content stream sizes in HttpSolrClient (which is more efficient anyway?).

        Show
        Mark Miller added a comment - This was causing my new connection reuse test in SOLR-8451 to fail on trunk (only with the jetty upgrade). It seems that we were retrying on ConcurrentUpdateSolrClient requests. I had expected those retries to fail as non retriable. Here is a patch with a subset of changes from SOLR-8451 . We can use chunked encoding to detect streaming if we start using the content stream sizes in HttpSolrClient (which is more efficient anyway?).
        Hide
        Mark Miller added a comment -

        I guess even still, with non streaming and knowing the size, we still don't want to retry when batching either. Not so sure how easy that is to detect.

        Show
        Mark Miller added a comment - I guess even still, with non streaming and knowing the size, we still don't want to retry when batching either. Not so sure how easy that is to detect.
        Hide
        Mark Miller added a comment -

        Well, in the end, doesn't seem to be safe to retry on any updates, even if it's a single update in the request. How about just retrying on GET?

        Show
        Mark Miller added a comment - Well, in the end, doesn't seem to be safe to retry on any updates, even if it's a single update in the request. How about just retrying on GET?
        Hide
        ASF subversion and git services added a comment -

        Commit 1723616 from Mark Miller in branch 'dev/trunk'
        [ https://svn.apache.org/r1723616 ]

        SOLR-8450: Our HttpClient retry policy is too permissive.

        Show
        ASF subversion and git services added a comment - Commit 1723616 from Mark Miller in branch 'dev/trunk' [ https://svn.apache.org/r1723616 ] SOLR-8450 : Our HttpClient retry policy is too permissive.
        Hide
        ASF subversion and git services added a comment -

        Commit 1723617 from Mark Miller in branch 'dev/trunk'
        [ https://svn.apache.org/r1723617 ]

        SOLR-8450: Add CHANGES entry.

        Show
        ASF subversion and git services added a comment - Commit 1723617 from Mark Miller in branch 'dev/trunk' [ https://svn.apache.org/r1723617 ] SOLR-8450 : Add CHANGES entry.
        Hide
        ASF subversion and git services added a comment -

        Commit 1724807 from Mark Miller in branch 'dev/trunk'
        [ https://svn.apache.org/r1724807 ]

        SOLR-8450: Fix debug logging.

        Show
        ASF subversion and git services added a comment - Commit 1724807 from Mark Miller in branch 'dev/trunk' [ https://svn.apache.org/r1724807 ] SOLR-8450 : Fix debug logging.
        Hide
        ASF subversion and git services added a comment -

        Commit 1724813 from Mark Miller in branch 'dev/trunk'
        [ https://svn.apache.org/r1724813 ]

        SOLR-8450: Do not retry admin requests, they are not idempotent.

        Show
        ASF subversion and git services added a comment - Commit 1724813 from Mark Miller in branch 'dev/trunk' [ https://svn.apache.org/r1724813 ] SOLR-8450 : Do not retry admin requests, they are not idempotent.
        Hide
        Mark Miller added a comment -

        I have to backport this to 5x now.

        Show
        Mark Miller added a comment - I have to backport this to 5x now.
        Hide
        ASF subversion and git services added a comment -

        Commit f72c3e3a8ff4bdfc65764a180227c72c43429856 in lucene-solr's branch refs/heads/branch_5x from Mark Robert Miller
        [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=f72c3e3 ]

        SOLR-8450: Our HttpClient retry policy is too permissive.

        git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@1723616 13f79535-47bb-0310-9956-ffa450edef68

        Show
        ASF subversion and git services added a comment - Commit f72c3e3a8ff4bdfc65764a180227c72c43429856 in lucene-solr's branch refs/heads/branch_5x from Mark Robert Miller [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=f72c3e3 ] SOLR-8450 : Our HttpClient retry policy is too permissive. git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@1723616 13f79535-47bb-0310-9956-ffa450edef68
        Hide
        ASF subversion and git services added a comment -

        Commit 42c17d225e2190893227ac6c710effda000afd98 in lucene-solr's branch refs/heads/branch_5x from markrmiller
        [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=42c17d2 ]

        SOLR-8450: Add CHANGES entry.

        Show
        ASF subversion and git services added a comment - Commit 42c17d225e2190893227ac6c710effda000afd98 in lucene-solr's branch refs/heads/branch_5x from markrmiller [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=42c17d2 ] SOLR-8450 : Add CHANGES entry.
        Hide
        ASF subversion and git services added a comment -

        Commit 9da8846535bce8793b28a7d8816e10f3b1c5a912 in lucene-solr's branch refs/heads/branch_5x from Mark Robert Miller
        [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=9da8846 ]

        SOLR-8450: Fix debug logging.

        git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@1724807 13f79535-47bb-0310-9956-ffa450edef68

        Show
        ASF subversion and git services added a comment - Commit 9da8846535bce8793b28a7d8816e10f3b1c5a912 in lucene-solr's branch refs/heads/branch_5x from Mark Robert Miller [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=9da8846 ] SOLR-8450 : Fix debug logging. git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@1724807 13f79535-47bb-0310-9956-ffa450edef68
        Hide
        ASF subversion and git services added a comment -

        Commit df004731dd3f2b1cd54af96f1aa1ead27fcf70b2 in lucene-solr's branch refs/heads/branch_5x from Mark Robert Miller
        [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=df00473 ]

        SOLR-8450: Do not retry admin requests, they are not idempotent.

        git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@1724813 13f79535-47bb-0310-9956-ffa450edef68

        Show
        ASF subversion and git services added a comment - Commit df004731dd3f2b1cd54af96f1aa1ead27fcf70b2 in lucene-solr's branch refs/heads/branch_5x from Mark Robert Miller [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=df00473 ] SOLR-8450 : Do not retry admin requests, they are not idempotent. git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@1724813 13f79535-47bb-0310-9956-ffa450edef68

          People

          • Assignee:
            Mark Miller
            Reporter:
            Shalin Shekhar Mangar
          • Votes:
            0 Vote for this issue
            Watchers:
            3 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development