Solr / SOLR-8453

Jetty update from 9.2 to 9.3 causes the server to reset formerly legitimate client connections.

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 5.5, 6.0
    • Component/s: None
    • Labels: None

      Description

      The basic problem is that when we are streaming updates in via a client, an update can fail in a way that prevents further updates in the request from being processed, but not in a way that causes the client to stop and finish up the request before the server does something else with that connection.

      This seems to mean that even after the server stops processing the request, the concurrent update client is still in the process of sending it. Previously, Jetty would not go after the connection very quickly after the server processing thread was stopped via exception, and the client (usually?) had time to clean up properly. But after the Jetty upgrade from 9.2 to 9.3, Jetty closes the connection on the server sooner than previous versions did, and the client does not end up being notified of the original exception at all; instead it hits a connection reset exception. The result was random failures due to connection reset throughout our tests, and one particular test failing consistently. Even before this update, it does not seem like we were acting in a safe or 'behaved' manner, but our version of Jetty was relaxed enough (or a bug was fixed?) for our tests to work out.

      1. jetty9.2.pcapng
        547 kB
        Mark Miller
      2. jetty9.3.pcapng
        554 kB
        Mark Miller
      3. SOLR-8453_test.patch
        6 kB
        Yonik Seeley
      4. SOLR-8453_test.patch
        6 kB
        Yonik Seeley
      5. SOLR-8453.patch
        3 kB
        Mark Miller
      6. SOLR-8453.patch
        2 kB
        Mark Miller
      7. SOLR-8453.patch
        38 kB
        Mark Miller
      8. SOLR-8453.patch
        34 kB
        Mark Miller
      9. SOLR-8453.patch
        31 kB
        Mark Miller
      10. SOLR-8453.patch
        31 kB
        Mark Miller
      11. SOLR-8453.patch
        27 kB
        Mark Miller
      12. SOLR-8453.patch
        7 kB
        Mark Miller
      13. SOLR-8453.patch
        19 kB
        Mark Miller

        Issue Links

          Activity

          Mark Miller added a comment -

          Here is one idea.

          Mark Miller added a comment -

          This 'bad' behavior is kind of baked into some of our test error handling expectations. When a single-request multi-update has a failure in it, we expect that the rest of the updates are not processed.

          Can we easily distinguish between a streaming request and a single multi-update request in DistributedUpdateProcessor?

          Mark Miller added a comment -

          Rather than in DistributedUpdateProcessor, probably the right place to deal with this without changing behavior is in the code that manages the UpdateProcessor objects. That is a bit of a drag though, because a bunch of places can work with those.

          Mark Miller added a comment -

          I did a little more work on the above patch. You can try to simulate the old behavior by letting the client finish sending and just doing nothing with those updates after a fail. It would be better if we could tell the client to give up more cleanly, but anyway.

          The main issue I've found with that so far is that DocBasedVersionConstraintsProcessor actually counts on getting a physical exception for 409 conflicts. I think one other spot may too.

          Mark Miller added a comment -

          Here is a fairly simple and clean approach.

          We inject a new ErrorHandlingProcessor as the first processor in all chains (I have not thought very long about the name yet).

          This can provide us with behavior almost identical to what we had previously, while getting us to what we need now (cleanly managing our client/server connection lifecycles).

          Patch still needs some work.
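          The approach described above could be sketched roughly like this. This is hypothetical code, not the actual patch: `UpdateProcessor` and `Response` here are simplified stand-ins for Solr's real UpdateRequestProcessor and SolrQueryResponse types. The point is only the shape of the idea: a first-in-chain processor catches downstream failures and records them on the response, instead of letting the exception escape and kill the request thread (and with it the connection).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the first-in-chain error handler idea, not the
// actual Solr patch. The interfaces are simplified stand-ins.
public class ErrorHandlingProcessorSketch {

    interface UpdateProcessor {
        void processAdd(String doc) throws Exception;
    }

    static class Response {
        final List<Exception> errors = new ArrayList<>();
    }

    // Injected as the first processor in the chain: failures from downstream
    // processors are recorded on the response rather than thrown, so the
    // request (and the connection) can run to completion.
    static class ErrorHandlingProcessor implements UpdateProcessor {
        private final UpdateProcessor next;
        private final Response rsp;

        ErrorHandlingProcessor(UpdateProcessor next, Response rsp) {
            this.next = next;
            this.rsp = rsp;
        }

        @Override
        public void processAdd(String doc) {
            try {
                next.processAdd(doc);
            } catch (Exception e) {
                rsp.errors.add(e); // report via the response, keep the stream alive
            }
        }
    }
}
```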

          Mark Miller added a comment -

          Here is a more complete patch with test fixes.

          This also has a couple of other unrelated changes I've tried while looking into SOLR-7339.

          Shalin Shekhar Mangar added a comment -

          This is great, thanks for debugging this.

          If you move the following to the life cycle started method then you should also increase the timeout in the start() method, because currently it gives up waiting for Solr in 500ms.

          synchronized (JettySolrRunner.this) {
            waitOnSolr = true;
            JettySolrRunner.this.notify();
          }
          

          Did you see any test failures after this change? When I tried moving waitOnSolr to life cycle started, it failed many tests. However, I tried only once so it may have been the locale problems we've seen in other places. I'll run the tests in a loop with this patch to see if I can reproduce the failures.
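          The timeout concern can be illustrated with a small sketch. This is hypothetical code, not the actual JettySolrRunner: if the "started" signal moves later in the lifecycle (e.g. into a lifecycle-started callback), the waiter's timeout has to grow with it, or the waiter gives up before the signal ever arrives. A CountDownLatch makes the handshake explicit:

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch, not the actual JettySolrRunner code: an explicit
// startup handshake. If signalStarted() fires later in the lifecycle, the
// timeout passed to awaitStarted() must be large enough to cover it.
public class StartupLatchSketch {
    private final CountDownLatch started = new CountDownLatch(1);

    // Called from the lifecycle-started callback.
    public void signalStarted() {
        started.countDown();
    }

    // Returns true if startup was observed within the timeout.
    public boolean awaitStarted(long timeoutMs) throws InterruptedException {
        return started.await(timeoutMs, TimeUnit.MILLISECONDS);
    }
}
```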

          Mark Miller added a comment - - edited

          I have not seen any failures due to that change and have been running with it for many days. I don't know that it's very important for this issue though - I'll probably pare a couple of those things out or into their own issues. I think most tests now are actually using that wait-till-cores-are-loaded option anyway? Logically, this does make more sense to me though.

          I'm still fighting with some DIH tests a little. Their test classes really want the 'old processor throws out an exception' behavior - and while I can easily simulate that, it seems that also leaves those tests with the random connection reset issue. Old behavior, old problem I guess. Will probably have to try and tweak those tests a bit more.

          Mark Miller added a comment - - edited

          I'll run the tests in a loop with this patch to see if I can reproduce the failures.

          I'm still looping more recent versions. If you get a chance, update to the most recent patch. It has some changes to how our clients handle cleanup under failures. I think we overdo method.abort and I think it messes up connection reuse. On a clean server->client exception, we should simply make sure the response entity content is fully consumed and closed like a normal request.
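          The "fully consumed and closed" point can be shown with a JDK-only sketch. This is hypothetical code, not SolrJ: it uses com.sun.net.httpserver and HttpURLConnection rather than Apache HttpClient, but the principle is the same - on a clean error status, read the error body to the end and close it instead of aborting, which is what allows the underlying connection to go back to the keep-alive pool.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.ByteArrayOutputStream;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.URL;
import java.nio.charset.StandardCharsets;

// Hedged sketch, not SolrJ code: drain and close an error response instead
// of aborting mid-body, so the connection stays eligible for reuse.
public class DrainErrorResponse {

    // Reads an entire stream to EOF, returning its contents; closes it.
    static String readFully(InputStream in) throws Exception {
        try (InputStream s = in) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[8192];
            int n;
            while ((n = s.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
            return out.toString("UTF-8");
        }
    }

    // Issues a GET; on a non-2xx status drains the error stream instead of
    // abandoning the connection with unread bytes on it.
    static String fetch(String url) throws Exception {
        HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
        int status = conn.getResponseCode();
        InputStream body = status >= 400 ? conn.getErrorStream() : conn.getInputStream();
        return status + ":" + readFully(body);
    }

    // Spins up a tiny server that always answers 400, then fetches from it.
    static String demo() throws Exception {
        HttpServer server = HttpServer.create(new InetSocketAddress(0), 0);
        server.createContext("/update", exchange -> {
            byte[] msg = "bad request".getBytes(StandardCharsets.UTF_8);
            exchange.sendResponseHeaders(400, msg.length);
            exchange.getResponseBody().write(msg);
            exchange.close();
        });
        server.start();
        try {
            return fetch("http://127.0.0.1:" + server.getAddress().getPort() + "/update");
        } finally {
            server.stop(0);
        }
    }
}
```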

          Mark Miller added a comment -

          Yonik Seeley, do you have some time to look at this? We are holding off on the Jetty upgrade for a bit, but I think that has just made it easier to see issues that exist even on the older version. I think it may just be a matter of timing.

          Basically, we are not managing the connection like we are supposed to - errors, especially ones we expect to see in normal operations, should not bubble out and cut off the request. We want to use our Response.exception handling to properly notify the client, and let the request fully execute.

          Yonik Seeley added a comment - - edited

          Catching up from the holidays... yeah, I can take a look.

          IIRC, in the past, didn't a failed update in a batch cause all the updates after that to fail as well? That was never that desirable, but I don't remember if it was ever fixed.

          the client does not end up getting notified of the original exception at all and instead hits a connection reset exception.

          I thought we caught all exceptions in the SolrDispatchFilter and never let any bubble up to Jetty.
          edit: this code is in HttpSolrCall now:

              } catch (Throwable ex) {
                sendError(ex);
          
          Mark Miller added a comment -

          I thought we caught all exceptions in the SolrDispatchFilter and never let any bubble up to Jetty.

          The problem is not about whether the exception comes out of the SolrDispatchFilter - it's a matter of managing the connection.

          We want the client to handle the connection, so everything can be done cleanly. So the client needs to be able to fully send the request and then fully receive the response - that is the goal for all but really exceptional cases. If we let an exception bubble out and stop the server thread (whether the dispatch filter swallows it or not), the request thread can still be wrapping up - but if it tries to do any communication, it will get a connection reset as the server has closed the connection. We are not properly letting the client finish its request in these cases - so there is a race. We need to make sure the server does not close the connection until the request is fully done.
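          The server side of that contract can be sketched the same way. This is hypothetical JDK-only code, not Solr's dispatch code: before answering with an error, the handler reads the remaining request body to the end, so the client can finish sending without the risk of writing into a closed connection.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.URL;
import java.nio.charset.StandardCharsets;

// Hedged sketch, not Solr's actual dispatch code: a handler that drains the
// request body before sending an error response. Because the server reads
// the request to EOF, the client's writes never land on a closed connection.
public class DrainThenError {

    // Reads and discards the rest of a stream, returning the byte count.
    static long drain(InputStream in) throws IOException {
        byte[] buf = new byte[8192];
        long total = 0;
        int n;
        while ((n = in.read(buf)) != -1) {
            total += n;
        }
        return total;
    }

    // POSTs `size` bytes and returns the status code the client observes.
    static int post(String url, int size) throws Exception {
        HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
        conn.setDoOutput(true);
        conn.setRequestMethod("POST");
        conn.setFixedLengthStreamingMode(size);
        try (OutputStream out = conn.getOutputStream()) {
            byte[] chunk = new byte[8192];
            for (int sent = 0; sent < size; sent += chunk.length) {
                out.write(chunk, 0, Math.min(chunk.length, size - sent));
            }
        }
        return conn.getResponseCode();
    }

    // Server drains the body first, then replies 400; the client streams
    // 1 MB and still observes a clean status rather than a connection reset.
    static int demo() throws Exception {
        HttpServer server = HttpServer.create(new InetSocketAddress(0), 0);
        server.createContext("/update", exchange -> {
            drain(exchange.getRequestBody()); // consume before erroring
            byte[] msg = "rejected".getBytes(StandardCharsets.UTF_8);
            exchange.sendResponseHeaders(400, msg.length);
            exchange.getResponseBody().write(msg);
            exchange.close();
        });
        server.start();
        try {
            return post("http://127.0.0.1:" + server.getAddress().getPort() + "/update", 1 << 20);
        } finally {
            server.stop(0);
        }
    }
}
```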

          Yonik Seeley added a comment -

          That's unfortunate if one can't provide an error response before the request has finished.
          (I'm thinking of something like uploading a multi-gigabyte CSV file, but one of the fields is missing... it would be nice to have the option of saying "abort on any errors", and still get the specific error message back).

          Trying to figure out if this is a Jetty regression (regardless of how we want to act - which would seem to depend on context).

          An HTTP/1.1 (or later) client sending a message-body SHOULD monitor the network connection for an error status while it is transmitting the request. If the client sees an error status, it SHOULD immediately cease transmitting the body.
          
          Yonik Seeley added a comment - - edited

          That's unfortunate if one can't provide an error response before the request has finished.

          Hmmm, OK... it doesn't look like that's happening:

          ~$ nc 127.0.0.1 8983
          POST /solr/techproducts/update HTTP/1.1
          Host: localhost:8983
          User-Agent: curl/7.43.0
          Accept: */*
          Content-type:application/json
          Content-Length: 1000000000
          
          
          [
           {"id_not_exist" : "TestDoc1", "title" : "test1"},
          
          HTTP/1.1 400 Bad Request
          Content-Type: text/plain;charset=utf-8
          Transfer-Encoding: chunked
          
          7E
          {"responseHeader":{"status":400,"QTime":6312},"error":{"msg":"Document is missing mandatory uniqueKey field: id","code":400}}
          
          0
          

          I guess this suggests that we should be able to handle things better on the client side?

          Mark Miller added a comment -

          I think it's more related to using HttpClient than HTTP itself. We see random connection resets in many tests that go away with this, but looking at the consistent test we have that fails (SolrExampleStreamingTest#testUpdateField), we seem to hit the problem when HttpClient is cleaning up and closing the output stream, which flushes a buffer.

          java.net.SocketException: Connection reset
          	at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:113)
          	at java.net.SocketOutputStream.write(SocketOutputStream.java:153)
          	at org.apache.http.impl.io.AbstractSessionOutputBuffer.flushBuffer(AbstractSessionOutputBuffer.java:159)
          	at org.apache.http.impl.io.AbstractSessionOutputBuffer.flush(AbstractSessionOutputBuffer.java:166)
          	at org.apache.http.impl.io.ChunkedOutputStream.close(ChunkedOutputStream.java:205)
          	at org.apache.http.impl.entity.EntitySerializer.serialize(EntitySerializer.java:118)
          	at org.apache.http.impl.AbstractHttpClientConnection.sendRequestEntity(AbstractHttpClientConnection.java:265)
          	at org.apache.http.impl.conn.ManagedClientConnectionImpl.sendRequestEntity(ManagedClientConnectionImpl.java:203)
          	at org.apache.http.protocol.HttpRequestExecutor.doSendRequest(HttpRequestExecutor.java:237)
          	at org.apache.http.protocol.HttpRequestExecutor.execute(HttpRequestExecutor.java:122)
          	at org.apache.http.impl.client.DefaultRequestDirector.tryExecute(DefaultRequestDirector.java:685)
          	at org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:487)
          	at org.apache.http.impl.client.AbstractHttpClient.doExecute(AbstractHttpClient.java:882)
          	at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:82)
          	at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:107)
          	at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:55)
          	at org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrClient$Runner.sendUpdateStream(ConcurrentUpdateSolrClient.java:280)
          	at org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrClient$Runner.run(ConcurrentUpdateSolrClient.java:161)
          

          It may also be that it only happens with this chunked encoding. On close, it tries to write a 'closing chunk' and then flush: https://github.com/apache/httpcore/blob/4.0.x/httpcore/src/main/java/org/apache/http/impl/io/ChunkedOutputStream.java

          If there is a problem here we get the connection reset.

          It does actually seem like a bit of a race to me and I'm not sure how to address that yet (other than this patch). If you remove the 250ms poll that happens in ConcurrentUpdateSolrClient->sendUpdateStream->EntityTemplate->writeTo, it seems to go away. But that would indicate our connection management is a bit fragile, with the client kind of racing the server.

          Still playing around to try and find other potential fixes.

          Mark Miller added a comment -

          If you remove the 250ms poll that happens ... with the client kind of racing the server.

          On trunk right now, you have to drop that poll to 12ms or less on my machine to get the test to pass.

          Mark Miller added a comment -

          On trunk right now, you have to drop that poll to 12ms or less on my machine to get the test to pass.

          And on 5x, Jetty is not sensitive to the length of the poll it seems (at least up to 30 seconds).

          Yonik Seeley added a comment -

          Here's a test with normal solrj clients that reproduces HTTP-level exceptions. It uses multiple threads and large request sizes.

          Example exception summary (from 10 clients):

          3567 ERROR (TEST-TestSolrJErrorHandling.testWithBinary-seed#[5C3578C3D02417A3]) [    ] o.a.s.c.s.TestSolrJErrorHandling CHAIN: ->SolrServerException->SocketException(Connection reset)
          3569 ERROR (TEST-TestSolrJErrorHandling.testWithBinary-seed#[5C3578C3D02417A3]) [    ] o.a.s.c.s.TestSolrJErrorHandling CHAIN: ->SolrServerException->ClientProtocolException->NonRepeatableRequestException->SocketException(Broken pipe)
          3569 ERROR (TEST-TestSolrJErrorHandling.testWithBinary-seed#[5C3578C3D02417A3]) [    ] o.a.s.c.s.TestSolrJErrorHandling CHAIN: ->SolrServerException->ClientProtocolException->NonRepeatableRequestException->SocketException(Broken pipe)
          3570 ERROR (TEST-TestSolrJErrorHandling.testWithBinary-seed#[5C3578C3D02417A3]) [    ] o.a.s.c.s.TestSolrJErrorHandling CHAIN: ->SolrServerException->ClientProtocolException->NonRepeatableRequestException->SocketException(Broken pipe)
          3571 ERROR (TEST-TestSolrJErrorHandling.testWithBinary-seed#[5C3578C3D02417A3]) [    ] o.a.s.c.s.TestSolrJErrorHandling CHAIN: ->SolrServerException->ClientProtocolException->NonRepeatableRequestException->SocketException(Broken pipe)
          3571 ERROR (TEST-TestSolrJErrorHandling.testWithBinary-seed#[5C3578C3D02417A3]) [    ] o.a.s.c.s.TestSolrJErrorHandling CHAIN: ->SolrServerException->ClientProtocolException->NonRepeatableRequestException->SocketException(Broken pipe)
          3571 ERROR (TEST-TestSolrJErrorHandling.testWithBinary-seed#[5C3578C3D02417A3]) [    ] o.a.s.c.s.TestSolrJErrorHandling CHAIN: ->SolrServerException->ClientProtocolException->NonRepeatableRequestException->SocketException(Broken pipe)
          3572 ERROR (TEST-TestSolrJErrorHandling.testWithBinary-seed#[5C3578C3D02417A3]) [    ] o.a.s.c.s.TestSolrJErrorHandling CHAIN: ->SolrServerException->ClientProtocolException->NonRepeatableRequestException->SocketException(Broken pipe)
          3572 ERROR (TEST-TestSolrJErrorHandling.testWithBinary-seed#[5C3578C3D02417A3]) [    ] o.a.s.c.s.TestSolrJErrorHandling CHAIN: ->SolrServerException->ClientProtocolException->NonRepeatableRequestException->SocketException(Broken pipe)
          3572 ERROR (TEST-TestSolrJErrorHandling.testWithBinary-seed#[5C3578C3D02417A3]) [    ] o.a.s.c.s.TestSolrJErrorHandling CHAIN: ->SolrServerException->ClientProtocolException->NonRepeatableRequestException->SocketException(Broken pipe)
          3573 INFO  (TEST-TestSolrJErrorHandling.testWithBinary-seed#[5C3578C3D02417A3]) [    ] o.a.s.SolrTestCaseJ4 ###Ending testWithBinary
          
          Yonik Seeley added a comment -

          Here's an updated test that dedups the exception summary and cranks up the number of client threads, just to see what type of errors we can get.

          10714 ERROR (TEST-TestSolrJErrorHandling.testWithXml-seed#[CDCE136AF9E0FF01]) [    ] o.a.s.c.s.TestSolrJErrorHandling EXCEPTION LIST:
          	98) SolrServerException->ClientProtocolException->NonRepeatableRequestException->SocketException(Broken pipe)
          	2) SolrServerException->ClientProtocolException->NonRepeatableRequestException->SocketException(Protocol wrong type for socket)
          
          Yonik Seeley added a comment -

          This test currently passes on Solr 5x.

Mark Miller added a comment - edited

          Okay, I'm starting to think it's a change in consumeAll in - https://github.com/eclipse/jetty.project/blame/jetty-9.3.x/jetty-server/src/main/java/org/eclipse/jetty/server/HttpInput.java#L443

          I think perhaps that is now returning false in https://github.com/eclipse/jetty.project/blob/jetty-9.3.x/jetty-server/src/main/java/org/eclipse/jetty/server/HttpConnection.java#L403

          It's looking to me like the server resets the connection because of unconsumed content, and previously it must have been properly consuming the extra.

          Mark Miller added a comment -

          Here is consumeAll from 9.2:

          https://github.com/eclipse/jetty.project/blame/jetty-9.2.x/jetty-server/src/main/java/org/eclipse/jetty/server/HttpInput.java#L281

          There are a couple possible changes that may be doing this to us:

          9.3

              public boolean consumeAll()
              {
                  synchronized (_inputQ)
                  {
                      try
                      {
                          while (!isFinished())
                          {
                              Content item = nextContent();
                              if (item == null)
                                  break; // Let's not bother blocking
          
                              skip(item, remaining(item));
                          }
                          return isFinished() && !isError();
                      }
                      catch (IOException e)
                      {
                          LOG.debug(e);
                          return false;
                      }
                  }
              }
          

          9.2

              public boolean consumeAll()
              {
                  synchronized (lock())
                  {
                      // Don't bother reading if we already know there was an error.
                      if (_onError != null)
                          return false;
          
                      try
                      {
                          while (!isFinished())
                          {
                              T item = getNextContent();
                              if (item == null)
                                  _contentState.waitForContent(this);
                              else
                                  consume(item, remaining(item));
                          }
                          return true;
                      }
                      catch (IOException e)
                      {
                          LOG.debug(e);
                          return false;
                      }
                  }
              }
          

My first guess is the change that no longer waits for new content on item==null via _contentState.waitForContent(this). The other change is that instead of checking the error status only at the start, it is now checked at the end.

          Mark Miller added a comment -

          If that change is the issue (and it was added in 9.3), it's this commit: https://github.com/eclipse/jetty.project/commit/ebaf84b97ec50c6836c230f9068b46ad8030f974 Refactored HttpInput to use poison pills

          I don't see any tracking issue for it.

          Mark Miller added a comment -

          Yup, seems to go away with this hack:

    public boolean consumeAll()
    {
        synchronized (_inputQ)
        {
            try
            {
                while (!isFinished())
                {
                    Content item = nextContent();
                    if (item == null)
                    {
                        blockForContent();
                        item = nextContent();
                    }

                    if (item != null)
                        skip(item, remaining(item));
                }
                return true;
            }
            catch (IOException e)
            {
                LOG.debug(e);
                return false;
            }
        }
    }
          
          Yonik Seeley added a comment -

          Nice find!

          I've been messing around on the client side with HttpURLConnection and raw sockets.

          Even with HttpURLConnection (the stuff built into the JVM) I can't even get the status code...
          The server has probably written the error response, but the client side wants to push the complete request before looking at it I guess:

          Caused by: java.io.IOException: Error writing to server
          	at sun.net.www.protocol.http.HttpURLConnection.writeRequests(HttpURLConnection.java:665)
          	at sun.net.www.protocol.http.HttpURLConnection.writeRequests(HttpURLConnection.java:677)
          	at sun.net.www.protocol.http.HttpURLConnection.getInputStream0(HttpURLConnection.java:1533)
          	at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1440)
          	at java.net.HttpURLConnection.getResponseCode(HttpURLConnection.java:480)
          	at org.apache.solr.client.solrj.TestSolrJErrorHandling.testHttpURLConnection(TestSolrJErrorHandling.java:242)
          

          Catching the exception and trying to get the errorStream also doesn't work.

          Anyway, I had a thought that even if we could fix this in the SolrJ clients, there are a lot of other clients out there that will get bitten by this.
          Regardless of what improvements we make to SolrJ, it seems like we should roll back the Jetty upgrade.

          Joakim Erdfelt added a comment -

You should probably have reached out to us (at Jetty) as soon as you discovered this, before you even dug into the Jetty source.

Do you have a Wireshark capture of this behavior?
Let's see what is happening on the network first.

          Mark Miller added a comment -

          I think someone reached out to the Jetty project in SOLR-7339.

          I'll attach a wireshark log I have.

          Wireshark and debug logs all confirm this change in behavior.

          Mark Miller added a comment -

          Anyway, I had a thought that even if we could fix this in the SolrJ clients, there are a lot of other clients out there that will get bitten by this.

          I would think so. It seems to just be a matter of if the client pauses sending the request long enough to get cut off. We are able to trigger it so easily because ConcurrentUpdateSolrClient will generally pause 250 ms after adding a single doc. A GC event or load or something of the like could cause the same thing.

          ASF subversion and git services added a comment -

          Commit 1723613 from Yonik Seeley in branch 'dev/trunk'
          [ https://svn.apache.org/r1723613 ]

          SOLR-8453: test HTTP client error responses

          Greg Wilkins added a comment -

          Copying this comment over from the jetty-users discussion (and expanding somewhat)

          This is indeed a change in behaviour in jetty 9.3, but not one that should affect any application.

          The problem with consumeAll previously is that it was an unlimited commitment - a client could send chunks forever and consume a thread forever. The only reason we need to consume all is to maintain a persistent connection. But a server is under no obligation to maintain a persistent connection (even if the application consumes all the request data). So if jetty doesn't consumeAll, we can close that connection and the client needs to start a new one.

So the approach that 9.3 takes is that if the application has not read all its content (which is normally an error condition of the application), then jetty will make a best effort attempt to read the content, but only if it can do so without blocking. If it has to block - it gives up and the connection is closed. The decision is that it is better to throw away the connection rather than commit to blocking for an unknown period of time. This is more important now as apps are using the async APIs and are configured with minimal threadpools - thus we need to avoid places where the container can block and consume the thread pool (and thus be vulnerable to DOS attacks).

          So your client needs to be more robust in this circumstance and/or your application should make a better attempt to consume all the data.

          Even if we reinstated the behaviour in jetty - it would really just be happenstance that your client works and intermediaries/proxies could change the behaviour.

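The best-effort, non-blocking consume that Greg describes can be illustrated on a plain InputStream. This is only a sketch of the idea (drain whatever can be read without blocking, then give up), not Jetty's actual HttpInput code; the class and method names here are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

public class NonBlockingDrain {
    /**
     * Consume whatever bytes can be read without blocking, then stop.
     * Mirrors the spirit of Jetty 9.3's best-effort consumeAll: never
     * commit a thread to waiting for more request content.
     */
    public static long drainAvailable(InputStream in) throws IOException {
        long drained = 0;
        byte[] buf = new byte[8192];
        // InputStream.available() estimates bytes readable without blocking.
        while (in.available() > 0) {
            int n = in.read(buf, 0, Math.min(buf.length, in.available()));
            if (n <= 0)
                break;
            drained += n;
        }
        return drained;
    }

    public static void main(String[] args) throws IOException {
        // All bytes of a ByteArrayInputStream are already buffered,
        // so everything is drained without ever blocking.
        InputStream in = new ByteArrayInputStream(new byte[100]);
        System.out.println(drainAvailable(in));
    }
}
```

If the remaining content has not yet arrived over the wire, available() returns 0 and the drain stops - the analogue of Jetty closing the connection rather than blocking.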
          Greg Wilkins added a comment -

          Note I just added a comment on 7339 - that this can be worked around by adding a filter that consumes all the content. But the client should still be fixed IMHO.

          ASF subversion and git services added a comment -

          Commit 1723646 from Tomás Fernández Löbbe in branch 'dev/trunk'
          [ https://svn.apache.org/r1723646 ]

          SOLR-8453: Fix precommit

          Mark Miller added a comment -

          But the client should still be fixed IMHO.

          On the Jetty mailing list you mentioned that Apache HttpClient seemed to be acting reasonably. I'm not sure how that relates to this comment about the client being fixed?

          Mark Miller added a comment -

          To expand:

          If HttpClient is behaving reasonably by throwing an exception about connection reset, Solr cannot get the actual server exception, which is essential to our proper operation. I don't see what we can do as a client without changes to HttpClient.

          Greg Wilkins added a comment -

          Ah - well I've evolved my view a bit, although the client (as in the part that uses the HttpClient) may need to be fixed.

          To repeat what I stated on the jetty-user list, I think both client and server are acting reasonably. Jetty sees an application return without having consumed all of a request - it assumes this is an error condition, makes a best effort attempt to consume, but when that requires blocking decides to close the connection before all the content is sent.

          The HttpClient is sending a chunked body and the connection is closed before it sends the terminal chunk, so apache is correct to return an exceptional condition to the client application, as the request did not end correctly. I guess the HttpClient could potentially do better by making the status received in the response available, but then it is in a race because the close may occur prior to the response being read/parsed/processed.

So if the client application cannot be made more robust - i.e. it can't reasonably handle such connection failures without knowing the status code - then the best solution is to avoid the server creating the ambiguity in the first place. If it is important for the client that all the content is consumed, then the server application should consume all the content, even if there is an error. A filter can achieve that, or perhaps a sendError wrapper.

          We really don't want Jetty to have to do this consumeAll, because we then cannot protect asynchronous apps with small thread pools from DOS attacks.

          cheers

          Mark Miller added a comment -

          Okay here is a patch that has the filter read out the inputstream on errors.

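For context, the core of such a filter is a helper that fully drains the request body and merely logs any failure. This is a hypothetical sketch of that kind of helper (the class and method names are illustrative, not the actual Solr patch):

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

public class ConsumeInput {
    /**
     * Fully drain a request InputStream so the container can reuse the
     * connection instead of resetting it. Any IOException is swallowed
     * and logged: by this point we are in error handling and a failed
     * drain should not mask the original error.
     */
    public static void consumeFully(InputStream in) {
        byte[] buf = new byte[8192];
        try {
            // Read and discard until end of stream.
            while (in.read(buf) != -1) {
                // discard remaining request body
            }
        } catch (IOException e) {
            // Best effort only; see SOLR-8453.
            System.err.println("Could not consume full request body: " + e);
        }
    }

    public static void main(String[] args) {
        InputStream in = new ByteArrayInputStream(new byte[64]);
        consumeFully(in);
    }
}
```

In a servlet filter this would be called on the request's input stream from the error path, after the response has been prepared.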
          Mark Miller added a comment -

          So if the client applications cannot be made more robust - ie it can't reasonably handle such connection failures without knowing the status code

          I don't think we can. As a user of HttpClient, we don't care about the connection reset. We do care about the error on the server though. We would want to be able to get the error, and ignore that we won't be reusing that connection. Solr wants connection reuse, but we don't want to be interrupted when it doesn't happen (especially if that also means we don't get the server error). I don't think we can get that option in the short term though, so best bet seems to be as you suggest in a filter.

          ASF subversion and git services added a comment -

          Commit 1723660 from Uwe Schindler in branch 'dev/trunk'
          [ https://svn.apache.org/r1723660 ]

          SOLR-8453: Fix raw socket test to correctly use HTTP spec: headers in HTTP are US-ASCII, body's Content-Length is number of bytes (not chars!), PrintWriter swallows Exceptions and was not needed (Forbidden apis found the bug, but fix was incorrect)

          Yonik Seeley added a comment -

          I guess the HttpClient could potentially do better by making the status received in the response available, but then it is in a race because the close may occur prior to the response being read/parsed/processed.

          Not sure I understand this part. At the OS/socket level, the server can send the response and immediately close the socket, and the client (if written properly) can always read the response.

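Yonik's point can be demonstrated with raw sockets: when the server performs an orderly close (FIN) immediately after writing, the response bytes are already queued in the OS buffers and a careful client still reads them. This is a standalone sketch, not Solr test code; note that an abortive RST close, as discussed in this thread, can behave differently.

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

public class CloseAfterResponse {
    /**
     * A toy server writes an HTTP error response and immediately closes
     * the socket; the client then reads the status line anyway, because
     * an orderly FIN is delivered after the buffered response data.
     */
    public static String statusLineFromClosingServer() throws Exception {
        try (ServerSocket server = new ServerSocket(0)) {
            Thread t = new Thread(() -> {
                try (Socket s = server.accept();
                     OutputStream out = s.getOutputStream()) {
                    out.write(("HTTP/1.1 400 Bad Request\r\n"
                            + "Content-Length: 0\r\n\r\n")
                            .getBytes(StandardCharsets.US_ASCII));
                    out.flush();
                    // try-with-resources closes the socket right here
                } catch (Exception ignored) {
                }
            });
            t.start();
            try (Socket client = new Socket("127.0.0.1", server.getLocalPort());
                 BufferedReader in = new BufferedReader(new InputStreamReader(
                         client.getInputStream(), StandardCharsets.US_ASCII))) {
                String status = in.readLine();
                t.join();
                return status;
            }
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(statusLineFromClosingServer());
    }
}
```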
          Mark Miller added a comment -

I think this patch is okay. I changed consumeInput to not throw an Exception; instead we just log it, and I added a bit of doc that links to this JIRA.

          Yonik Seeley added a comment -

          +1, looks fine.

          Greg Wilkins added a comment -

It depends on how the socket is closed. If the socket is RST rather than FIN'd, then it is possible for the RST to overtake sent data - i.e. to cause some kind of exception with unconsumed data in a buffer. I have no idea if that is really the case or not.

Jetty will do a proper close only if it has completed an HTTP message. Otherwise the close may be more brutal and may result in a RST (not directly under the control of Java).

          ASF subversion and git services added a comment -

          Commit 1724450 from Mark Miller in branch 'dev/trunk'
          [ https://svn.apache.org/r1724450 ]

          SOLR-8453: Solr should attempt to consume the request inputstream on errors as we cannot count on the container to do it.

          ASF subversion and git services added a comment -

          Commit 1724457 from Mark Miller in branch 'dev/branches/branch_5x'
          [ https://svn.apache.org/r1724457 ]

          SOLR-8453: Solr should attempt to consume the request inputstream on errors as we cannot count on the container to do it.

          ASF subversion and git services added a comment -

          Commit 1724529 from Mark Miller in branch 'dev/trunk'
          [ https://svn.apache.org/r1724529 ]

          SOLR-8453: Only consume input here if exp != null, otherwise it is done in writeResponse.

          ASF subversion and git services added a comment -

          Commit 1724530 from Mark Miller in branch 'dev/branches/branch_5x'
          [ https://svn.apache.org/r1724530 ]

          SOLR-8453: Only consume input here if exp != null, otherwise it is done in writeResponse.

          Mark Miller added a comment -

It seems only doing this on errors is not enough. I'm looking at upgrading Jetty again and I still see a bunch of connection resets unless I make sure to always fully consume the stream.

          Greg Wilkins added a comment -

If solr is in the habit of not consuming the content AND is particular about how such connections are closed/reused, then I believe a filter should be applied to all requests.
I will have a look again to see if jetty can provide consumeAll behaviour as an option, but I really think that is just making solr dependent on a container feature... as it was when it assumed the container would consume all of the input on its behalf - which is behaviour not covered by any spec I know of.

Other than that, you could make a feature request on the client to be able to handle request sending errors separately from response processing. I.e. in this case the client may receive a valid response even though the request body send failed.

          Mark Miller added a comment -

Yeah, we are doing it in a filter, but we were just doing it on failures. I suspect some other code changes have exposed that we really should be doing it on every request, but I've made that change. Currently, we don't really want the client or server to give up early in any case if we can help it. Since this issue has already been released, I filed a new JIRA for it: SOLR-8683, Always consume the full request on the server, not just in the case of an error.


            People

            • Assignee:
              Mark Miller
              Reporter:
              Mark Miller