Description
I believe we have found an infrequent race condition in the Http Client library and can reproduce it about every few thousand requests. We are using version 5.1.3. What I believe is happening is that the request processing begins on the IO threads before the execute call has finished configuring its data structures to manage the request, causing an errant call to AsyncResponseConsumer<T>::fail. In this situation, the response headers have arrived and dispatched through a call to AsyncResponseConsumer<T>::consumeResponse, and then the execute thread gets to the final line in HttpAsyncMainClientExec::execute:
operation.setDependency(execRuntime.execute(exchangeId, internalExchangeHandler, clientContext));
Here, operation is a ComplexFuture object that has already been completed so it ends up cancelling the dependency:
@Override
public void setDependency(final Cancellable dependency) {
Args.notNull(dependency, "dependency");
if (isDone()) {
dependency.cancel();
} else {
dependencyRef.set(dependency);
{{ }}}
}
Here's the call stack:
org.apache.hc.client5.http.impl.async.HttpAsyncMainClientExec$1.cancel(HttpAsyncMainClientExec.java:129)
org.apache.hc.client5.http.impl.async.InternalHttpAsyncExecRuntime$3.cancel(InternalHttpAsyncExecRuntime.java:267)
org.apache.hc.core5.concurrent.ComplexFuture.setDependency(ComplexFuture.java:55)
org.apache.hc.client5.http.impl.async.HttpAsyncMainClientExec.execute(HttpAsyncMainClientExec.java:250)
org.apache.hc.client5.http.impl.async.AsyncExecChainElement.execute(AsyncExecChainElement.java:54)
org.apache.hc.client5.http.impl.async.AsyncExecChainElement$1.proceed(AsyncExecChainElement.java:62)
org.apache.hc.client5.http.impl.async.AsyncConnectExec$1.completed(AsyncConnectExec.java:142)
org.apache.hc.client5.http.impl.async.AsyncConnectExec$1.completed(AsyncConnectExec.java:136)
org.apache.hc.client5.http.impl.async.InternalHttpAsyncExecRuntime$1.completed(InternalHttpAsyncExecRuntime.java:114)
org.apache.hc.client5.http.impl.async.InternalHttpAsyncExecRuntime$1.completed(InternalHttpAsyncExecRuntime.java:105)
org.apache.hc.core5.concurrent.BasicFuture.completed(BasicFuture.java:123)
org.apache.hc.client5.http.impl.nio.PoolingAsyncClientConnectionManager$1$1.leaseCompleted(PoolingAsyncClientConnectionManager.java:285)
org.apache.hc.client5.http.impl.nio.PoolingAsyncClientConnectionManager$1$1.completed(PoolingAsyncClientConnectionManager.java:270)
org.apache.hc.client5.http.impl.nio.PoolingAsyncClientConnectionManager$1$1.completed(PoolingAsyncClientConnectionManager.java:234)
org.apache.hc.core5.concurrent.BasicFuture.completed(BasicFuture.java:123)
org.apache.hc.core5.pool.StrictConnPool.fireCallbacks(StrictConnPool.java:399)
org.apache.hc.core5.pool.StrictConnPool.lease(StrictConnPool.java:215)
org.apache.hc.client5.http.impl.nio.PoolingAsyncClientConnectionManager$1.<init>(PoolingAsyncClientConnectionManager.java:231)
org.apache.hc.client5.http.impl.nio.PoolingAsyncClientConnectionManager.lease(PoolingAsyncClientConnectionManager.java:227)
org.apache.hc.client5.http.impl.async.InternalHttpAsyncExecRuntime.acquireEndpoint(InternalHttpAsyncExecRuntime.java:100)
org.apache.hc.client5.http.impl.async.AsyncConnectExec.execute(AsyncConnectExec.java:135)
org.apache.hc.client5.http.impl.async.AsyncExecChainElement.execute(AsyncExecChainElement.java:54)
org.apache.hc.client5.http.impl.async.AsyncExecChainElement$1.proceed(AsyncExecChainElement.java:62)
org.apache.hc.client5.http.impl.async.AsyncProtocolExec.internalExecute(AsyncProtocolExec.java:183)
org.apache.hc.client5.http.impl.async.AsyncProtocolExec.execute(AsyncProtocolExec.java:145)
org.apache.hc.client5.http.impl.async.AsyncExecChainElement.execute(AsyncExecChainElement.java:54)
org.apache.hc.client5.http.impl.async.AsyncExecChainElement$1.proceed(AsyncExecChainElement.java:62)
org.apache.hc.client5.http.impl.async.AsyncHttpRequestRetryExec.internalExecute(AsyncHttpRequestRetryExec.java:97)
org.apache.hc.client5.http.impl.async.AsyncHttpRequestRetryExec.execute(AsyncHttpRequestRetryExec.java:183)
org.apache.hc.client5.http.impl.async.AsyncExecChainElement.execute(AsyncExecChainElement.java:54)
org.apache.hc.client5.http.impl.async.AsyncExecChainElement$1.proceed(AsyncExecChainElement.java:62)
org.apache.hc.client5.http.impl.async.AsyncRedirectExec.internalExecute(AsyncRedirectExec.java:112)
org.apache.hc.client5.http.impl.async.AsyncRedirectExec.execute(AsyncRedirectExec.java:278)
org.apache.hc.client5.http.impl.async.AsyncExecChainElement.execute(AsyncExecChainElement.java:54)
org.apache.hc.client5.http.impl.async.InternalAbstractHttpAsyncClient.executeImmediate(InternalAbstractHttpAsyncClient.java:367)
org.apache.hc.client5.http.impl.async.InternalAbstractHttpAsyncClient$1.sendRequest(InternalAbstractHttpAsyncClient.java:223)
org.apache.hc.core5.http.nio.support.BasicRequestProducer.sendRequest(BasicRequestProducer.java:93)
org.apache.hc.client5.http.impl.async.InternalAbstractHttpAsyncClient.doExecute(InternalAbstractHttpAsyncClient.java:180)
org.apache.hc.client5.http.impl.async.CloseableHttpAsyncClient.execute(CloseableHttpAsyncClient.java:97)
org.apache.hc.client5.http.impl.async.CloseableHttpAsyncClient.execute(CloseableHttpAsyncClient.java:107)
Back to HttpAsyncMainClientExec::execute:
operation.setDependency(execRuntime.execute(exchangeId, internalExchangeHandler, clientContext));
This calls InternalHttpAsyncExecRuntime::execute which calls endpoint.execute(id, exchangeHandler, context); which I believe begins the asynchronous processing of the HTTP exchange.{{ InternalHttpAsyncExecRuntime::execute}} then returns the Cancellable to be set in the ComplexFuture.
This only happens when there is already an available connection to lease because the the sequence above happens on the thread that calls execute; when no connection is available, one is created and the setup of the exchange then also happens on the HTTP dispatch thread so processing of bytes will not happen until after the setup is complete (since they both must happen on the dispatch thread).
I have two possible ideas for solutions:
Always do the setup on the HTTP dispatch thread. This seems like the safest option, but I'm not sure if that will impact anything else.
The other idea is to configure the dependency callback first before letting the HTTP exchange begin. This may solve this particular race condition but still leaves it possible for other similar race conditions.
I or one of my colleagues may be able to implement a fix but we would need guidance on how best to solve this.
I have a simple unit test that exhibits this issue that I'll post as a comment below.