Uploaded image for project: 'HttpComponents HttpCore'
  1. HttpComponents HttpCore
  2. HTTPCORE-609

Null Pointer Exception because of race condition in DefaultConnectingIOReactor

    XMLWordPrintableJSON

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 4.4.12
    • Fix Version/s: 4.4.13
    • Component/s: HttpCore NIO
    • Labels:
      None

      Description

      There is a race condition because of which a null pointer exception happens rendering the IOReactor break.

       

      [ERROR] [] 2019-10-16 11:27:31.904 [pool-2-thread-1] InternalHttpAsyncClient - I/O reactor terminated abnormally[ERROR] [] 2019-10-16 11:27:31.904 [pool-2-thread-1] InternalHttpAsyncClient - I/O reactor terminated abnormallyjava.lang.NullPointerException: null at org.apache.http.impl.nio.reactor.DefaultConnectingIOReactor.processEvent(DefaultConnectingIOReactor.java:170) ~[AdExchange%23%23stable_1.32.23.44.jar:?] at org.apache.http.impl.nio.reactor.DefaultConnectingIOReactor.processEvents(DefaultConnectingIOReactor.java:148) ~[AdExchange%23%23stable_1.32.23.44.jar:?] at org.apache.http.impl.nio.reactor.AbstractMultiworkerIOReactor.execute(AbstractMultiworkerIOReactor.java:351) ~[AdExchange%23%23stable_1.32.23.44.jar:?] at org.apache.http.impl.nio.conn.PoolingNHttpClientConnectionManager.execute(PoolingNHttpClientConnectionManager.java:221) ~[AdExchange%23%23stable_1.32.23.44.jar:?] at org.apache.http.impl.nio.client.CloseableHttpAsyncClientBase$1.run(CloseableHttpAsyncClientBase.java:64) ~[AdExchange%23%23stable_1.32.23.44.jar:?] at java.lang.Thread.run(Thread.java:748) [?:1.8.0_141]
      

       

       

      This is a tough race condition to reproduce, I did it by using debuggers. This was introducted by https://github.com/apache/httpcomponents-core/commit/aa812282f26fdd1975233a892c5405fa0da781b4#diff-d577e717cb1e97f3a3c0adbc8d563062

       

      I will try my best to explain the steps of what happens in this scenario:

      1. A session request is submitted to establish a connection to the ConnectingIOReactor.
      2. https://github.com/apache/httpcomponents-core/blob/e7b4f5cc306b3eb4b6e1e24627971049001ddd68/httpcore-nio/src/main/java/org/apache/http/impl/nio/reactor/DefaultConnectingIOReactor.java#L258-L260 Here it is checked if the request is already completed or not. If not then a socket is assigned and there after the request takes place.
      3. The race condition happens when after the second step and between the assignment of the selection key https://github.com/apache/httpcomponents-core/blob/e7b4f5cc306b3eb4b6e1e24627971049001ddd68/httpcore-nio/src/main/java/org/apache/http/impl/nio/reactor/DefaultConnectingIOReactor.java#L316-L318 the request is cancelled.
      4. In this case from https://github.com/apache/httpcomponents-core/blob/e7b4f5cc306b3eb4b6e1e24627971049001ddd68/httpcore-nio/src/main/java/org/apache/http/impl/nio/reactor/SessionRequestImpl.java#L208-L227 since the key has not yet being assigned, there is nothing to cancel, the request is cancelled successuly and the interested parties are informed as well via callbacks.
      5. When  https://github.com/apache/httpcomponents-core/blob/e7b4f5cc306b3eb4b6e1e24627971049001ddd68/httpcore-nio/src/main/java/org/apache/http/impl/nio/reactor/DefaultConnectingIOReactor.java#L316-L318 executes now, we have got a condition where we have a cancelled request with an active selection key.
      6. After the completion of processSessionRequests method, let us say we have also reached the expiry time of the request and in the same call we hit processTimeouts, where the current request is timedout. Since the request is already in completed state (via cancel) it won't be timedout but the attachement of the key will get an assignement null via the finally block.
      7. At this point, the request is still completed while the key is still not cancelled. The key now has the attachment null. When the key becomes ready, it will be passed to processEvent method where https://github.com/apache/httpcomponents-core/blob/e7b4f5cc306b3eb4b6e1e24627971049001ddd68/httpcore-nio/src/main/java/org/apache/http/impl/nio/reactor/DefaultConnectingIOReactor.java#L169-L170 will produce null pointer exception since the attachment is null.

      This race condition might seem like a rare scenario but under heavy load, this is not that difficult to reach. In those scenarios, the IOReactor breaks and no further request can be made via the client.

       

      I tried adding a test case for the same, but couldn't. I will let you know the params that I am using and the point where I had put the debug point if that helps.

       

      soTimeout -> 1000

      connectTimeout -> 1000

      soKeepAlive -> true

      tcpNoDelay -> true

      selectInterval -> 10

      soThreadCount -> 4

      soReuseAddress -> true

       

      DefaultRequestConfig

      socketTimeout -> 1000

      connectTimeout -> 1000

       

      Debug Point: https://github.com/apache/httpcomponents-core/blob/e7b4f5cc306b3eb4b6e1e24627971049001ddd68/httpcore-nio/src/main/java/org/apache/http/impl/nio/reactor/DefaultConnectingIOReactor.java#L263

       

      At this point I will call request.cancel and wait for 1-2 seconds before I move forward, and it reproduces the above scenario.

       

      A suggested solution is to check while assigning key to request if it is already completed then cancel the key and close the channel. We have been using this in production, and it resolves the above issue. I will also submit a corresponding PR for the same.

        Attachments

          Issue Links

            Activity

              People

              • Assignee:
                Unassigned
                Reporter:
                altaiezior Anurag Agarwal
              • Votes:
                0 Vote for this issue
                Watchers:
                3 Start watching this issue

                Dates

                • Created:
                  Updated:
                  Resolved:

                  Time Tracking

                  Estimated:
                  Original Estimate - Not Specified
                  Not Specified
                  Remaining:
                  Remaining Estimate - 0h
                  0h
                  Logged:
                  Time Spent - 40m
                  40m