Thrift
  1. Thrift
  2. THRIFT-970

Under heavy load, THttpClient may fail with "too many open files"

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: 0.2, 0.3, 0.4, 0.5
    • Fix Version/s: 0.6
    • Component/s: Java - Library
    • Labels:
      None
    • Environment:

      All

      Description

      THttpClient uses URL.openConnection which returns a HttpUrlConnection instance for each message transmission.

      HttpUrlConnection supposedly pools connections to the server. While stress testing an application, we've noticed that after several thousands requests, THttpClient would fail with a "Too many open files" error thrown in java.net.Socket.createImpl called from THttpClient.flush.

      As the underlying connection to the server is internally handled by HttpUrlConnection and the JVM, there is unfortunately not much that can be done to remedy this problem while still using HttpUrlConnection.

      I propose a new implementation of THttpClient which uses Apache HttpClient from Http Components instead of HttpUrlConnection.

      1. THRIFT-970-1
        8 kB
        Mathias Herberts
      2. THRIFT-970-2
        8 kB
        Mathias Herberts

        Activity

        Hide
        Mathias Herberts added a comment -

        This patch introduces the THttpTransport class which intends to be a drop in replacement for THttpClient (apart from the constructor which differs slightly).

        In order to quickly release resources associated with connections, response content is stored in a ByteArrayInputStream before being made available to the enclosing TProtocol.

        This means that the memory footprint might be double that of the response (the THttpTransport buffer and the TBase being filled).

        Show
        Mathias Herberts added a comment - This patch introduces the THttpTransport class which intends to be a drop in replacement for THttpClient (apart from the constructor which differs slightly). In order to quickly release resources associated with connections, response content is stored in a ByteArrayInputStream before being made available to the enclosing TProtocol. This means that the memory footprint might be double that of the response (the THttpTransport buffer and the TBase being filled).
        Hide
        Bryan Duxbury added a comment -

        I'm a little concerned about having two different HTTP-client-related classes in Thrift. On the one hand, it seems like THttpClient isn't all that responsible. On the other hand, it seems like your new THttpTransport won't cache connections at all, which could have significant performance implications.

        Couldn't we instead find some way to make THttpClient do what we want it to and let he pooling/caching be configurable?

        Also, it appears that just closing the input stream you get from HttpUrlConnection never closes the underlying socket, but there's another method called disconnect() that will close the underlying socket if the connection is idle. Maybe we just need to change THttpClient to call that method.

        Show
        Bryan Duxbury added a comment - I'm a little concerned about having two different HTTP-client-related classes in Thrift. On the one hand, it seems like THttpClient isn't all that responsible. On the other hand, it seems like your new THttpTransport won't cache connections at all, which could have significant performance implications. Couldn't we instead find some way to make THttpClient do what we want it to and let he pooling/caching be configurable? Also, it appears that just closing the input stream you get from HttpUrlConnection never closes the underlying socket, but there's another method called disconnect() that will close the underlying socket if the connection is idle. Maybe we just need to change THttpClient to call that method.
        Hide
        Mathias Herberts added a comment -

        My THttpTransport relies on HttpClient to do any caching/limiting, so connections will be cached if HttpClient is configured to do so.

        HttpClient has many other options, like limiting the number of connections per server, using a proxy, using various connection factories, etc.

        Show
        Mathias Herberts added a comment - My THttpTransport relies on HttpClient to do any caching/limiting, so connections will be cached if HttpClient is configured to do so. HttpClient has many other options, like limiting the number of connections per server, using a proxy, using various connection factories, etc.
        Hide
        Mathias Herberts added a comment -

        My tests show that if you use THttpTransport with a connection manager for which the following parameters were set

        http.protocol.version=HttpVersion.HTTP_1_1
        http.protocol.content-charset=UTF-8
        http.protocol.expect-continue=false
        http.connection.stalecheck=false

        then performance is 5 to 15% better than those of THttpClient.

        Show
        Mathias Herberts added a comment - My tests show that if you use THttpTransport with a connection manager for which the following parameters were set http.protocol.version=HttpVersion.HTTP_1_1 http.protocol.content-charset=UTF-8 http.protocol.expect-continue=false http.connection.stalecheck=false then performance is 5 to 15% better than those of THttpClient.
        Hide
        Mathias Herberts added a comment -

        I could merge THttpClient and THttpTransport by having the current THttpClient behavior if a null HttpClient is passed to the constructor, how is that?

        Show
        Mathias Herberts added a comment - I could merge THttpClient and THttpTransport by having the current THttpClient behavior if a null HttpClient is passed to the constructor, how is that?
        Hide
        Bryan Duxbury added a comment -

        I think that sounds like a good solution. The javadoc for the class should contain at least an inkling (or better yet, a link) to information on how to use it performantly.

        Show
        Bryan Duxbury added a comment - I think that sounds like a good solution. The javadoc for the class should contain at least an inkling (or better yet, a link) to information on how to use it performantly.
        Hide
        Mathias Herberts added a comment -

        Patch which modifies THttpClient to offer both implementations.

        When using a HttpClient instance, the user agent is set to 'Java/THttpClient/HC' instead of 'Java/THttpClient'.

        Show
        Mathias Herberts added a comment - Patch which modifies THttpClient to offer both implementations. When using a HttpClient instance, the user agent is set to 'Java/THttpClient/HC' instead of 'Java/THttpClient'.
        Hide
        Bryan Duxbury added a comment -

        I just committed this. Thanks for working through the features with me.

        Show
        Bryan Duxbury added a comment - I just committed this. Thanks for working through the features with me.
        Hide
        David Reiss added a comment -

        I just noticed the change to ivy.xml. Does this mean that You need httpcomponents installed even if you are only using the HttpURLConnection version?

        Show
        David Reiss added a comment - I just noticed the change to ivy.xml. Does this mean that You need httpcomponents installed even if you are only using the HttpURLConnection version?
        Hide
        Mathias Herberts added a comment -

        Indeed, should this pose a problem I can rollback the merging and add a THttpTransport.java file with the HttpClient backed implementation.

        Show
        Mathias Herberts added a comment - Indeed, should this pose a problem I can rollback the merging and add a THttpTransport.java file with the HttpClient backed implementation.
        Hide
        David Reiss added a comment -

        That's my preference, but Bryan is a better judge on Java issues than I am.

        Show
        David Reiss added a comment - That's my preference, but Bryan is a better judge on Java issues than I am.
        Hide
        Bryan Duxbury added a comment -

        I'm much more concerned with library coherence than reducing external dependencies. Having multiple HTTP clients seems like a great way to confuse our users and decrease our maintainability. Really, I don't think anyone should ever want to use the slower version anyway.

        Show
        Bryan Duxbury added a comment - I'm much more concerned with library coherence than reducing external dependencies. Having multiple HTTP clients seems like a great way to confuse our users and decrease our maintainability. Really, I don't think anyone should ever want to use the slower version anyway.

          People

          • Assignee:
            Mathias Herberts
            Reporter:
            Mathias Herberts
          • Votes:
            0 Vote for this issue
            Watchers:
            2 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development