--- performance.xml Wed Jan 12 23:43:52 2005
+++ perf.xml Wed Jan 12 23:43:41 2005
@@ -1,102 +1,95 @@
- - HttpClient Performance Optimization Guide Oleg Kalnichevski $Id$ - -
-

- Per default HttpClient is configured to provide maximum reliability and HTTP standards +

+ By default HttpClient is configured to provide maximum reliability and standards compliance rather than raw performance. There are several configuration options and - optimization techniques, which can significantly improve performance of HttpClient. + optimization techniques which can significantly improve the performance of HttpClient. + This document outlines various techniques to achieve maximum HttpClient performance.

-

- There are also several anti-patterns that should be avoided to achieve best results - using HttpClient. -

-
- -
-

- One of the most common and, unfortunately, detrimental anti-patterns is an excessive - instantiation and disposal of HttpClient instances. In the most extreme case a new - instance of HttpClient is created per each HTTP request. This kind of use pattern is - strongly discouraged because of the excessive and unnecessary garbage collection involved. - When an instance of HttpClient goes out if scope and is marked for garbage collection, - usually along with it go out of scope all the parameters, the default HTTP state, cookies, - user credentials, the connection manager and most importantly HTTP connections, some of - which may still be open. In the worst case scenario before the garbage collection kicks - in there may be hundreds of open sockets leading to serious resource problems. -

-

- Generally it is recommended to have just a single instance of HttpClient per communication +

+

+ Generally it is recommended to have a single instance of HttpClient per communication component or even per application. However, if the application makes use of HttpClient - only very infrequently and keeping an idle instance of HttpClient in memory is not warranted, - it is highly recommended to explicitly + only very infrequently and keeping an idle instance of HttpClient in memory is not warranted, + it is highly recommended to explicitly shut down the multithreaded connection manager prior to disposing - the HttpClient instance, which will ensure proper closure of all HTTP connections in the + the HttpClient instance. This will ensure proper closure of all HTTP connections in the connection pool.
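The single-instance pattern can be sketched as follows, assuming the HttpClient 3.x API (org.apache.commons.httpclient). ServiceComponent and its dispose() method are hypothetical names used only for illustration: one HttpClient is reused for every request the component makes, and the MultiThreadedHttpConnectionManager behind it is shut down explicitly when the component is discarded.

```java
import org.apache.commons.httpclient.HttpClient;
import org.apache.commons.httpclient.MultiThreadedHttpConnectionManager;

/**
 * Sketch only: ServiceComponent is a hypothetical class holding one HttpClient
 * for its whole lifetime instead of creating a new instance per request.
 */
public class ServiceComponent {

    // The connection manager owns the pool of HTTP connections.
    private final MultiThreadedHttpConnectionManager connectionManager =
            new MultiThreadedHttpConnectionManager();

    // A single HttpClient instance, reused for every request made by this component.
    private final HttpClient httpClient = new HttpClient(connectionManager);

    public HttpClient getHttpClient() {
        return httpClient;
    }

    /** Hypothetical cleanup hook: call once, when the component is no longer needed. */
    public void dispose() {
        // Closes all pooled connections instead of leaving them to the garbage collector.
        connectionManager.shutdown();
    }
}
```

In an application that only occasionally talks HTTP, dispose() would be called as soon as the burst of requests is over; otherwise the component simply lives as long as the application.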

-

- HttpClient always makes its best efforts to reuse connections. The connection - persistence is always on per default and requires no configuration. If the connection - persistence for some reason needs to be disabled, the best way to achieve that is to - provide a custom connection manager or extend the existing one and force-close connections +

+ HttpClient always does its best to reuse connections. Connection persistence is enabled + by default and requires no configuration. In some situations, however, this can lead to leaked + connections and therefore lost resources. The easiest way to disable connection persistence + is to provide or extend a connection manager that force-closes connections upon release, in the releaseConnection method.
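One way to force-close connections upon release is sketched below, assuming HttpClient 3.x. NonPersistentConnectionManager is a hypothetical name; it extends MultiThreadedHttpConnectionManager and closes each connection in releaseConnection before handing it back to the pool, so a connection is never kept open between requests.

```java
import org.apache.commons.httpclient.HttpConnection;
import org.apache.commons.httpclient.MultiThreadedHttpConnectionManager;

/**
 * Sketch only: NonPersistentConnectionManager is a hypothetical subclass that
 * effectively disables connection persistence.
 */
public class NonPersistentConnectionManager extends MultiThreadedHttpConnectionManager {

    @Override
    public void releaseConnection(HttpConnection conn) {
        // Close the underlying socket first, so the connection is not kept alive...
        conn.close();
        // ...then let the pool reclaim the (now closed) connection object.
        super.releaseConnection(conn);
    }
}
```

It is plugged in like any other manager, e.g. new HttpClient(new NonPersistentConnectionManager()).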

-

- If the application logic allows for execution of multiple HTTP requests concurrently, - for instance, multiple requests against different sites, or multiple requests representing - different user identities, the use of a dedicated thread per HTTP session can result in a +

+ If the application logic allows for execution of multiple HTTP requests concurrently + (e.g. multiple requests against various sites, or multiple requests representing + different user identities), the use of a dedicated thread per HTTP session can result in a significant performance gain. HttpClient is fully thread-safe when used with a thread-safe - connection manager such as + connection manager such as MultiThreadedHttpConnectionManager. Please note that each thread of execution must have its own instance of HttpMethod and may have its own instance of HttpState and/or HostConfiguration to represent a specific host configuration and conversational state. At the - same time HttpClient instance should be shared by all threads for maximum efficiency. + same time the HttpClient instance and connection manager should be shared among all threads + for maximum efficiency.
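The sharing rules above can be illustrated with a short sketch, again assuming HttpClient 3.x. ConcurrentFetcher is a hypothetical helper: the HttpClient and its MultiThreadedHttpConnectionManager are created once and shared, while every worker thread creates its own GetMethod and releases its connection back to the pool when done.

```java
import java.io.IOException;

import org.apache.commons.httpclient.HttpClient;
import org.apache.commons.httpclient.MultiThreadedHttpConnectionManager;
import org.apache.commons.httpclient.methods.GetMethod;

/** Sketch only: ConcurrentFetcher is a hypothetical helper class. */
public class ConcurrentFetcher {

    /** One client, backed by a thread-safe connection manager, shared by all threads. */
    public static HttpClient createSharedClient() {
        return new HttpClient(new MultiThreadedHttpConnectionManager());
    }

    /** Fetches each URL on its own thread and waits for all of them to finish. */
    public static void fetchAll(final HttpClient sharedClient, String[] urls)
            throws InterruptedException {
        Thread[] workers = new Thread[urls.length];
        for (int i = 0; i < urls.length; i++) {
            final String url = urls[i];
            workers[i] = new Thread(() -> {
                // Thread-local method object; an HttpMethod is never shared between threads.
                GetMethod get = new GetMethod(url);
                try {
                    int status = sharedClient.executeMethod(get);
                    System.out.println(url + " -> " + status);
                } catch (IOException e) {
                    System.err.println(url + " failed: " + e);
                } finally {
                    // Returns the connection to the shared pool.
                    get.releaseConnection();
                }
            });
            workers[i].start();
        }
        for (Thread worker : workers) {
            worker.join();
        }
    }
}
```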

-

- For details on using multiple threads with HttpClient please to the +

+ For details on using multiple threads with HttpClient please refer to the HttpClient Threading Guide.

-

- HttpClient is capable of efficient request/response body streaming. Large entities can be submitted - or received without having to be buffered in memory. This is especially critical if multiple HTTP - methods may be executed concurrently. In this case the use of strings or byte arrays to provide or - consume request/response body may severely affect scalability or even cause out of memory condition. -

-

- Response streaming: It is recommended to consume the HTTP response body as a stream of - characters using HttpMethod#getResponseBodyAsStream method. The use of HttpMethod#getResponseBody and - HttpMethod#getResponseBodyAsString is strongly discouraged. These methods will be deprecated in the future - release of HttpClient. +

+ HttpClient is capable of efficient request/response body streaming. Large entities may be submitted + or received without being buffered in memory. This is especially critical if multiple HTTP methods may be executed concurrently. While there are convenience methods to deal with entities such as + strings or byte arrays, their use is discouraged. Unless used carefully, they can easily lead to + out-of-memory conditions, since they imply buffering of the complete entity in memory.

+

+ Response streaming: It is recommended to consume the HTTP response body as a stream of + bytes/characters using the HttpMethod#getResponseBodyAsStream method. The use of HttpMethod#getResponseBody and + HttpMethod#getResponseBodyAsString is strongly discouraged, since both buffer the complete response body in memory. -
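A minimal sketch of response streaming, assuming HttpClient 3.x. The class StreamingDownload and its methods are hypothetical; the body returned by getResponseBodyAsStream is copied to an arbitrary OutputStream through a fixed 4 KB buffer, so memory use stays constant regardless of the entity size.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

import org.apache.commons.httpclient.HttpClient;
import org.apache.commons.httpclient.HttpMethod;

/** Sketch only: StreamingDownload is a hypothetical helper class. */
public class StreamingDownload {

    /** Copies a stream through a fixed-size buffer; returns the number of bytes copied. */
    public static long copy(InputStream in, OutputStream out) throws IOException {
        byte[] buffer = new byte[4096];
        long total = 0;
        int n;
        while ((n = in.read(buffer)) != -1) {
            out.write(buffer, 0, n);
            total += n;
        }
        return total;
    }

    /** Executes the method and streams its response body to the sink. */
    public static long download(HttpClient client, HttpMethod method, OutputStream sink)
            throws IOException {
        try {
            client.executeMethod(method);
            InputStream body = method.getResponseBodyAsStream();
            // getResponseBodyAsStream() may return null when there is no response body.
            return body == null ? 0 : copy(body, sink);
        } finally {
            // Always give the connection back, even if reading failed.
            method.releaseConnection();
        }
    }
}
```

The sink can be a FileOutputStream, a socket, or any other consumer; at no point is the whole entity held in a String or byte array.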

-

- Request streaming: Main difficulty one may encounter when streaming request bodies is that - sometimes entity enclosing methods need to be retried due to an authentication failure or an I/O failure. +

+

+ Request streaming: The main difficulty encountered when streaming request bodies is that + entity enclosing methods may need to be retried due to an authentication failure or an I/O failure. Obviously, non-buffered entities cannot be reread and resubmitted. The recommended approach is to create a custom RequestEntity capable of reconstructing the underlying input stream. @@ -154,40 +147,46 @@ File myfile = new File("myfile.txt"); PostMethod httppost = new PostMethod("/stuff"); httppost.setRequestEntity(new FileRequestEntity(myfile, "text/plain"));]]> -
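The sketch below shows one possible shape for such an entity, assuming the HttpClient 3.x RequestEntity interface (isRepeatable, writeRequest, getContentLength, getContentType). ReopenableRequestEntity and its nested StreamFactory interface are hypothetical names: rather than wrapping a single InputStream, the entity keeps a factory and opens a fresh stream every time the body has to be written, which is what makes a retry possible.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

import org.apache.commons.httpclient.methods.RequestEntity;

/** Sketch only: a hypothetical repeatable entity that re-creates its input stream on demand. */
public class ReopenableRequestEntity implements RequestEntity {

    /** Hypothetical callback that opens a new stream over the same content. */
    public interface StreamFactory {
        InputStream open() throws IOException;
    }

    private final StreamFactory factory;
    private final long length;
    private final String contentType;

    public ReopenableRequestEntity(StreamFactory factory, long length, String contentType) {
        this.factory = factory;
        this.length = length;
        this.contentType = contentType;
    }

    public boolean isRepeatable() {
        // The body can be regenerated, so it may be sent more than once.
        return true;
    }

    public void writeRequest(OutputStream out) throws IOException {
        // A fresh stream per attempt: a retry never sees a half-consumed stream.
        InputStream in = factory.open();
        try {
            byte[] buffer = new byte[4096];
            int n;
            while ((n = in.read(buffer)) != -1) {
                out.write(buffer, 0, n);
            }
        } finally {
            in.close();
        }
    }

    public long getContentLength() {
        // Length in bytes of the content produced by the factory.
        return length;
    }

    public String getContentType() {
        return contentType;
    }
}
```

For instance, new ReopenableRequestEntity(() -> new FileInputStream(file), file.length(), "application/octet-stream") would reopen the file on every attempt; for plain files the built-in FileRequestEntity shown above already covers this case.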

+

-

- The purpose of the 100 (Continue) status is to allow a client that is sending a request message with - a request body to determine if the origin server is willing to accept the request (based on the - request headers) before the client sends the request body. It may be highly inefficient for the client - to send the request body if the server will reject the request without looking at the body. +

+ The purpose of the HTTP 100 (Continue) status is to allow a client sending a request entity to + determine if the target server is willing to accept the request (based on the + request headers) before the client sends the request entity. It can be highly inefficient for the client + to send the request entity if the server will reject the request without looking at it. Authentication failures are the most common reason for the request to be rejected based on the request - headers alone. Therefore, the use of 'Expect-continue' handshake is especially recommended with - those target servers that require HTTP authentication. However, for proxied requests caution - must be exercised as older HTTP/1.0 proxies may be unable to correctly handle the 'Expect-continue' + headers alone. Therefore, use of the 'Expect-continue' handshake is especially recommended with + those target servers that require HTTP authentication. For proxied requests, however, caution + must be taken, as older HTTP/1.0 proxies may be unable to correctly handle the 'Expect-continue' handshake.

+

+ See the http.protocol.expect-continue parameter documentation + for more information. +
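Assuming HttpClient 3.x, where the http.protocol.expect-continue parameter is exposed as the constant HttpMethodParams.USE_EXPECT_CONTINUE, the handshake can be switched on either for all methods executed by a client or for a single method. ExpectContinueConfig is a hypothetical helper class used only to show both options.

```java
import org.apache.commons.httpclient.HttpClient;
import org.apache.commons.httpclient.methods.PostMethod;
import org.apache.commons.httpclient.params.HttpMethodParams;

/** Sketch only: ExpectContinueConfig is a hypothetical helper class. */
public class ExpectContinueConfig {

    /** Client-wide default: applies to every method executed by this client. */
    public static void enableForClient(HttpClient client) {
        client.getParams().setBooleanParameter(HttpMethodParams.USE_EXPECT_CONTINUE, true);
    }

    /** Per-request setting: applies to this method only. */
    public static void enableForMethod(PostMethod post) {
        post.getParams().setBooleanParameter(HttpMethodParams.USE_EXPECT_CONTINUE, true);
    }
}
```

The setting only matters for methods that actually send a request entity, such as POST or PUT.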

-

- HTTP specification permits both the client and the server to terminate the persistent (kept alive) - connection at any time without a notice to the counterpart, thus rendering the connection invalid, - or stale. Per default prior to executing a request HttpClient performs a check to determine if the - active connection is stale. The cost of this operation is about 15-30 ms depending on JRE used. +

+ The HTTP specification permits both the client and the server to terminate a persistent (keep-alive) + connection at any time without notice to the counterpart, thus rendering the connection invalid + or stale. By default HttpClient performs a check, just prior to executing a request, to determine if the + active connection is stale. The cost of this operation is about 15-30 ms, depending on the JRE used. Disabling the stale connection check may result in a slight performance improvement, especially for small-payload responses, at the risk of getting an I/O error when executing a request over a connection that has been closed on the server side.

+

+ See the http.connection.stalecheck parameter documentation for more + information. +
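A sketch of disabling the check, assuming HttpClient 3.x, where http.connection.stalecheck corresponds to HttpConnectionParams.setStaleCheckingEnabled. Setting it on the connection manager's parameters makes it the default for the connections that manager creates; NoStaleCheckClient is a hypothetical name.

```java
import org.apache.commons.httpclient.HttpClient;
import org.apache.commons.httpclient.MultiThreadedHttpConnectionManager;

/** Sketch only: NoStaleCheckClient is a hypothetical factory class. */
public class NoStaleCheckClient {

    public static HttpClient create() {
        MultiThreadedHttpConnectionManager manager = new MultiThreadedHttpConnectionManager();
        // Equivalent to setting "http.connection.stalecheck" to false on the manager's parameters.
        manager.getParams().setStaleCheckingEnabled(false);
        return new HttpClient(manager);
    }
}
```

Because a stale connection is then only discovered when a request fails, this trade-off fits best where requests are cheap to retry.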

-

- If the application such as web spider does not need to maintain conversational state with the state - with the target server, a small performance gain can made by disabling cookie processing. For details +

+ If an application, such as a web spider, does not need to maintain conversational state with the target + server, a small performance gain can be made by disabling cookie processing. For details on cookie processing please refer to the HttpClient Cookies Guide.
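Assuming HttpClient 3.x, cookie processing can be turned off by selecting the CookiePolicy.IGNORE_COOKIES policy in the client parameters, as in this short sketch (CookielessClient is a hypothetical name).

```java
import org.apache.commons.httpclient.HttpClient;
import org.apache.commons.httpclient.cookie.CookiePolicy;

/** Sketch only: CookielessClient is a hypothetical factory class. */
public class CookielessClient {

    public static HttpClient create() {
        HttpClient client = new HttpClient();
        // With the "ignore cookies" policy, received cookies are not stored and none are sent.
        client.getParams().setCookiePolicy(CookiePolicy.IGNORE_COOKIES);
        return client;
    }
}
```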

- -