HttpComponents HttpClient
  1. HttpComponents HttpClient
  2. HTTPCLIENT-427

Implement a cache to perform real request only when needed

    Details

    • Type: Improvement Improvement
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 4.1 Alpha2
    • Component/s: HttpClient
    • Labels:
      None
    • Environment:
      Operating System: Linux
      Platform: PC

      Description

      Browsers may cache received content according to the values of different
      response headers. It would be great if HttpClient could do the same.

      1. caching_client.zip
        80 kB
        David Mays
      2. httpclient-cache.zip
        106 kB
        Oleg Kalnichevski
      3. HTTPCLIENT-427-immutable-cachenentry.patch
        52 kB
        Oleg Kalnichevski
      4. HTTPCLIENT-427-Immutable-CacheEntry2.patch
        133 kB
        David Mays
      5. HTTPCLIENT-427-rfc2616-sec13-tests.patch
        25 kB
        Jonathan Moore
      6. docbook-caching-tutorial.patch
        6 kB
        David Mays

        Activity

        Hide
        Oleg Kalnichevski added a comment -

        I think it is about right. Patch checked in. Many thanks, Dave

        Oleg

        Show
        Oleg Kalnichevski added a comment - I think it is about right. Patch checked in. Many thanks, Dave Oleg
        Hide
        David Mays added a comment -

        Attached is a patch that contains a new docbook chapter about the Caching Client.

        Let me know if you would like it to be more or less than it is, or modify as you see fit.

        Dave

        Show
        David Mays added a comment - Attached is a patch that contains a new docbook chapter about the Caching Client. Let me know if you would like it to be more or less than it is, or modify as you see fit. Dave
        Hide
        David Mays added a comment -

        I'm working on an additional chapter for the documentation, titled "HTTP Caching" which will include general background on the caching code, its effort at being compliant with the caching requirements in RFC-2616 and a trivial example of how one might configure it.

        I should be ready to send a patch with that sometime today.

        Dave

        Show
        David Mays added a comment - I'm working on an additional chapter for the documentation, titled "HTTP Caching" which will include general background on the caching code, its effort at being compliant with the caching requirements in RFC-2616 and a trivial example of how one might configure it. I should be ready to send a patch with that sometime today. Dave
        Hide
        Oleg Kalnichevski added a comment -

        @Jonathan

        Many thanks for contributing the patch. Committed to SVN trunk

        @all

        I think things are in a shape good enough for an ALPHA release.

        I raised a new JIRA for the cache entry representation issue, see HTTPCLIENT-937.

        @Jonathan, Joe and Dave

        A short tutorial section about caching support would make things absolutely perfect

        Oleg

        Show
        Oleg Kalnichevski added a comment - @Jonathan Many thanks for contributing the patch. Committed to SVN trunk @all I think things are in a shape good enough for an ALPHA release. I raised a new JIRA for the cache entry representation issue, see HTTPCLIENT-937 . @Jonathan, Joe and Dave A short tutorial section about caching support would make things absolutely perfect Oleg
        Hide
        Joe Campbell added a comment -

        @Sam:

        RE #1) Thank you for the more in depth explanation of volatile keywords use in this case. This clarified the use of the keyword for me quite a bit and has made my understanding of what that keyword 'really' does in code more precise.

        RE #2) I think Jon covered that explanation. I think all here (that have worked on the Caching client) agree that we would like to head down the path that Oleg suggested - but we don't want that work to gate getting the 'working foundation' of this code checked in and released. We are happy to continue to work with the group to make suggested changes to improve and strengthen this code base contribution.

        Thanks,
        Joe

        Show
        Joe Campbell added a comment - @Sam: RE #1) Thank you for the more in depth explanation of volatile keywords use in this case. This clarified the use of the keyword for me quite a bit and has made my understanding of what that keyword 'really' does in code more precise. RE #2) I think Jon covered that explanation. I think all here (that have worked on the Caching client) agree that we would like to head down the path that Oleg suggested - but we don't want that work to gate getting the 'working foundation' of this code checked in and released. We are happy to continue to work with the group to make suggested changes to improve and strengthen this code base contribution. Thanks, Joe
        Hide
        Jonathan Moore added a comment -

        Please find attached a patch that adds some more acceptance tests to TestProtocolRequirements, as gleaned from Section 13 of RFC 2616, in terms of MUST and MUST NOT requirements. Existing code already passed these, so this is a net addition of passing unit tests.

        This contribution is made with the consent of my employer, Comcast Interactive Media, and license is granted to the ASF for inclusion in ASF works.

        Show
        Jonathan Moore added a comment - Please find attached a patch that adds some more acceptance tests to TestProtocolRequirements, as gleaned from Section 13 of RFC 2616, in terms of MUST and MUST NOT requirements. Existing code already passed these, so this is a net addition of passing unit tests. This contribution is made with the consent of my employer, Comcast Interactive Media, and license is granted to the ASF for inclusion in ASF works.
        Hide
        Jonathan Moore added a comment -

        @Sam:

        The byte[] is indeed the cached response body, and I would agree with Seb that as-is, it's not threadsafe because someone can muck with the bytearray. I think clone() on the byte array solves this, but at the expense of extra copies of the data. The Right Way is probably going down the road that Oleg suggests, which is to use something like HttpEntity instead which can be treated as immutable.

        With regard to the potential size of the response, the CachingHttpClient lets you specify a max object size (with a heuristic default behavior that I'd have to recheck in the code); if the CachingHttpClient encounters a response that exceeds this, it won't try to cache it and will just stream the result through (it will optimistically consume some of the input stream, in case there is a chunked-encoding response that will fit, but if this process exceeds the max length then it will return a response with a "composite" input stream that returns the byte array read so far, then "appends" the rest of the unconsumed origin response stream.

        Show
        Jonathan Moore added a comment - @Sam: The byte[] is indeed the cached response body, and I would agree with Seb that as-is, it's not threadsafe because someone can muck with the bytearray. I think clone() on the byte array solves this, but at the expense of extra copies of the data. The Right Way is probably going down the road that Oleg suggests, which is to use something like HttpEntity instead which can be treated as immutable. With regard to the potential size of the response, the CachingHttpClient lets you specify a max object size (with a heuristic default behavior that I'd have to recheck in the code); if the CachingHttpClient encounters a response that exceeds this, it won't try to cache it and will just stream the result through (it will optimistically consume some of the input stream, in case there is a chunked-encoding response that will fit, but if this process exceeds the max length then it will return a response with a "composite" input stream that returns the byte array read so far, then "appends" the rest of the unconsumed origin response stream.
        Hide
        Sam Berlin added a comment -

        One statement, one question:

        1) Without looking at the code involved, the AtomicLong variable would only need to be volatile if the variable itself were being assigned to a new AtomicLong in one thread and used in another. If the AtomicLong object is created & assigned in one place (preferably the constructor, allowing the object to be final), then calling 'set' or 'get' on it is effectively the same as calling get or set on a volatile long. If you say 'new AtomicLong' somewhere other than the constructor, something is probably wrong with the code.

        2) Re: the byte[].clone() that sebb added. Again I haven't looked too deeply at the code (just the diffs), but is that the byte[] of the body of a cached response? If so, that could potentially be very large. Cloning it could lead to memory crunches. Is it possible to change the objects involved to ByteBuffers & convert it using asReadOnly? Additionally, if it is a byte[] of the body of cached response... I'm concerned that larger responses (ie, downloading a large file) wouldn't even be able to be stored in memory. Does the API allow for reading the data from an InputStream (or Channel?). (If that byte[] is not for the body of cached responses, you can ignore this question.)

        Show
        Sam Berlin added a comment - One statement, one question: 1) Without looking at the code involved, the AtomicLong variable would only need to be volatile if the variable itself were being assigned to a new AtomicLong in one thread and used in another. If the AtomicLong object is created & assigned in one place (preferably the constructor, allowing the object to be final), then calling 'set' or 'get' on it is effectively the same as calling get or set on a volatile long. If you say 'new AtomicLong' somewhere other than the constructor, something is probably wrong with the code. 2) Re: the byte[].clone() that sebb added. Again I haven't looked too deeply at the code (just the diffs), but is that the byte[] of the body of a cached response? If so, that could potentially be very large. Cloning it could lead to memory crunches. Is it possible to change the objects involved to ByteBuffers & convert it using asReadOnly? Additionally, if it is a byte[] of the body of cached response... I'm concerned that larger responses (ie, downloading a large file) wouldn't even be able to be stored in memory. Does the API allow for reading the data from an InputStream (or Channel?). (If that byte[] is not for the body of cached responses, you can ignore this question.)
        Hide
        Sebb added a comment -

        @Joe:

        From the Javadoc for java.util.concurrent.atomic:

        1. get has the memory effects of reading a volatile variable.
        2. set has the memory effects of writing (assigning) a volatile variable.

        The way I read that, there is no need to use volatile.
        If there were, then AtomicLong would not be thread-safe.

        I already changed CacheInvalidator to @ThreadSafe rather than @Immutable.

        Show
        Sebb added a comment - @Joe: From the Javadoc for java.util.concurrent.atomic: get has the memory effects of reading a volatile variable. set has the memory effects of writing (assigning) a volatile variable. The way I read that, there is no need to use volatile. If there were, then AtomicLong would not be thread-safe. I already changed CacheInvalidator to @ThreadSafe rather than @Immutable.
        Hide
        Joe Campbell added a comment -

        Sebastian,
        The volatile keyword effects the multi-thread visibility of the value located in the AtomicLong. To be appropriate, the volatile keyword should remain. From the util.concurrent java doc:

        "In essence, the classes in this package extend the notion of volatile values, fields, and array elements to those that also provide an atomic conditional update operation of the form..."

        volatility and atomicity are not the same by this notation, would you be ok with putting that keyword back?

        Also - I am curious about the @Immutable annotation in reference to the CacheInvalidator... by your definition CacheInvalidator is not immutable because a member variable of the invalidator can change internally (meaning things can be added and removed from the cache) - I am unclear though why this 'cache' mutability effects the mutability of the Invalidator class. Once the Cache is set on the CacheInvalidator it can only be changed if the cache is reconstructed/replaced (i.e. a new instance of the CacheInvalidator is needed to change the cache it uses).

        Thanks,
        Joe

        Show
        Joe Campbell added a comment - Sebastian, The volatile keyword effects the multi-thread visibility of the value located in the AtomicLong. To be appropriate, the volatile keyword should remain. From the util.concurrent java doc: "In essence, the classes in this package extend the notion of volatile values, fields, and array elements to those that also provide an atomic conditional update operation of the form..." volatility and atomicity are not the same by this notation, would you be ok with putting that keyword back? Also - I am curious about the @Immutable annotation in reference to the CacheInvalidator... by your definition CacheInvalidator is not immutable because a member variable of the invalidator can change internally (meaning things can be added and removed from the cache) - I am unclear though why this 'cache' mutability effects the mutability of the Invalidator class. Once the Cache is set on the CacheInvalidator it can only be changed if the cache is reconstructed/replaced (i.e. a new instance of the CacheInvalidator is needed to change the cache it uses). Thanks, Joe
        Hide
        Sebb added a comment -

        CacheEntity is not inherently thread-safe, because external classes can affect the byte array in non-threadsafe manner.
        It is only threadsafe if the external classes behave themselves.
        However, cloning the body array means that the class is now @Immutable.

        Other issues also fixed.

        Show
        Sebb added a comment - CacheEntity is not inherently thread-safe, because external classes can affect the byte array in non-threadsafe manner. It is only threadsafe if the external classes behave themselves. However, cloning the body array means that the class is now @Immutable. Other issues also fixed.
        Hide
        Oleg Kalnichevski added a comment -

        Sebastian,

        I would like that byte array be eventually replaced with a more complex object similar to HttpEntity. While CacheEntity is not truly immutable, it is thread safe now. This is good enough for the first cut and should not be a release blocker. Feel free to remove @Immutable annotations which you find inappropriate. Please just go ahead and make whatever changes you deem necessary to close the issue and proceed with the release.

        Oleg

        Show
        Oleg Kalnichevski added a comment - Sebastian, I would like that byte array be eventually replaced with a more complex object similar to HttpEntity. While CacheEntity is not truly immutable, it is thread safe now. This is good enough for the first cut and should not be a release blocker. Feel free to remove @Immutable annotations which you find inappropriate. Please just go ahead and make whatever changes you deem necessary to close the issue and proceed with the release. Oleg
        Hide
        Sebb added a comment -

        I think there is still a minor issue with the CacheEntry class.

        It contains the field:

        private final byte[] body;

        However, this is exposed by both the constructor and getBody(), neither of which copy the array.

        As it stands, the class is not @Immutable, because the body array can be changed externally.

        Simplest solution is to copy the array.

        Also, the CacheInvalidator class is definitely not @Immutable, because the cache can be upated.
        It is @ThreadSafe, so long as the cache implementation is @ThreadSafe

        The AtomicLong fields in CachingHttpClient don't need to be volatile; also I think the class is now @ThreadSafe, so long as the responseCache is.

        Show
        Sebb added a comment - I think there is still a minor issue with the CacheEntry class. It contains the field: private final byte[] body; However, this is exposed by both the constructor and getBody(), neither of which copy the array. As it stands, the class is not @Immutable, because the body array can be changed externally. Simplest solution is to copy the array. Also, the CacheInvalidator class is definitely not @Immutable, because the cache can be upated. It is @ThreadSafe, so long as the cache implementation is @ThreadSafe The AtomicLong fields in CachingHttpClient don't need to be volatile; also I think the class is now @ThreadSafe, so long as the responseCache is.
        Hide
        Oleg Kalnichevski added a comment -

        Patch checked in. Many thanks, David

        As far as I can tell all major issues have been addressed. I'll start cutting preview packages sometime this weekend.

        Oleg

        Show
        Oleg Kalnichevski added a comment - Patch checked in. Many thanks, David As far as I can tell all major issues have been addressed. I'll start cutting preview packages sometime this weekend. Oleg
        Hide
        David Mays added a comment -

        This patch correctly makes CacheEntry immutable, and fixes all the unit tests that broke based on this change.

        Show
        David Mays added a comment - This patch correctly makes CacheEntry immutable, and fixes all the unit tests that broke based on this change.
        Hide
        Joe Campbell added a comment -

        Thanks Oleg.

        I think that Dave was heading down the very same path that you were (attempting to make CacheEntry immutable). We'll take a look at your patch and incorporate as needed. I don't want to speak for Dave (as he was the one making the change) but we should have a patch in the next day, TWO at the most.

        Also if you have specific questions about EasyMock - don't hesitate to send a direct email to Dave or myself. Happy to help the learning and information dissemination of a wonderful framework for mocking. 8-)

        Thanks,
        Joe

        Show
        Joe Campbell added a comment - Thanks Oleg. I think that Dave was heading down the very same path that you were (attempting to make CacheEntry immutable). We'll take a look at your patch and incorporate as needed. I don't want to speak for Dave (as he was the one making the change) but we should have a patch in the next day, TWO at the most. Also if you have specific questions about EasyMock - don't hesitate to send a direct email to Dave or myself. Happy to help the learning and information dissemination of a wonderful framework for mocking. 8-) Thanks, Joe
        Hide
        Oleg Kalnichevski added a comment -

        Dave, Joe

        I could not fully complete the changes that I had in mind. So far I managed to have made all variables in the CacheEntry class final which helps a great deal to ensure thread safety of the class, but I was unable to make it truly immutable. Variant URIs can still be mutated by multiple threads leading to a race condition. I have also failed miserably trying to wrap my head around EasyMock which turned out to be far from easy. One of the test cases still fails after my changes and I was not able to fix it nor to fully understand why it is failing in the first place. I think it is better if you take over from here and incorporate the patch (attached) into your change set.

        Cheers

        Oleg

        Show
        Oleg Kalnichevski added a comment - Dave, Joe I could not fully complete the changes that I had in mind. So far I managed to have made all variables in the CacheEntry class final which helps a great deal to ensure thread safety of the class, but I was unable to make it truly immutable. Variant URIs can still be mutated by multiple threads leading to a race condition. I have also failed miserably trying to wrap my head around EasyMock which turned out to be far from easy. One of the test cases still fails after my changes and I was not able to fix it nor to fully understand why it is failing in the first place. I think it is better if you take over from here and incorporate the patch (attached) into your change set. Cheers Oleg
        Hide
        Joe Campbell added a comment -

        Yes - they can be made private and final - as they are never meant to change. In fact, I added the @Immutable annotation to the class (similar to other classes assumed to have been modified by Oleg). OptionsHttp11Response's purpose is to serve as the response container for a request OPTIONS query with maxforwards set to 0 (essentially a loopback to indicate all the various things that the cache will support).

        This change will be included in a patch from us when Oleg is complete with making CacheEntry immutable as well. We are trying not to collide changes by pausing to wait for Oleg's changes on the tree and then supplying a patch for some of the other minor changes requested based on that.

        Thanks,
        Joe

        Show
        Joe Campbell added a comment - Yes - they can be made private and final - as they are never meant to change. In fact, I added the @Immutable annotation to the class (similar to other classes assumed to have been modified by Oleg). OptionsHttp11Response's purpose is to serve as the response container for a request OPTIONS query with maxforwards set to 0 (essentially a loopback to indicate all the various things that the cache will support). This change will be included in a patch from us when Oleg is complete with making CacheEntry immutable as well. We are trying not to collide changes by pausing to wait for Oleg's changes on the tree and then supplying a patch for some of the other minor changes requested based on that. Thanks, Joe
        Hide
        Sebb added a comment -

        For OptionsHttp11Response - is there any need for the instance variables statusLine and version to be mutable? Can they be made private?
        I know fixing this is not strictly necessary, but then again is it necessary for the fields to be mutable and not private?

        It's a lot better to start with all fields private - and final if possible - and only increase the visibility if strictly necessary.

        Show
        Sebb added a comment - For OptionsHttp11Response - is there any need for the instance variables statusLine and version to be mutable? Can they be made private? I know fixing this is not strictly necessary, but then again is it necessary for the fields to be mutable and not private? It's a lot better to start with all fields private - and final if possible - and only increase the visibility if strictly necessary.
        Hide
        David Mays added a comment -

        The idea behind Variant URIs is that for a resource that has Vary response headers, we store a "parent" cache entry, which has pointers (variant URIs) to "child" entries. The child entries will contain the "variable" content for that resource. So the parent entry becomes the map to find a given child entry, where the URI (cache key) has had a string like

        {Content-Encoding:gzip}

        prepended to it

        So you might have a parent entry like this:

        "http://www.example.com/foo"

        With children like these:

        "

        {Content-Encoding:gzip}

        http://www.example.com/foo" (Caused by a request that had Accept-Encoding:gzip header)
        "

        {Content-Encoding:identity}

        http://www.example.com/foo" (Caused by a request that had Accept-Encoding:identity header)

        The idea of keeping the list of variant children with the parent entry is so that we can clean up all child entries when a parent is invalidated. It prevents orphan cache entries.

        Regarding the atomic updates, we had also started working on a patch for that, where CacheEntry had a new constructor that took everything you need, instead of the HttpResponse. Obviously all the other setters were to be removed also. This way, every time a cache entry is "pulled" from the cache, it is effectively a copy, rather than a reference.

        The CacheEntryUpdater class would have to be changed slightly to work with this, but would not be especially difficult to do.

        Dave

        Show
        David Mays added a comment - The idea behind Variant URIs is that for a resource that has Vary response headers, we store a "parent" cache entry, which has pointers (variant URIs) to "child" entries. The child entries will contain the "variable" content for that resource. So the parent entry becomes the map to find a given child entry, where the URI (cache key) has had a string like {Content-Encoding:gzip} prepended to it So you might have a parent entry like this: "http://www.example.com/foo" With children like these: " {Content-Encoding:gzip} http://www.example.com/foo " (Caused by a request that had Accept-Encoding:gzip header) " {Content-Encoding:identity} http://www.example.com/foo " (Caused by a request that had Accept-Encoding:identity header) The idea of keeping the list of variant children with the parent entry is so that we can clean up all child entries when a parent is invalidated. It prevents orphan cache entries. Regarding the atomic updates, we had also started working on a patch for that, where CacheEntry had a new constructor that took everything you need, instead of the HttpResponse. Obviously all the other setters were to be removed also. This way, every time a cache entry is "pulled" from the cache, it is effectively a copy, rather than a reference. The CacheEntryUpdater class would have to be changed slightly to work with this, but would not be especially difficult to do. Dave
        Hide
        Oleg Kalnichevski added a comment -

        Dave,

        Only threading issue is a show stopper. All others can be addressed at a later point and should not block the release.

        In my opinion the best way to ensure CacheEntity thread safety is by making it immutable. Basically instead of updating an exiting entry the cache should replace an existing entry with a new one thus making synchronisation on that entity unnecessary. This solves threading issue without decreasing performance.

        I already have started working on a patch but was unable to fix 5 test cases that currently fail. I will do my best to finish my changes tomorrow and pass it onto you for review.

        I have stumbled upon one particular bit which I cannot fully understand without your help. What is the intent of variantURIs set? I could not figure out just by looking at the test cases.

        Oleg

        Show
        Oleg Kalnichevski added a comment - Dave, Only threading issue is a show stopper. All others can be addressed at a later point and should not block the release. In my opinion the best way to ensure CacheEntity thread safety is by making it immutable. Basically instead of updating an exiting entry the cache should replace an existing entry with a new one thus making synchronisation on that entity unnecessary. This solves threading issue without decreasing performance. I already have started working on a patch but was unable to fix 5 test cases that currently fail. I will do my best to finish my changes tomorrow and pass it onto you for review. I have stumbled upon one particular bit which I cannot fully understand without your help. What is the intent of variantURIs set? I could not figure out just by looking at the test cases. Oleg
        Hide
        David Mays added a comment -

        For the SizeLimitedResponseReader and OptionsHttp11Response, these are "new" for every request, and so should not be accessed by multiple threads unless someone was doing something very unusual in their client code.

        The counters in CachingHttpClient are being updated to AtomicLong.

        The tricky bit is the CacheEntry.

        I'm thinking that the atomicity there should be in the updates to the cache, rather than the entry itself. We already have this notion with the HttpCacheUpdateCallback, which enables atomic updates to the cache. The key is preventing existing instances of a CacheEntry from being updated while someone is reading one. I believe we have a handle on how to make this happen, so we are working on it now.

        What do you consider the top priority items to complete before you consider this ready for release? Specifically, is the CacheEntry physical representation something you consider gating for release, or can that go as-is for now?

        Dave

        Show
        David Mays added a comment - For the SizeLimitedResponseReader and OptionsHttp11Response, these are "new" for every request, and so should not be accessed by multiple threads unless someone was doing something very unusual in their client code. The counters in CachingHttpClient are being updated to AtomicLong. The tricky bit is the CacheEntry. I'm thinking that the atomicity there should be in the updates to the cache, rather than the entry itself. We already have this notion with the HttpCacheUpdateCallback, which enables atomic updates to the cache. The key is preventing existing instances of a CacheEntry from being updated while someone is reading one. I believe we have a handle on how to make this happen, so we are working on it now. What do you consider the top priority items to complete before you consider this ready for release? Specifically, is the CacheEntry physical representation something you consider gating for release, or can that go as-is for now? Dave
        Hide
        Oleg Kalnichevski added a comment -

        I think all these classes need to be thread-safe, as the cache may be accessed from multiple worker threads concurrently.

        Oleg

        Show
        Oleg Kalnichevski added a comment - I think all these classes need to be thread-safe, as the cache may be accessed from multiple worker threads concurrently. Oleg
        Hide
        Sebb added a comment -

        I think one or two classes are not currently thread-safe:

        CacheEntry
        CachingHttpClient (only the counts are a possible problem)
        OptionsHttp11Response - not sure why the instance fields are not final & private?
        SizeLimitedResponseReader

        Are these classes intended to be shared between threads?

        Show
        Sebb added a comment - I think one or two classes are not currently thread-safe: CacheEntry CachingHttpClient (only the counts are a possible problem) OptionsHttp11Response - not sure why the instance fields are not final & private? SizeLimitedResponseReader Are these classes intended to be shared between threads?
        Hide
        Sebb added a comment -

        The volatile long count fields in CachingHttpClient are not guaranteed to be updated correctly by multiple threads.

        This is because:

        • increment is not atomic
        • long read/write are not guaranteed atomic (though they probably are on most modern hardware)

        The first could cause lost updates, the second could cause rubbish values to be stored.

        Could either use AtomicLong, or could synchronize updates.

        Show
        Sebb added a comment - The volatile long count fields in CachingHttpClient are not guaranteed to be updated correctly by multiple threads. This is because: increment is not atomic long read/write are not guaranteed atomic (though they probably are on most modern hardware) The first could cause lost updates, the second could cause rubbish values to be stored. Could either use AtomicLong, or could synchronize updates.
        Hide
        Oleg Kalnichevski added a comment -

        Dave,

        Module checked in. Many thanks you all you, guys, for contributing it!

        I forgot to mention another minor issue. One of the tests TestCachingHttpClient#testRealResultsMatch depends on an external resource. If possible, consider replacing it with a mock service based on LocalTestServer. You can use TestRedirects test cases as a starting point.

        http://svn.apache.org/repos/asf/httpcomponents/httpclient/trunk/httpclient/src/test/java/org/apache/http/client/protocol/TestRedirects.java
        http://svn.apache.org/repos/asf/httpcomponents/httpclient/trunk/httpclient/src/test/java/org/apache/http/localserver/LocalTestServer.java

        As soon as the issues I mentioned previously have been looked into, I'll proceed with cutting an official release

        Cheers

        Oleg

        Show
        Oleg Kalnichevski added a comment - Dave, Module checked in. Many thanks you all you, guys, for contributing it! I forgot to mention another minor issue. One of the tests TestCachingHttpClient#testRealResultsMatch depends on an external resource. If possible, consider replacing it with a mock service based on LocalTestServer. You can use TestRedirects test cases as a starting point. http://svn.apache.org/repos/asf/httpcomponents/httpclient/trunk/httpclient/src/test/java/org/apache/http/client/protocol/TestRedirects.java http://svn.apache.org/repos/asf/httpcomponents/httpclient/trunk/httpclient/src/test/java/org/apache/http/localserver/LocalTestServer.java As soon as the issues I mentioned previously have been looked into, I'll proceed with cutting an official release Cheers Oleg
        Hide
        David Mays added a comment -

        Oleg,

        Sounds good. I like the direction you're going, separating it into Impl and interfaces.

        Thanks,

        Dave

        Show
        David Mays added a comment - Oleg, Sounds good. I like the direction you're going, separating it into Impl and interfaces. Thanks, Dave
        Hide
        Oleg Kalnichevski added a comment -

        Dave,

        Please find a slightly reworked version of your code attached to the issue. I took liberty of making a number of changes in order to make the code more consistent with the rest of HttpClient. I started splitting the code into API and impl parts and generified some public interfaces. I also converted all test cases to JUnit 4 (by the way the test coverage in truly impressive).

        If I hear no objections, I'll commit the code tonight (~20:00 GMT)

        Oleg

        Show
        Oleg Kalnichevski added a comment - Dave, Please find a slightly reworked version of your code attached to the issue. I took liberty of making a number of changes in order to make the code more consistent with the rest of HttpClient. I started splitting the code into API and impl parts and generified some public interfaces. I also converted all test cases to JUnit 4 (by the way the test coverage in truly impressive). If I hear no objections, I'll commit the code tonight (~20:00 GMT) Oleg
        Hide
        David Mays added a comment -

        Oleg,

        That's outstanding! Congratulations on the addition to your family.

        I certainly understand if it isn't committed very soon.

        Dave

        Show
        David Mays added a comment - Oleg, That's outstanding! Congratulations on the addition to your family. I certainly understand if it isn't committed very soon. Dave
        Hide
        Oleg Kalnichevski added a comment -

        Dave,

        My wife just gave birth to our first child. Things are a bit hectic right now. I'll try to commit the code tomorrow.

        Oleg

        Show
        Oleg Kalnichevski added a comment - Dave, My wife just gave birth to our first child. Things are a bit hectic right now. I'll try to commit the code tomorrow. Oleg
        Hide
        David Mays added a comment -

        Oleg,

        Thanks for letting me know! I was actually planning to suggest this morning that you go ahead and commit the code if you are ready, so that I can check out the trunk and work on the suggestions you had.

        If you're not ready to commit it to the trunk yet, that is fine also; I can put off any local modifications for a while.

        Thanks,

        Dave

        Show
        David Mays added a comment - Oleg, Thanks for letting me know! I was actually planning to suggest this morning that you go ahead and commit the code if you are ready, so that I can check out the trunk and work on the suggestions you had. If you're not ready to commit it to the trunk yet, that is fine also; I can put off any local modifications for a while. Thanks, Dave
        Hide
        Oleg Kalnichevski added a comment -

        David,

        I started messing with your code. You probably should not make any major changes in your local code tree until the code is committed to the repository.

        Oleg

        Show
        Oleg Kalnichevski added a comment - David, I started messing with your code. You probably should not make any major changes in your local code tree until the code is committed to the repository. Oleg
        Hide
        Oleg Kalnichevski added a comment -

        > How do I go about contributing to the tutorial?

        Ideally, by submitting a patch. The latest tutorial snapshot in the DocBook format can be found here:

        http://svn.apache.org/repos/asf/httpcomponents/httpclient/trunk/src/docbkx/

        I took a very cursory look at the code. Overall, great stuff. Some minor remarks:

        (1) I spotted quite a few instance variables that could be made final. Final variables make code somewhat less prone to race conditions and generally make it easier to ensure thread-safety when accessed by multiple threads.

        (2) Static Logs are considered harmful in managed environments such as JEE containers. HttpClient uses non-static Log instances as recommended by the developers of Commons Logging. Cache should be consistent with the rest of the code base.

        (3) I see lots of places where CacheOperationExceptions are silently discarded. Each particular case should be reviewed and the exception should either be propagated, handled or logged. Exceptions should be ignored in very special cases only. In most cases logging at DEBUG or WARN priority is all that it takes.

        (4) Ideally CacheEntry class should not impose a particular physical representation of the cache entry content (for instance, as a byte array). We might also want to provide a file system based cache implementation, which ideally should function without buffering the entire response content in memory. I kind of think CacheEntry might even use HttpEntity interface for content representation.

        There are two ways we could proceed. (1) I can commit the patch pretty much as is and let you work on improvements incrementally by submitting smaller patches, or (2) you can resubmit the whole thing if that is easier for you.

        Cheers

        Oleg

        Show
        Oleg Kalnichevski added a comment - > How do I go about contributing to the tutorial? Ideally, by submitting a patch. The latest tutorial snapshot in the DocBook format can be found here: http://svn.apache.org/repos/asf/httpcomponents/httpclient/trunk/src/docbkx/ I took a very cursory look at the code. Overall, great stuff. Some minor remarks: (1) I spotted quite a few instance variables that could be made final. Final variables make code somewhat less prone to race conditions and generally make it easier to ensure thread-safety when accessed by multiple threads. (2) Static Logs are considered harmful in managed environments such as JEE containers. HttpClient uses non-static Log instances as recommended by the developers of Commons Logging. Cache should be consistent with the rest of the code base. (3) I see lots of places where CacheOperationExceptions are silently discarded. Each particular case should be reviewed and the exception should either be propagated, handled or logged. Exceptions should be ignored in very special cases only. In most cases logging at DEBUG or WARN priority is all that it takes. (4) Ideally CacheEntry class should not impose a particular physical representation of the cache entry content (for instance, as a byte array). We might also want to provide a file system based cache implementation, which ideally should function without buffering the entire response content in memory. I kind of think CacheEntry might even use HttpEntity interface for content representation. There are two ways we could proceed. (1) I can commit the patch pretty much as is and let you work on improvements incrementally by submitting smaller patches, or (2) you can resubmit the whole thing if that is easier for you. Cheers Oleg
        Hide
        David Mays added a comment -

        This file supercedes my previous submission of cachingclient.tgz. This one actually works, and has more passing tests.

        I am submitting this and granting the software to the ASF, under the terms of the ASL, with full consent of my employer, Comcast Corporation.

        Show
        David Mays added a comment - This file supercedes my previous submission of cachingclient.tgz. This one actually works, and has more passing tests. I am submitting this and granting the software to the ASF, under the terms of the ASL, with full consent of my employer, Comcast Corporation.
        Hide
        David Mays added a comment -

        Oops. I didn't even consider the JUnit version issue when submitting this.

        I believe we'll be ready with a non-broken submission later today. One of our guys spent some time this weekend fixing that, as well as making some other new unit tests pass. He's down to the last failing test, so I think we're in good shape.

        How do I go about contributing to the tutorial?

        Show
        David Mays added a comment - Oops. I didn't even consider the JUnit version issue when submitting this. I believe we'll be ready with a non-broken submission later today. One of our guys spent some time this weekend fixing that, as well as making some other new unit tests pass. He's down to the last failing test, so I think we're in good shape. How do I go about contributing to the tutorial?
        Hide
        Sebb added a comment - - edited

        JUnit4 is backwards compatible with JUnit3, and HttpClient requires Java 1.5+ so the dependency for HttpClient could just be changed to 4.8.1. (4.8.2 has been released, but is not yet in Maven or on SourceForge)

        It's probably not a good idea to mix JUnit3 and 4 syntax in the same class, however

        Show
        Sebb added a comment - - edited JUnit4 is backwards compatible with JUnit3, and HttpClient requires Java 1.5+ so the dependency for HttpClient could just be changed to 4.8.1. (4.8.2 has been released, but is not yet in Maven or on SourceForge) It's probably not a good idea to mix JUnit3 and 4 syntax in the same class, however
        Hide
        Oleg Kalnichevski added a comment -

        Dave,

        No worries. Even if it is broken it might be better if I check in the code sooner than later, so others could start reviewing the API. However, it looks like I will have to migrate HttpClient from JUnit 3 to JUnit 4 before your code could be committed.

        Another thing. I'd be really great if you could contribute a short section on the caching extension to the HttpClient tutorial.

        Cheers

        Oleg

        Show
        Oleg Kalnichevski added a comment - Dave, No worries. Even if it is broken it might be better if I check in the code sooner than later, so others could start reviewing the API. However, it looks like I will have to migrate HttpClient from JUnit 3 to JUnit 4 before your code could be committed. Another thing. I'd be really great if you could contribute a short section on the caching extension to the HttpClient tutorial. Cheers Oleg
        Hide
        David Mays added a comment -

        Oleg,

        We discovered that something seems to be fatally broken right now, in the sense that it is not caching anything. Please hold off integrating this for now. My apologies for the false start.

        I'll get back to you about it on Monday. It's likely a very small bug, since we did at one time have certainty that it WAS working correctly.

        Thanks,

        Dave

        Show
        David Mays added a comment - Oleg, We discovered that something seems to be fatally broken right now, in the sense that it is not caching anything. Please hold off integrating this for now. My apologies for the false start. I'll get back to you about it on Monday. It's likely a very small bug, since we did at one time have certainty that it WAS working correctly. Thanks, Dave
        Hide
        Oleg Kalnichevski added a comment -

        I will review and commit the patch either today or tomorrow.

        Oleg

        Show
        Oleg Kalnichevski added a comment - I will review and commit the patch either today or tomorrow. Oleg
        Hide
        David Mays added a comment -

        You are correct. I did not receive your message. However you should now be able to see the attachment I have just uploaded.

        Dave

        Show
        David Mays added a comment - You are correct. I did not receive your message. However you should now be able to see the attachment I have just uploaded. Dave
        Hide
        David Mays added a comment -

        This patch is being submitted by David Mays on behalf of Comcast Corporation. I am acting with the consent of my employer.

        It is intended for redistribution and is granted to the Apache Software Foundation under the terms of the Apache Software License.

        Contributors to this patch:
        Joe Campbell
        David Cleaver
        David Mays
        Jon Moore
        Brad Spenla

        Show
        David Mays added a comment - This patch is being submitted by David Mays on behalf of Comcast Corporation. I am acting with the consent of my employer. It is intended for redistribution and is granted to the Apache Software Foundation under the terms of the Apache Software License. Contributors to this patch: Joe Campbell David Cleaver David Mays Jon Moore Brad Spenla
        Hide
        Oleg Kalnichevski added a comment -

        Hhhm. I guess you did not get my response to your message for some reason. I simply cannot commit code sent to me directly. You MUST attach the patch to this issue and clearly state that the code is granted to the ASF (Apache Software Foundation) under the terms of ASL (Apache Software License) and that you are acting with a consent of your employer.

        Oleg

        Show
        Oleg Kalnichevski added a comment - Hhhm. I guess you did not get my response to your message for some reason. I simply cannot commit code sent to me directly. You MUST attach the patch to this issue and clearly state that the code is granted to the ASF (Apache Software Foundation) under the terms of ASL (Apache Software License) and that you are acting with a consent of your employer. Oleg
        Hide
        David Mays added a comment -

        Oleg,

        I've sent you a tgz of the source code for the caching client. Let me know if you need anything or have any questions.

        Thanks,

        Dave

        Show
        David Mays added a comment - Oleg, I've sent you a tgz of the source code for the caching client. Let me know if you need anything or have any questions. Thanks, Dave
        Hide
        Oleg Kalnichevski added a comment -

        Usually one should always be using trunk when submitting new features

        http://svn.apache.org/repos/asf/httpcomponents/httpclient/trunk/

        In this particular case it should not really matter, though, as the code should be self-contained and should not require any changes to the existing classes.

        Either a patch or a bunch of ZIPPED classes should do just fine.

        Oleg

        Show
        Oleg Kalnichevski added a comment - Usually one should always be using trunk when submitting new features http://svn.apache.org/repos/asf/httpcomponents/httpclient/trunk/ In this particular case it should not really matter, though, as the code should be self-contained and should not require any changes to the existing classes. Either a patch or a bunch of ZIPPED classes should do just fine. Oleg
        Hide
        David Mays added a comment -

        Hi, I'm working with Jonathan on the caching client. Am I correct in assuming that I should create the patch against the 4.1-alpha1 tag in SVN?

        Show
        David Mays added a comment - Hi, I'm working with Jonathan on the caching client. Am I correct in assuming that I should create the patch against the 4.1-alpha1 tag in SVN?
        Hide
        Oleg Kalnichevski added a comment -

        There is no deadline, but due to personal circumstances I would like to get the next release out within the next two weeks.

        Oleg

        Show
        Oleg Kalnichevski added a comment - There is no deadline, but due to personal circumstances I would like to get the next release out within the next two weeks. Oleg
        Hide
        Jonathan Moore added a comment -

        Ok, I think we have something ready to be poked at by a larger community. We're getting the signoff from the legal team (shouldn't be a problem), and it should be pretty easy to produce the patch from what we have going in our internal repository. Let us know what your release deadline would be and we'll try to hit it.

        Show
        Jonathan Moore added a comment - Ok, I think we have something ready to be poked at by a larger community. We're getting the signoff from the legal team (shouldn't be a problem), and it should be pretty easy to produce the patch from what we have going in our internal repository. Let us know what your release deadline would be and we'll try to hit it.
        Hide
        Oleg Kalnichevski added a comment -

        +1 to what Odi said.

        BTW, I intend to start working on the 4.1-a2 release in a few days. If you think your code is ready for testing by a larger community of users I could hold back the release for a week or so.

        Oleg

        Show
        Oleg Kalnichevski added a comment - +1 to what Odi said. BTW, I intend to start working on the 4.1-a2 release in a few days. If you think your code is ready for testing by a larger community of users I could hold back the release for a week or so. Oleg
        Hide
        Ortwin Glück added a comment -

        Jon,

        You are not implementing a proxy! Only a cache. A proxy is an intermediate server that the client is fully aware of. The client uses the CONNECT method and proxy-specific headers to interact with a proxy. A cache however is fully transparent to the client. The client has no knowledge about the proxy. No Via headers. No Max-Forwards interpretation. A cache only interacts with the headers that control lifetime of the cached object. Apart from that it just replays responses to the client. That implies that you have to store also the original headers and not just the request body. In my opinion the cache should even issue the original Date and Server headers back to the client and not generate new ones.

        Show
        Ortwin Glück added a comment - Jon, You are not implementing a proxy! Only a cache. A proxy is an intermediate server that the client is fully aware of. The client uses the CONNECT method and proxy-specific headers to interact with a proxy. A cache however is fully transparent to the client. The client has no knowledge about the proxy. No Via headers. No Max-Forwards interpretation. A cache only interacts with the headers that control lifetime of the cached object. Apart from that it just replays responses to the client. That implies that you have to store also the original headers and not just the request body. In my opinion the cache should even issue the original Date and Server headers back to the client and not generate new ones.
        Hide
        Jonathan Moore added a comment -

        Just wanted to touch base here on progress, to keep the thread alive. We've got basic functionality working, and are working our way through the RFC to create unit tests around all the MUST/MUST NOT protocol requirements. In the meantime, here's a couple more behavior questions:

        Options and Max-Forwards: in theory, if a proxy receives an OPTIONS request with a Max-Fowards: 0, it is not supposed to forward the request and instead should consider the OPTIONS request to be directed to it directly. Should the CachingHttpClient consider itself a proxy in this sense (i.e. count itself as a hop in the request/response chain)? Again, this is the "am I just part of the client or am I a proxy" question.

        Via header: probably related in spirit. Should the CachingHttpClient add Via headers as a proxy? i.e.

        Via: 1.0 localhost (CachingHttpClient/4.0.1)

        Show
        Jonathan Moore added a comment - Just wanted to touch base here on progress, to keep the thread alive. We've got basic functionality working, and are working our way through the RFC to create unit tests around all the MUST/MUST NOT protocol requirements. In the meantime, here's a couple more behavior questions: Options and Max-Forwards: in theory, if a proxy receives an OPTIONS request with a Max-Fowards: 0, it is not supposed to forward the request and instead should consider the OPTIONS request to be directed to it directly. Should the CachingHttpClient consider itself a proxy in this sense (i.e. count itself as a hop in the request/response chain)? Again, this is the "am I just part of the client or am I a proxy" question. Via header: probably related in spirit. Should the CachingHttpClient add Via headers as a proxy? i.e. Via: 1.0 localhost (CachingHttpClient/4.0.1)
        Hide
        Ortwin Glück added a comment -

        Jon, I like your approach.

        If you don't know what to do with a request, the best thing you can do is to pass it on verbatim to the server, as would do the client without a cache. The server will know the answer. If you can't decide whether it's correct to cache the response or not, then don't. Better make an extra request to the server than deliver stale or incorrect data (especially with WebDav you can easily loose data).

        My opinion on the case in question: The correct behaviour of a compliant server is undefined, especially no meaning must be implied to the returned status code or headers. Trying to be as transparent as possible should be the way to go here (trying not to break anything). So pass on the request verbatim to the destination and remove the resource from the cache, as you can't know what the server will do.

        Show
        Ortwin Glück added a comment - Jon, I like your approach. If you don't know what to do with a request, the best thing you can do is to pass it on verbatim to the server, as would do the client without a cache. The server will know the answer. If you can't decide whether it's correct to cache the response or not, then don't. Better make an extra request to the server than deliver stale or incorrect data (especially with WebDav you can easily loose data). My opinion on the case in question: The correct behaviour of a compliant server is undefined, especially no meaning must be implied to the returned status code or headers. Trying to be as transparent as possible should be the way to go here (trying not to break anything). So pass on the request verbatim to the destination and remove the resource from the cache, as you can't know what the server will do.
        Hide
        Jonathan Moore added a comment -

        We're taking the approach of trying to be a conditionally-compliant (meaning, we implement all the MUSTs and MUST NOTs in the RFC) HTTP/1.1 transparent proxy cache, even though technically the CachingHttpClient decorator isn't a full proxy (since it doesn't actually talk HTTP with its upstream client; we use the HttpClient API). In general, we feel like this will maximize interoperability but could lead to some surprising behavior.

        One example requirement of the RFC is: "Clients MAY issue simple (non-subrange) GET requests with either weak validators or strong validators. Clients MUST NOT use weak validators in other forms of request."

        Let's say the client program issues a request like this:

        PUT / HTTP/1.1
        Host: foo.example.com
        Content-Length: 6
        Content-Type: text/plain
        If-Match: W/"etag"

        stuff

        If I were a compliant, standalone proxy, I would not be allowed to send this request downstream per the RFC, and would probably return a 400 (Bad Request). Now, the CachingHttpClient could likewise construct a 400 response and hand it right back, but this might be surprising to a client who had previously been interacting with a non-compliant origin that allowed this, in which case the CachingHttpClient wouldn't really be a drop-in replacement.

        Is there some community opinion on how to behave in cases like this? Is subjecting ourselves to the rigors of requirements on proxies too much? Should this type of behavior be configurable?

        Show
        Jonathan Moore added a comment - We're taking the approach of trying to be a conditionally-compliant (meaning, we implement all the MUSTs and MUST NOTs in the RFC) HTTP/1.1 transparent proxy cache, even though technically the CachingHttpClient decorator isn't a full proxy (since it doesn't actually talk HTTP with its upstream client; we use the HttpClient API). In general, we feel like this will maximize interoperability but could lead to some surprising behavior. One example requirement of the RFC is: "Clients MAY issue simple (non-subrange) GET requests with either weak validators or strong validators. Clients MUST NOT use weak validators in other forms of request." Let's say the client program issues a request like this: PUT / HTTP/1.1 Host: foo.example.com Content-Length: 6 Content-Type: text/plain If-Match: W/"etag" stuff If I were a compliant, standalone proxy, I would not be allowed to send this request downstream per the RFC, and would probably return a 400 (Bad Request). Now, the CachingHttpClient could likewise construct a 400 response and hand it right back, but this might be surprising to a client who had previously been interacting with a non-compliant origin that allowed this, in which case the CachingHttpClient wouldn't really be a drop-in replacement. Is there some community opinion on how to behave in cases like this? Is subjecting ourselves to the rigors of requirements on proxies too much? Should this type of behavior be configurable?
        Hide
        Jonathan Moore added a comment -

        Ok, great. It may take us a week or two to get it all tied up, but shouldn't be a problem.

        Jon

        Show
        Jonathan Moore added a comment - Ok, great. It may take us a week or two to get it all tied up, but shouldn't be a problem. Jon
        Hide
        Oleg Kalnichevski added a comment -

        Jon,

        Simply attach the source (preferably in a form of a patch) to this issue and do not forget to select 'Grant license to ASF for inclusion in ASF works'.

        Oleg

        Show
        Oleg Kalnichevski added a comment - Jon, Simply attach the source (preferably in a form of a patch) to this issue and do not forget to select 'Grant license to ASF for inclusion in ASF works'. Oleg
        Hide
        Jonathan Moore added a comment -

        Hi folks,

        We had a group of engineers work on a CachingHttpClient as a "lab week" project last week, following (as it turns out) the same decorator pattern Oleg outlined. We've also developed a pluggable cache storage interface and have in-memory (basically a LRU HashMap), ehcache, and memcache backends. We're not quite standards-compliant yet, but hopefully will be soon.

        Currently this builds as a separate JAR with all the classes under org.apache.http.client.cache as a new package (the adapters for the ehcache and memcache backends are also separate JARs so you can just include the dependencies you need).

        We'd like to contribute this back to the community; are there any particulars about the best way to do this?

        Thanks,
        Jon Moore
        Comcast Interactive Media

        Show
        Jonathan Moore added a comment - Hi folks, We had a group of engineers work on a CachingHttpClient as a "lab week" project last week, following (as it turns out) the same decorator pattern Oleg outlined. We've also developed a pluggable cache storage interface and have in-memory (basically a LRU HashMap), ehcache, and memcache backends. We're not quite standards-compliant yet, but hopefully will be soon. Currently this builds as a separate JAR with all the classes under org.apache.http.client.cache as a new package (the adapters for the ehcache and memcache backends are also separate JARs so you can just include the dependencies you need). We'd like to contribute this back to the community; are there any particulars about the best way to do this? Thanks, Jon Moore Comcast Interactive Media
        Hide
        Oleg Kalnichevski added a comment -

        > For your #3, shouldn't caching be tightly coupled with HttpClient rather than being implemented as a decorator?

        I am pretty sure it should not. Moreover, I think a good caching implementation should require no changes in the DefaultHttpClient at all.

        Oleg

        Show
        Oleg Kalnichevski added a comment - > For your #3, shouldn't caching be tightly coupled with HttpClient rather than being implemented as a decorator? I am pretty sure it should not. Moreover, I think a good caching implementation should require no changes in the DefaultHttpClient at all. Oleg
        Hide
        Avlesh Singh added a comment -

        Thanks Oleg, these would be good for me to get started.
        For your #3, shouldn't caching be tightly coupled with HttpClient rather than being implemented as a decorator? More so, because, caching can only be implemented if the response headers direct an HttpClient instance to do so. Having said this, I definitely agree to the fact that actual cache should be an API implementation thereby giving users the flexibility to use a custom cache.

        I'll come up with a basic cache API definition and share shortly.

        Show
        Avlesh Singh added a comment - Thanks Oleg, these would be good for me to get started. For your #3, shouldn't caching be tightly coupled with HttpClient rather than being implemented as a decorator? More so, because, caching can only be implemented if the response headers direct an HttpClient instance to do so. Having said this, I definitely agree to the fact that actual cache should be an API implementation thereby giving users the flexibility to use a custom cache. I'll come up with a basic cache API definition and share shortly.
        Hide
        Oleg Kalnichevski added a comment -

        Avlesh,

        This is how I personally think this issue should be addressed :

        (1) Firstly, one needs to define an abstract API for storing and retrieving content from an abstract cache. A single interface HttpCache (or some such) is likely to be sufficient.

        (2) Different implementations of the said caching API should be possible. The default one can be very simple just backed by a HashMap instance. For a more sophisticated implementation I would recommend using ehcache [1]

        (3) The caching logic should be implemented as a decorator for the HttpClient interface [2]. It should be possible to take any arbitrary instance of HttpClient and enhance it with caching capabilities.

        Oleg

        [1] http://www.ehcache.org/
        [2] http://hc.apache.org/httpcomponents-client/httpclient/apidocs/org/apache/http/client/HttpClient.html

        Show
        Oleg Kalnichevski added a comment - Avlesh, This is how I personally think this issue should be addressed : (1) Firstly, one needs to define an abstract API for storing and retrieving content from an abstract cache. A single interface HttpCache (or some such) is likely to be sufficient. (2) Different implementations of the said caching API should be possible. The default one can be very simple just backed by a HashMap instance. For a more sophisticated implementation I would recommend using ehcache [1] (3) The caching logic should be implemented as a decorator for the HttpClient interface [2] . It should be possible to take any arbitrary instance of HttpClient and enhance it with caching capabilities. Oleg [1] http://www.ehcache.org/ [2] http://hc.apache.org/httpcomponents-client/httpclient/apidocs/org/apache/http/client/HttpClient.html
        Hide
        Avlesh Singh added a comment -

        I agree to Mike's comment. Caching would be a boon for all rest based API's.
        I am ready to take on this issue and come up with a prototype. Can someone help me on where exactly to get started with? Is it HttpMethodBase?

        Would common's cache (http://commons.apache.org/sandbox/cache/) be good enough to start with, or something as complex as JSC (http://jakarta.apache.org/jcs/) is needed?

        Show
        Avlesh Singh added a comment - I agree to Mike's comment. Caching would be a boon for all rest based API's. I am ready to take on this issue and come up with a prototype. Can someone help me on where exactly to get started with? Is it HttpMethodBase? Would common's cache ( http://commons.apache.org/sandbox/cache/ ) be good enough to start with, or something as complex as JSC ( http://jakarta.apache.org/jcs/ ) is needed?
        Hide
        Mike Youngstrom added a comment -

        This would be extremely useful in the case of restful service calls. If http client could emulate a browsers caching logic and then expose an api that that will allow developers to plug this logic into their system's caching solution that would be awesome!!!!! I've looked around and couldn't find any other http library that offers such functionality. Client side caching could mark a big reason for frameworks such as CXF and Spring RestTemplate (provides Commons Http Client) to add Http Components as an optional client.

        This isn't really a case of HttpClient trying to be a browser. It's a case for HttpClient to taking advantage of standard http headers to provide a well understood caching model.

        Show
        Mike Youngstrom added a comment - This would be extremely useful in the case of restful service calls. If http client could emulate a browsers caching logic and then expose an api that that will allow developers to plug this logic into their system's caching solution that would be awesome!!!!! I've looked around and couldn't find any other http library that offers such functionality. Client side caching could mark a big reason for frameworks such as CXF and Spring RestTemplate (provides Commons Http Client) to add Http Components as an optional client. This isn't really a case of HttpClient trying to be a browser. It's a case for HttpClient to taking advantage of standard http headers to provide a well understood caching model.
        Hide
        Sebb added a comment -

        This would be useful for JMeter, though it would probably not want to store the full pages, only enough to identify them and the relevant headers.

        So it would be useful if the logic to determine cacheability etc was implemented independently of the store.

        Show
        Sebb added a comment - This would be useful for JMeter, though it would probably not want to store the full pages, only enough to identify them and the relevant headers. So it would be useful if the logic to determine cacheability etc was implemented independently of the store.
        Hide
        Roland Weber added a comment -

        make that version 5

        Show
        Roland Weber added a comment - make that version 5
        Hide
        Oleg Kalnichevski added a comment -

        HttpClient has never been meant to be a browser, however we MAY consider
        providing a lightweight caching API without a concrete implementation in version 4.0

        Oleg

        Show
        Oleg Kalnichevski added a comment - HttpClient has never been meant to be a browser, however we MAY consider providing a lightweight caching API without a concrete implementation in version 4.0 Oleg

          People

          • Assignee:
            Unassigned
            Reporter:
            Marc Guillemot
          • Votes:
            4 Vote for this issue
            Watchers:
            8 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development