Traffic Server / TS-621

Writing 0 bytes to the HTTP cache means only updating the header... a new API, update_header_only(), is needed to allow 0-byte files to be cached.

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.1.5
    • Fix Version/s: 3.3.2
    • Component/s: Cache
    • Labels:
      None
    Attachments

    1. force_empty.diff
      5 kB
      weijin
    2. TS-621_cluster_zero_size_objects.patch
      2 kB
      Zhao Yongming
    3. ts-621-jp-3.patch
      17 kB
      John Plevyak
    4. ts-621-jp-2.patch
      16 kB
      John Plevyak
    5. ts-621-jp-1.patch
      14 kB
      John Plevyak

      Issue Links

        Activity

        Leif Hedstrom added a comment -

        When we add this, we should also support 0-length bodies to be stored in cache (e.g. 301/302 responses).
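
        For illustration (hostname and header values made up), such a redirect response carries only headers and an explicitly declared zero-length body:

            HTTP/1.1 301 Moved Permanently
            Location: http://www.example.com/new-path/
            Content-Length: 0
            Cache-Control: max-age=3600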

        Leif Hedstrom added a comment -

        John: Should we move this out to v2.1.7, or can we get this done for v2.1.6 ?

        John Plevyak added a comment -

        What is the timeline? I am taking the kids to Disneyland next week...

        Leif Hedstrom added a comment -

        I'm thinking we should start the v2.1.6 release in about 2 weeks? I'll leave this on v2.1.6; if it doesn't get in, no worries, I don't think it's a critical fix.

        Have fun in Disneyland, wish I was going.

        Leif Hedstrom added a comment -

        Moving this out to v2.1.9 for now, in prep for the v2.1.8 release. Please move back if this will be fixed in the next few days.

        Leif Hedstrom added a comment -

        John: Is there any chance we'll get this for v2.1.9 in the near future? Or should we move this out for post v3.0? I know of at least one "customer" who would really want this feature (it's very useful to be able to cache 302/301 responses without a body).

        John Plevyak added a comment -

        I'd like to get it in. The concern, however, is that it changes the API, which means that it will break clustering, so we have to get the cluster changes/testing done before committing.

        john

        Leif Hedstrom added a comment -

        I can definitely help out with the clustering testing and changes necessary. I have a pair of VMs set up for clustering dev.

        John Plevyak added a comment -

        Initial version. Give it a shot.

        Leif Hedstrom added a comment -

        John: I tried this patch, and it doesn't seem to have any negative effects in either standalone or cluster mode. However, zero-length body responses are still not cacheable. Do we also need to make some changes to the HttpSM for that to work?

        John Plevyak added a comment -

        lol, my testing tool treats 0 length as varied

        testing now.

        john

        John Plevyak added a comment -

        yes, the HTTP state machine needs some more changes, and these are beyond me.

        I changed it so that it makes the correct calls to the cache, but it seems that
        a content-length of 0 is hard-wired into HttpSM as an error. The problem emerges
        in (this=0x7fffea3e01c0, event=103, c=0x7fffea3e1f40) at HttpSM.cc:3162:

            3162          c->vc->do_io_close(EHTTP_ERROR);
            (gdb) list
            3157      // we got a truncated header from the origin server
            3158      // but decided to accpet it anyways
            3159      if (c->write_vio == NULL) {
            3160        *status_ptr = HttpTransact::CACHE_WRITE_ERROR;
            3161        c->write_success = false;
            3162        c->vc->do_io_close(EHTTP_ERROR);
            3163      } else {
            3164        *status_ptr = HttpTransact::CACHE_WRITE_COMPLETE;
            3165        c->write_success = true;
            3166        c->write_vio = c->vc->do_io(VIO::CLOSE);

        It seems that c->write_vio is NULL, which causes the HttpSM to close the cache with an error....

        It is easy to test... just put a breakpoint in CacheVC::openWriteClose

        The close should be without error.
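
        A minimal, self-contained sketch of the decision being discussed here; the names decide_cache_write_status, have_write_vio and expected_body_bytes are hypothetical illustrations, not ATS code, and the committed fix may differ:

            #include <cassert>
            #include <cstdint>

            enum class CacheWriteStatus { Error, Complete };

            // A missing/NULL write VIO should only be treated as a truncation error when
            // body bytes were actually expected from the origin; a deliberate zero-length
            // body would instead be closed as a successful cache write.
            CacheWriteStatus decide_cache_write_status(bool have_write_vio, int64_t expected_body_bytes) {
              if (!have_write_vio && expected_body_bytes > 0) {
                return CacheWriteStatus::Error;     // truncated: close the cache VC with an error
              }
              return CacheWriteStatus::Complete;    // complete (possibly empty) body: close cleanly
            }

            int main() {
              // A 301 with Content-Length: 0 should be a clean close, not an error.
              assert(decide_cache_write_status(false, 0) == CacheWriteStatus::Complete);
            }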

        Leif Hedstrom added a comment -

        Actually, I suspect it's in HttpTunnel.cc:846:

            if (c_write == 0) {
              // Nothing to do, call back the cleanup handlers
              c->write_vio = NULL;
              consumer_handler(VC_EVENT_WRITE_COMPLETE, c);
            } else {
              c->write_vio = c->vc->do_io_write(this, c_write, c->buffer_reader);
              ink_assert(c_write > 0);
            }
        
            c = c->link.next;
          }
        

        I think this makes some sense, actually.

        Leif Hedstrom added a comment -

        Grrr, even if I "fix" this in HttpTunnel.cc, it doesn't work. What's even more strange, I compared the tracers on http.* with this "fix" and a 0-length body against stock trunk and a normal-size body, and the tracers are now identical (without my hack fix, they differed since it'd call the VC_EVENT_WRITE_COMPLETE event way early).

        But something else is still preventing it from caching it :/.

        John Plevyak added a comment -

        Updated patch.

        John Plevyak added a comment -

        This one works... but I would consider it very risky.

        Leif Hedstrom added a comment -

        Yep, -3 works. This seems OK so far to me, and in line with where I was looking as well. I'll give it some more tests, but it'd be great if ibrezac could test it too.

        Leif Hedstrom added a comment -

        I think I've found a problem on clustering here. I'm fetching e.g.

        curl -D - -o /dev/null -s -x cluster1.ogre.com:8080 http://www.howstuffworks.com/bearing/

        So, this 301 gets cached on cluster1, I'm assuming the hashing on the URL does that. But if I fetch the same object from cluster2.ogre.com:8080, it's always a cache miss. And to make things worse, it actually forces cache1 to refresh the object as well (so it gets refetched, and updated in cache).
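
        For reference, the equivalent fetch against the second node mentioned above would be:

        curl -D - -o /dev/null -s -x cluster2.ogre.com:8080 http://www.howstuffworks.com/bearing/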

        Also, it seems that 40x's are always cached in cluster mode now, even with the proxy.config.http.negative_caching_enabled setting disabled.

        Neither of these problems seems to occur in the standalone cache scenario. I'll try to do some more debugging on this as well, and yes, I know we already suspect issues with clustering. John, do you have any immediate ideas we should look into?

        Zhao Yongming added a comment -

        I don't know why; from my point of view:
        1. Every miss triggers a write on the cluster, which does indeed complete successfully.
        2. But when the cache_read is done on the cluster, the remote side just sends back a failure response.
        Here is my debug info from traffic.out:

        [May 18 17:38:53.348] Server {47562039445248} DEBUG: (cache_msg) cache_op-s op=1 seqno=1067 data=0x1189018 len=77 machine=0x11d65c0
        [May 18 17:38:53.348] Server {47562039445248} DEBUG: (cluster_trace) alloc_channel remote chan=1097 VC=0x12339d0
        [May 18 17:38:53.348] Server {47562039445248} DEBUG: (cluster_trace) VC start alloc remote chan=1097 VC=0x12339d0
        [May 18 17:38:53.348] Server {47562039445248} DEBUG: (cache_proto) 0read op, seqno=1067 chan=1097 bufsize=2097152 token=-308068854,548
        [May 18 17:38:53.348] Server {47562039445248} DEBUG: (cache_hosting) Generic volume: f572d8d for host: 
        [May 18 17:38:53.348] Server {47562039445248} DEBUG: (cache_new) new 0x12f96b0
        [May 18 17:38:53.348] Server {47562039445248} DEBUG: (http_match) [SelectFromAlternates] # alternates = 5
        [May 18 17:38:53.348] Server {47562039445248} DEBUG: (http_seq) [SelectFromAlternates] 5 alternates for this cached doc
        [alts] There are 5 alternates for this request header.
        [May 18 17:38:53.348] Server {47562039445248} DEBUG: (cache_proto) setupVCdataRead CACHE_EVENT_OPEN_READ seqno=1067
        [May 18 17:38:53.348] Server {47562039445248} DEBUG: (cache_close) do_io_close 0x12f96b0 -1 1
        [May 18 17:38:53.348] Server {47562039445248} DEBUG: (cache_proto) replyOpEvent(this=11fd4f0,event=1103,VC=ffffb115)
        [May 18 17:38:53.348] Server {47562039445248} DEBUG: (cache_proto) cache operation failed result=1103 seqno=1067 (this=11fd4f0)
        [May 18 17:38:53.348] Server {47562039445248} DEBUG: (cache_proto) Sending reply/data seqno=1067 buflen=0
        [May 18 17:38:53.348] Server {47562039445248} DEBUG: (cache_proto) replyOpEvent: freeing this=11fd4f0
        [May 18 17:38:53.348] Server {47562039445248} DEBUG: (cache_free) free 0x12f96b0
        [May 18 17:38:53.349] Server {47562039445248} DEBUG: (cluster_monitor_ping) [10.62.163.237] ping: 0 34937
        [May 18 17:38:53.349] Server {47562039445248} DEBUG: (cluster_trace) free_channel remote chan=1097 VC=0x12339d0
        

        while common objects just respond normally:

        [May 18 17:18:36.358] Server {47562039445248} DEBUG: (cache_msg) cache_op-s op=1 seqno=821 data=0x1189018 len=75 machine=0x11d65c0
        [May 18 17:18:36.358] Server {47562039445248} DEBUG: (cache_proto) 0read op, seqno=821 chan=847 bufsize=2097152 token=-308068854,423
        [May 18 17:18:36.358] Server {47562039445248} DEBUG: (cache_hosting) Generic volume: f572d8d for host: 
        [May 18 17:18:36.358] Server {47562039445248} DEBUG: (cache_new) new 0x12bc9e0
        [May 18 17:18:36.358] Server {47562039445248} DEBUG: (http_match) [SelectFromAlternates] # alternates = 1
        [May 18 17:18:36.358] Server {47562039445248} DEBUG: (http_seq) [SelectFromAlternates] 1 alternates for this cached doc
        [alts] There are 1 alternates for this request header.
        [May 18 17:18:36.358] Server {47562039445248} DEBUG: (cache_proto) setupVCdataRead CACHE_EVENT_OPEN_READ seqno=821
        [May 18 17:18:36.358] Server {47562039445248} DEBUG: (cache_close) do_io_close 0x12bc9e0 -1 1
        [May 18 17:18:36.358] Server {47562039445248} DEBUG: (cache_proto) replyOpEvent(this=11fd4f0,event=1102,VC=12bc9e0)
        [May 18 17:18:36.358] Server {47562039445248} DEBUG: (cache_proto) connect_local success seqno=821 alldata=1
        [May 18 17:18:36.358] Server {47562039445248} DEBUG: (cache_proto) Sending reply/data seqno=821 buflen=77089
        [May 18 17:18:36.358] Server {47562039445248} DEBUG: (cache_free) free 0x12bc9e0
        
        Leif Hedstrom added a comment -

        Quick update: The caching of 40x's happens without this patch. I will file a bug on that separately.

        But there's still the issue where a request hits a cluster host that has to go to a peer; that will still bust the caches.

        Leif Hedstrom added a comment -

        The 40x caching was a red herring, but the other problem remains. Unless ming_zym or mohan have time to look at it, I'll take a look.

        Leif Hedstrom added a comment -

        Did a little more debugging; what seems to happen is that when cluster2 has to proxy to cluster1, it actually ends up creating a new alternate on the caching server (cluster1). This is why it looks like a cache miss again on cluster1, I think, because it's somehow invalidating the current alternate, forcing it to fetch a new one.

        So, this is a pretty serious problem with clustering enabled, because you would churn through the cache very frequently.

        Leif Hedstrom added a comment -

        From what I can tell, it seems that even though it finds the appropriate alternate, it is unable to open the cache for reading (I think). E.g. I see this:

        [May 18 19:54:12.896] Server {1172630384} DEBUG: (http_seq) [SelectFromAlternates] Chosen alternate # 0
        [alts] and the winner is alternate number 1
        [May 18 19:54:12.897] Server {1172630384} DEBUG: (cache_proto) setupVCdataRead CACHE_EVENT_OPEN_READ seqno=43
        [May 18 19:54:12.897] Server {1172630384} DEBUG: (cache_proto) replyOpEvent(this=4191a900,event=1103,VC=ffffb115)
        [May 18 19:54:12.897] Server {1172630384} DEBUG: (cache_proto) cache operation failed result=1103 seqno=43 (this=4191a900)
        [May 18 19:54:12.897] Server {1172630384} DEBUG: (cache_proto) Sending reply/data seqno=43 buflen=0
        [May 18 19:54:12.897] Server {1172630384} DEBUG: (cache_proto) replyOpEvent: freeing this=4191a900
        [May 18 19:54:12.897] Server {1172630384} DEBUG: (cache_free) free 0xa399d40
        [May 18 19:54:12.960] Server {1172630384} DEBUG: (cache_msg) cache_op-l op=7 seqno=44 data=0xa2e3018 len=77 machine=0x41900468
        [May 18 19:54:12.960] Server {1172630384} DEBUG: (cache_new) new 0xa399d40
        [May 18 19:54:12.960] Server {1172630384} DEBUG: (cache_hosting) Generic volume: a1fc93cd for host: (null)
        [May 18 19:54:12.960] Server {1172630384} DEBUG: (cache_proto) replyOpEvent(this=4191a900,event=1108,VC=a399d40)
        [May 18 19:54:12.960] Server {1172630384} DEBUG: (cache_proto) cache_open [W] success seqno=44
        [May 18 19:54:12.960] Server {1172630384} DEBUG: (cache_proto) Sending reply seqno=44, (this=4191a900)
        [May 18 19:54:12.961] Server {1172630384} DEBUG: (cache_proto) replyOpEvent: freeing this=4191a900
        [May 18 19:54:13.556] Server {1172630384} DEBUG: (cache_free) free 0xa399d40
        
        Leif Hedstrom added a comment -

        So, I haven't had a chance to properly play with this, but I suspect this code in ClusterCache.cc:CacheContinuation::VCdataRead() might be part of the problem:

          switch (event) {
          case VC_EVENT_EOS:
            {
              if (!target_vio->ndone) {
                // Doc with zero byte body, handle as read failure
                goto read_failed;
              }
              // Fall through
            }
        

        But, now I'm off to watch hockey, go sharks!
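
        A rough sketch of the alternative behavior being hinted at here; the function name and the doc_total_length parameter are hypothetical, not the actual ClusterCache fields:

            #include <cassert>
            #include <cstdint>

            // On VC_EVENT_EOS: zero bytes transferred is only a failure if the document
            // was supposed to have a body in the first place.
            bool eos_is_read_failure(int64_t bytes_done, int64_t doc_total_length) {
              return bytes_done == 0 && doc_total_length > 0;
            }

            int main() {
              assert(!eos_is_read_failure(0, 0));     // genuine zero-byte doc: not a failure
              assert(eos_is_read_failure(0, 1234));   // nothing arrived for a non-empty doc: failure
            }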

        Zhao Yongming added a comment -

        I tried to get the zero-size object returned as a normal response. Here is the patch that makes the cluster happy; it passed my testing. It may need more tweaking.

        Just FYI for now.

        John Plevyak added a comment -

        Looks good, but I agree, this set of changes is pretty invasive and I don't have great confidence that we haven't missed some corner cases.

        I am somewhat reluctant to dump it in just before 3.0, but in any case I think we need to review the code a bit more before committing....

        john

        Leif Hedstrom added a comment -

        The changes seem invasive to me too, but are definitely very useful. A few options, I guess:

        1) Just go with this, and hope for the best (but we'll do more testing of course before v3.0). The sooner we land it, the better then.

        2) We make a new configuration to allow zero-length bodies, off by default, and try to make the code as "backward compatible" as possible with this option turned off. This makes it possible to restore old behavior if there turns out to be a bug somewhere, but I can see how adding an option like this would be pretty gnarly.

        3) We postpone landing this until v3.0 is released.

        I'll do some more testing today as well.

        John Plevyak added a comment -

        Obviously the patch needs to be fixed up a bit. The Cluster used the CacheDataType as a message type, so I hacked in:

             enum CacheDataType
             {
               CACHE_DATA_SIZE = VCONNECTION_CACHE_DATA_BASE,
            -  CACHE_DATA_HTTP_INFO,
            +  CACHE_DATA_HTTP_INFO_LEAVE_BODY,
            +  CACHE_DATA_HTTP_INFO_REPLACE_BODY,
               CACHE_DATA_KEY,
               CACHE_DATA_RAM_CACHE_HIT_FLAG
             };

        Which doesn't really make sense. The leave/replace bit should be encoded somewhere else in the message.

        The changes to CacheWrite are very tricky and I have little faith in them.

        We could land it, but we would need some serious testing...

        John Plevyak added a comment -

        This change is just too risky to land in 3.0. We will make the change first thing in 3.1 and then backport if/when it proves stable.

        Leif Hedstrom added a comment -

        Marking these as targets for backport to 3.0.1, so we remember that they are "important".

        Zhao Yongming added a comment -

        I ran ab testing, but it does not stay stable over time; it fails after the cluster channel conflicts.

        It looks like, when doing a cluster read, the channel is not closed.

        Leif Hedstrom added a comment -

        Moving out some bugs to 3.1.1, please rearrange if you plan on fixing any of these in the next few weeks.

        Leif Hedstrom added a comment -

        Moving these to 3.1.2 for now. Please move back if they will be worked on ASAP for 3.1.1.

        Leif Hedstrom added a comment -

        Moved to 3.1.4; please move back to 3.1.3 any bugs you will work on in the next 2 weeks.

        weijin added a comment -

        Based on John's solution, I also wrote a patch to cache empty documents. It's simple and has no need to modify the tunnel (HttpSM). The patch has a side effect in cluster mode (3): if force_cache_emtpy_doc is set, it can write an empty doc to another machine, but cannot read it back from that machine.

        weijin added a comment -

        Leif, John, please review the patch and give me your advice.

        John Plevyak added a comment -

        weijin, your patch relies on the Content-Length: header. That is probably safe and covers most situations, but I don't know that it will work in all situations. Old school servers (1.0) which don't use Content-Length: will not benefit. I have to admit that it does finesse the compatibility issues, but I was hoping for a "complete" solution. Perhaps we can look at using some combination of the patches whereby we use the API changes I am proposing and verify that we are doing the right thing with the Content-Length and perhaps use your flag?

        I'll look into it as well.
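
        A compact sketch of the kind of combined check being proposed; the names may_cache_empty_body, empty_doc_enabled, has_content_length and content_length are hypothetical, not the patch's actual identifiers:

            #include <cassert>
            #include <cstdint>

            // Cache a zero-byte body only when the origin explicitly declared it
            // (Content-Length: 0) and the operator has opted in via a config flag.
            bool may_cache_empty_body(bool empty_doc_enabled, bool has_content_length, int64_t content_length) {
              return empty_doc_enabled && has_content_length && content_length == 0;
            }

            int main() {
              assert(may_cache_empty_body(true, true, 0));    // explicit Content-Length: 0 -> cacheable
              assert(!may_cache_empty_body(true, false, 0));  // HTTP/1.0 style, no Content-Length -> not verifiable
            }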

        weijin added a comment -

        John: the patch was just a temporary solution, and I did not take the situation you mentioned into account (I did not even know about it). So if you have any ideas about it, tell me.


        Leif Hedstrom added a comment -

        Going to move this out to 3.3.0.

        Bin Chen added a comment -

        In the HTTP protocol, we can only verify that we have received the complete remote response via Content-Length or chunked encoding. So if the body length is 0 and there is no Content-Length, we can't verify that we received all of the response data. One option is to assume we received all of the response data; the other is to assume we did not, due to some TCP error. In most cases, we can cache responses that carry length info.

        Leif Hedstrom added a comment -

        Heh, this is becoming quite a novel. What shall we do here? This would be great to get into the v3.4.0 release, but is it safe to do so?

        Zhao Yongming added a comment -

        We have had this patch in production for years; should we go ahead and merge it, or wait for a better solution? I'd prefer to commit it and hope someone perfects it later.

        Leif Hedstrom added a comment -

        Let me take a look at it again, from the protocol perspective. If John has some better ideas than this, like his idea of combining the patches, that'd be great.

        Leif Hedstrom added a comment -

        I'm going to make a few cosmetic changes, and commit this on behalf of binchen.

        ASF subversion and git services added a comment -

        Commit e2aff41f8d68144a0e90b1412c6006b784769af4 in branch refs/heads/master from Leif Hedstrom <zwoop@apache.org>
        [ https://git-wip-us.apache.org/repos/asf?p=trafficserver.git;h=e2aff41 ]

        TS-621 Allow caching of empty docs (currently only if a header
        Content-Length: 0 is in the response). New config option is named
        proxy.config.http.cache.allow_empty_doc, and is disabled by default.

        We are aware this doesn't fully fix the problem, but is "good enough"
        for now.

        Reviews and minor cosmetic cleanup changes: James and Leif.
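
        With that commit, enabling the feature is a one-line records.config change (assuming the standard CONFIG name/type syntax; the option defaults to 0, i.e. disabled):

            CONFIG proxy.config.http.cache.allow_empty_doc INT 1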

        ASF subversion and git services added a comment -

        Commit b5b1323835428335696db98db1378c41d11f76c1 in branch refs/heads/3.3.x from Leif Hedstrom <zwoop@apache.org>
        [ https://git-wip-us.apache.org/repos/asf?p=trafficserver.git;h=b5b1323 ]

        TS-621 Allow caching of empty docs (currently only if a header
        Content-Length: 0 is in the response). New config option is named
        proxy.config.http.cache.allow_empty_doc, and is disabled by default.

        We are aware this doesn't fully fix the problem, but is "good enough"
        for now.

        Reviews and minor cosmetic cleanup changes: James and Leif.

        ASF subversion and git services added a comment -

        Commit d64480be9197ae83d1f2701033ca8b4f9cf59d9b in branch refs/heads/3.3.x from Leif Hedstrom
        [ https://git-wip-us.apache.org/repos/asf?p=trafficserver.git;h=d64480b ]

        Merge branch 'master' into 3.3.x

        • master: (64 commits)
          Updated STATUS with v3.3.2 release.
          Add some new generated output files from the RAT checks.
          TS-621 Allow caching of empty docs (currently only if a header Content-Length: 0 is in the response). New config option is named proxy.config.http.cache.allow_empty_doc, and is disabled by default.
          TS-1778: Remove vestigal extensions.config support
          filename_to_string doesn't need to strdup
          Fix ubuntu vagrant image URL
          TS-1806: bogus buffer sizing in CfgContextUtils.cc
          Remove obsolete mgr.cnf config file
          TS-1805: fix uninitialized result_type in NodeStatEval()
          TS-1805: fix conversion when pass value to statVarSet()
          Drop in libloader code.
          Fix vagrant VM addressing
          Turn the vagrant file back into actual ruby
          streamline vagrant network assignments
          TS-1760: clean up and fix --enable-linux-native-aio
          Clean up --enable-luajit help string
          Clean up --enable-reclaimable-freelist help string
          TS-1067 Fix the code around the proxy.config.udp.theads option
          TS-1760: Option to use Linux native AIO
          Teach vagrant that omnios is solaris
          ...
        ASF subversion and git services added a comment -

        Commit e2aff41f8d68144a0e90b1412c6006b784769af4 in branch refs/heads/3.3.x from Leif Hedstrom <zwoop@apache.org>
        [ https://git-wip-us.apache.org/repos/asf?p=trafficserver.git;h=e2aff41 ]

        TS-621 Allow caching of empty docs (currently only if a header
        Content-Length: 0 is in the response). New config option is named
        proxy.config.http.cache.allow_empty_doc, and is disabled by default.

        We are aware this doesn't fully fix the problem, but is "good enough"
        for now.

        Reviews and minor cosmetic cleanup changes: James and Leif.


          People

          • Assignee: Leif Hedstrom
          • Reporter: John Plevyak
          • Votes: 0
          • Watchers: 5
