Description
Following a discussion on the IRC, opening this jira to deprecate the Allow empty doc cache setting. The consensus is to cache anything that's cacheable and valid (regardless, of if it's empty or non-empty) - Just to clarify, it is not clear there's a way to ensure that it's safe to cache anything right now. TS-621 tried to cache empty docs in the past but ran into many difficulties, so, opening TS-3980 to investigate how Http layer (or other higher layers) may safely indicate the CacheVC that it's safe to cache.
Amaryllis 7:50 sudheerv: do any other reverse proxies refuse to cache empty documents without content-length? it's not something i've noticed before (in varnish, for example) Amaryllis 7:50 the connection could still be broken at any point in the response and result in broken content being cached 7:51 but as a compromise, what if there was another option to only cache empty responses for 3xx, which is probably 80% of use cases? 7:52 (maybe, in addition, it should accept transfer-encoding: chunked, since that can indicate end-of-body) zwoop_ 7:54 Amaryllis we should talk to amc (he knows everything, we don't need Google here) about this 7:54 zwoop_ is now known as woop 7:54 woop is now known as zwoop 7:54 zwoop left the room (quit: Changing host). 7:54 zwoop [~zwoop@apache/committer/zwoop] entered the room. 7:54 mode (+o zwoop) by ChanServ zwoop 7:55 Amaryllis amc Maybe we could have a plugin API, that basically forces to set f.allow_empty_doc ? amc 7:55 I think you can change the CacheVC structure without problem. Amaryllis 7:55 i think some sort of fix for this should be in core, at the moment TS won't cache any of our redirects amc 7:56 There's not a global config to allow that? zwoop 7:56 Amaryllis does it send a Cache-Control header with the redirects ? Amaryllis 7:56 amc, zwoop: the origin sends transfer-encoding: chunked instead of content-length. 7:56 so even with cache-control it won't be cached, that's what this PR is to change 7:56 TS-3978 ASFBot 7:56 TS-3978: Allow empty document caching to follow normal logic - https://issues.apache.org/jira/browse/TS-3978 zwoop 7:56 ah 7:57 yeah, that's expected 7:57 and is why we added that option to allow empty docs to be cached Amaryllis 7:57 this is with that option enabled zwoop 7:57 huh Amaryllis 7:57 it still won't cache any document without content-length zwoop 7:57 yeah, need CL: too 7:57 that's a requirement Amaryllis 7:57 so i'll update the PR to also recognise chunked responses as being cacheable even if empty zwoop 7:57 otherwise, it can't distinguish the cases you were worried about (a broken connection) vs a truly empty doc Amaryllis 7:58 zwoop: right, but the PR makes it optional, only if allow_empty_doc=2 zwoop 7:58 heh, your redirect is Chunked ?? Amaryllis 7:58 yes, an empty chunked response zwoop 7:58 is that even correct? Amaryllis 7:58 it's perfectly valid, if somewhat unusual zwoop 7:58 doesn't the chunking require the final "0" ? 7:58 which means, the doc isn't empty Amaryllis 7:58 i think ATS is looking at 'empty' after de-chucking, isn't it? 7:59 because it's definitely not cacheing these responses zwoop 7:59 not sure 7:59 gancho_ left the room (quit: Ping timeout: 272 seconds). amc 7:59 I think Amaryllis is correct. zwoop 7:59 Amaryllis but, I definitely remember that the CL: header was a requirement that can not be ignored (safely), so not sure I think having an allow_empty_doc=2 is ok 8:00 unless allow_empty_doc=2 also means allow Chunked content to be empty amc 8:00 I need to check to see what ATS actually caches - the chunked data or the payload. zwoop 8:00 but, you have to have some indicator telling us that the doc really is empty before we can cache it 8:00 amc is caches the unchunked data 8:00 it dechunks it, and caches that 8:01 is dechunk even a proper verb? amc 8:01 Ah, but it caches the encoding header. Amaryllis 8:01 zwoop: https://dpaste.de/6kUT zwoop 8:01 amc Hmmm, really ? That doesn't sound right amc 8:01 "zwoop drank too much at the summit and dechunked himself in the hallway". Yep, a proper verb. Amaryllis 8:01 that's the actual origin response causing problems for us zwoop 8:01 amc I'd expect it to change it from chunked to a Content-Length: amc 8:02 I was wondering about that. zwoop 8:02 amc lets dechunk big time at the Bar camp amc 8:02 What happens when the content is served? Is the encoding preserved and obeyed? zwoop 8:03 for the first client (the cache miss) I think it seems the chunked response Amaryllis 8:03 yes, TS returns a correct thunked response 8:03 and never caches it, so it's always the same zwoop 8:03 but subsequent requests (when served out of cache) should have a CL: amc 8:03 I was thinking of what happens for chunked responses from origin that are non-zero lenght. zwoop 8:03 Amaryllis can you not convince your origin that they are crazy, and change it to not send a Chunked empty response, and instead send CL: 0 ? sudheerv 8:04 catching up on the messages…but, my concern is the same as zwoop's Amaryllis 8:04 zwoop: no, it never adds a CL: https://dpaste.de/Goex zwoop 8:04 Amaryllis because youa re not caching it Amaryllis 8:04 yes, exactly, because it's not cacheable zwoop 8:04 Amaryllis amc is asking the general question, to which my answer is Amaryllis 8:04 ah zwoop 8:04 amc yes, I'd expect it to send the chunked response to the first client, and subsequent responses (out of cache) with a CL: 0 Amaryllis 8:04 perhaps i can make uwsgi do some sort of output buffering and put a CL header in zwoop 8:05 amc that's at least how it's worked for the last ~10 years, we better not ahve changed it Amaryllis 8:05 zwoop: how can that ever happen? a chunked empty response will never be cached zwoop 8:05 not talking empty responses 8:05 general case amc 8:05 I have many opinions on this. I do think it should be cacheable as an empty doc if it's chunked with a zero length payload. Amaryllis 8:05 okay, but it's not CL: 0 then zwoop 8:05 your case is deranged 8:05 Amaryllis right, nothing to do with your case Amaryllis 8:05 amc: shall i submit a separate PR to fix that? amc 8:06 Yes. zwoop 8:06 amc yeah, that seems reasonable. But, If it was me, I'd still beat the Origin people over the head, cause they are crazy sudheerv 8:06 amc: yes, but, you have to ensure that you received the final chunk 8:06 if it already is the case, by the time you are checking for allow empty doc caching, then that's fine amc 8:06 Right. That is happening in the clips Amaryllis showed us. Amaryllis 8:06 zwoop: i think it's because the origin sends the headers before it's generated the response body. so it doesn't know it's going to be empty sudheerv 8:06 but, i dont know if that's the case already zwoop 8:07 Amaryllis it would know it's a redirect no? 8:07 Amaryllis and why would a redirect have a body, ever ? Amaryllis 8:07 yes, but redirects can still have bodies sudheerv 8:07 you may get empy responses without content-length when we support outbound spdy/h2 as well zwoop 8:07 true Amaryllis 8:07 most web servers include bodies in 3xx 8:07 i did have the origin fix it in one specific case but we need something more general zwoop 8:07 yeah, I forgot about that, you're right 8:07 niq left the room (quit: Ping timeout: 250 seconds). amc 8:08 If it's chunked and ATS recieves the final payload (0 length frame) it can reasonably presume it got valid (non-truncated) content. sudheerv 8:08 Amaryllis: I think the patch should ensure it can cover non-broken empty body cases for chunked-encoding/spdy/h2 Amaryllis 8:08 sudheerv: does this even apply to spdy/h2? Amaryllis 8:09 actually, there is no H2 to origins at all... sudheerv 8:09 not right now - we don't support outbound spdy/h2 yet Amaryllis 8:09 right sudheerv 8:09 but, when we do, it will be a similar problem there Amaryllis 8:09 yes sudheerv 8:09 there's no chunked encoding or content-len with spdy/h2 Amaryllis 8:09 but i can't really test code that's yet to be written sudheerv 8:09 so, yet another case of empty body 8:09 no, my point is - the code we write now must not break when we have spdy/h2 8:10 so, instead of "assuming" that the body is valid, it might make sense 8:10 to somehow indicate that it is Amaryllis 8:10 well, i think it will break anyway, in the sense that empty H2 responses won't be cached... since they have no CL? sudheerv 8:10 well, but you are fixing that Amaryllis 8:10 but H2 doesn't do TE either, does it? i thought it replaces that entire thing sudheerv 8:10 if it's consistently broken (or not supported), that's alright 8:10 yes, it doesn't 8:10 that's my point Amaryllis 8:11 all i'm adding (for now) is to make it accept CL _or_ TE sudheerv 8:11 so, we need to have a generic way of notifying cache layer 8:11 that the response is valid Amaryllis 8:11 and test that empty TE is properly checked for the empty chunk amc 8:12 Hmmm. Maybe a virtual on CacheVConnection "setAllowEmptyContent(bool)". sudheerv 8:12 yeah, something like that amc 8:12 Then an implementation in CacheVC. sudheerv 8:12 amc: i also want to make this setting overridable amc 8:12 Then the higher layer can say "I know this was a valid zero length content, cache monkey" sudheerv 8:12 (but, that's for later) 8:12 yes, that makes sense Amaryllis 8:13 amc: CacheVC already has allow_empty_doc flag, i think that's enough sudheerv 8:13 Amaryllis: yeah, it does - I poked around that a bit recently, didn't seem very straightfwd amc 8:13 Just bang on that directly? sudheerv 8:13 but, pls go ahead and do it if you find it easy Amaryllis 8:13 amc: that was the plan, yes 8:14 amc: https://github.com/apache/trafficserver/pull/310/files#diff-7fada9a23fa1d12e90ca6bec876ecf8fL477 8:14 amc: ... that check just needs another check for chunked as well. sudheerv 8:15 Amaryllis: if we did something like what amc suggested above amc 8:15 No, I meant that it would be set from (for instance) the chunked decoder or just above that. sudheerv 8:15 where the higher layer can notify cache that it can be cached 8:15 then we don't have to check any headers 8:15 that already seems hacky enough amc 8:15 So there wouldn't be an addiitional value for the config, it would just work as epexted. 8:15 expected. sudheerv 8:15 yes jpeach 8:15 there's already a function that checks whether the whole document was received amc 8:16 Hmmm. Interesting. Amaryllis 8:16 i think some things are getting mixed up here - i'm not going to add an additional config value 8:16 that's what the rejected PR did jpeach 8:16 it is used when figuring what to do when getting an EOF amc 8:16 What I would want is a way for the chunk decoder or its parent logic to pass on to the CacheVC the validity of the document. 8:17 Amaryllis- I thought your PR added the value '2' to the config variable. jpeach 8:17 afaict as long as you can know you have the entire doc, it could be cacheable sudheerv 8:17 jpeach: that makes sense Amaryllis 8:17 amc: yes, but people disliked that, so instead i suggested fixing it to check for chucked encoding 8:17 amc: which doesn't need an additional config setting sudheerv 8:17 but, i guess the complication is that, at what ppint to do you know that you received the whole doc amc 8:17 OK, I must be reading the wrong PR. sudheerv 8:17 is it before checking allow empty doc or after Amaryllis 8:18 amc: there's only one PR, i didn't write the chunked version yet sudheerv 8:18 if it's just a matter of changing the order to check for whole doc, without breaking anything , then that seems like a cleaner solution amc 8:18 Ah, OK. 8:19 Yes, I agree. I think we're all on the same page - check for getting the whole document and if it's zero length and the config value is 1, cache it. Amaryllis 8:19 (we need something for 6.0 fairly quickly, so we'll be using this patch in production, but i'm happy to put more effort into fixing it properly) sudheerv 8:19 +1 amc 8:19 The brokeness is that complete chunked stuff isn't being checked correctly. 8:19 Amaryllis - coool. Amaryllis 8:19 yes - that's the only bit that actually needs to be fixed, but it might also make sense to refactor how the check is done amc 8:20 Allright. Amaryllis 8:20 i'm not sure i know enough about the TS internals to actually implement the second bit though amc 8:20 I have to go get my Macbook fixed - update took out my wireless. 8:21 I'll see if I have some time this weekend to take a look. sudheerv 8:21 Amaryllis: if it's okay with everyone else, adding the check for TE header just to fix the TE part alone for now is fine by me as well 8:21 (we can open a separate jira to improve the code for future) 8:22 (was just saying, that instead of adding case by case, it'd be nicer to have a cleaner solution - but, by no means, it's immediately required) Amaryllis 8:23 sudheerv: sure jpeach 8:23 found it ... HttpSM::is_http_server_eos_truncation 8:23 dustywusty_ is now known as dustywusty sudheerv 8:23 jpeach: that might be a bit late? 8:24 it seems to be after tunnel completes jpeach 8:24 the place it is called from might be too late, but that's the logic for knowing if we have the whole response sudheerv 8:24 whereas, the tunnel is writing to UA and Cache in parallel zwoop 8:24 sorry, had to reboot. sudheerv 8:24 jpeach: right, but, the point is zwoop 8:24 amc sudheerv I'm thinking, maybe we should just deprecate this setting completely ? And always allow caching empty docs when we safely can ? sudheerv 8:24 you may get an EOS after some body is recieved already 8:25 zwoop: +1 to that zwoop 8:25 it was added for compatibility reasons eons ago, but we changed the default to "1" in 5.x I think sudheerv 8:25 we just need to know when is it safe jpeach 8:25 sudheerv: I'm just saying that we have logic to know whether we have the whole response; not that the logic is already used in all the right places zwoop 8:25 we can deprecate it now (marking it as "don't change / use this unless you really, really have to" and remove it for 7.0.0 sudheerv 8:27 +1 8:27 reveller left the room. zwoop 8:28 sudheerv file a Jira on it maybe ? Or two. We should change the docs now (marking it deprecated) and a bug for 7.0.0 for removal sudheerv 8:29 sure.. 8:30
Attachments
Issue Links
- is related to
-
TS-3978 Allow empty document caching to follow normal logic
- Open
-
TS-621 writing 0 bytes to the HTTP cache means only update the header... need a new API: update_header_only() to allow 0 byte files to be cached
- Closed
- relates to
-
TS-3980 Investigate and remove the empty doc cache setting
- Closed