Uploaded image for project: 'Traffic Server'
  1. Traffic Server
  2. TS-3979

Deprecate allow empty doc cache setting.

    XMLWordPrintableJSON

Details

    • Improvement
    • Status: Closed
    • Major
    • Resolution: Fixed
    • 6.0.0
    • 6.2.0
    • Cache
    • None

    Description

      Refer TS-3978, TS-621

      Following a discussion on the IRC, opening this jira to deprecate the Allow empty doc cache setting. The consensus is to cache anything that's cacheable and valid (regardless, of if it's empty or non-empty) - Just to clarify, it is not clear there's a way to ensure that it's safe to cache anything right now. TS-621 tried to cache empty docs in the past but ran into many difficulties, so, opening TS-3980 to investigate how Http layer (or other higher layers) may safely indicate the CacheVC that it's safe to cache.

      Amaryllis
      7:50
      
      sudheerv: do any other reverse proxies refuse to cache empty documents without content-length?  it's not something i've noticed before (in varnish, for example)
      Amaryllis
      7:50
      
      the connection could still be broken at any point in the response and result in broken content being cached
      7:51
      
      but as a compromise, what if there was another option to only cache empty responses for 3xx, which is probably 80% of use cases?
      7:52
      
      (maybe, in addition, it should accept transfer-encoding: chunked, since that can indicate end-of-body)
      zwoop_
      7:54
      
      Amaryllis  we should talk to amc (he knows everything, we don't need Google here) about this
      7:54
      
      zwoop_ is now known as woop
      7:54
      
      woop is now known as zwoop
      7:54
      
      zwoop left the room (quit: Changing host).
      7:54
      
      zwoop [~zwoop@apache/committer/zwoop] entered the room.
      7:54
      
      mode (+o zwoop) by ChanServ
      zwoop
      7:55
      
      Amaryllis amc Maybe we could have a plugin API, that basically forces to set f.allow_empty_doc ?
      amc
      7:55
      
      I think you can change the CacheVC structure without problem.
      Amaryllis
      7:55
      
      i think some sort of fix for this should be in core, at the moment TS won't cache any of our redirects
      amc
      7:56
      
      There's not a global config to allow that?
      zwoop
      7:56
      
      Amaryllis  does it send a Cache-Control header with the redirects ?
      Amaryllis
      7:56
      
      amc, zwoop: the origin sends transfer-encoding: chunked instead of content-length.
      7:56
      
      so even with cache-control it won't be cached, that's what this PR is to change
      7:56
      
      TS-3978
      ASFBot
      7:56
      
      TS-3978: Allow empty document caching to follow normal logic - https://issues.apache.org/jira/browse/TS-3978
      zwoop
      7:56
      
      ah
      7:57
      
      yeah, that's expected
      7:57
      
      and is why we added that option to allow empty docs to be cached
      Amaryllis
      7:57
      
      this is with that option enabled
      zwoop
      7:57
      
      huh
      Amaryllis
      7:57
      
      it still won't cache any document without content-length
      zwoop
      7:57
      
      yeah, need CL: too
      7:57
      
      that's a requirement
      Amaryllis
      7:57
      
      so i'll update the PR to also recognise chunked responses as being cacheable even if empty
      zwoop
      7:57
      
      otherwise, it can't distinguish the cases you were worried about (a broken connection) vs a truly empty doc
      Amaryllis
      7:58
      
      zwoop: right, but the PR makes it optional, only if allow_empty_doc=2
      zwoop
      7:58
      
      heh, your redirect is Chunked ??
      Amaryllis
      7:58
      
      yes, an empty chunked response 
      zwoop
      7:58
      
      is that even correct?
      Amaryllis
      7:58
      
      it's perfectly valid, if somewhat unusual
      zwoop
      7:58
      
      doesn't the chunking require the final "0" ?
      7:58
      
      which means, the doc isn't empty
      Amaryllis
      7:58
      
      i think ATS is looking at 'empty' after de-chucking, isn't it?
      7:59
      
      because it's definitely not cacheing these responses
      zwoop
      7:59
      
      not sure
      7:59
      
      gancho_ left the room (quit: Ping timeout: 272 seconds).
      amc
      7:59
      
      I think Amaryllis is correct.
      zwoop
      7:59
      
      Amaryllis  but, I definitely remember that the CL: header was a requirement that can not be ignored (safely), so not sure I think having an allow_empty_doc=2 is ok
      8:00
      
      unless allow_empty_doc=2 also means allow Chunked content to be empty
      amc
      8:00
      
      I need to check to see what ATS actually caches - the chunked data or the payload.
      zwoop
      8:00
      
      but, you have to have some indicator telling us that the doc really is empty before we can cache it
      8:00
      
      amc is caches the unchunked data
      8:00
      
      it dechunks it, and caches that
      8:01
      
      is dechunk even a proper verb? 
      amc
      8:01
      
      Ah, but it caches the encoding header.
      Amaryllis
      8:01
      
      zwoop: https://dpaste.de/6kUT
      zwoop
      8:01
      
      amc Hmmm, really ? That doesn't sound right
      amc
      8:01
      
      "zwoop drank too much at the summit and dechunked himself in the hallway". Yep, a proper verb.
      Amaryllis
      8:01
      
      that's the actual origin response causing problems for us
      zwoop
      8:01
      
      amc I'd expect it to change it from chunked to a Content-Length:
      amc
      8:02
      
      I was wondering about that.
      zwoop
      8:02
      
      amc lets dechunk big time at the Bar camp
      amc
      8:02
      
      What happens when the content is served? Is the encoding preserved and obeyed?
      zwoop
      8:03
      
      for the first client (the cache miss) I think it seems the chunked response
      Amaryllis
      8:03
      
      yes, TS returns a correct thunked response
      8:03
      
      and never caches it, so it's always the same
      zwoop
      8:03
      
      but subsequent requests (when served out of cache) should have a CL:
      amc
      8:03
      
      I was thinking of what happens for chunked responses from origin that are non-zero lenght.
      zwoop
      8:03
      
      Amaryllis  can you not convince your origin that they are crazy, and change it to not send a Chunked empty response, and instead send CL: 0 ?
      sudheerv
      8:04
      
      catching up on the messages…but, my concern is the same as zwoop's
      Amaryllis
      8:04
      
      zwoop: no, it never adds a CL: https://dpaste.de/Goex
      zwoop
      8:04
      
      Amaryllis  because youa re not caching it
      Amaryllis
      8:04
      
      yes, exactly, because it's not cacheable 
      zwoop
      8:04
      
      Amaryllis  amc is asking the general question, to which my answer is
      Amaryllis
      8:04
      
      ah
      zwoop
      8:04
      
      amc yes, I'd expect it to send the chunked response to the first client, and subsequent responses (out of cache) with a CL: 0
      Amaryllis
      8:04
      
      perhaps i can make uwsgi do some sort of output buffering and put a CL header in
      zwoop
      8:05
      
      amc that's at least how it's worked for the last ~10 years, we better not ahve changed it
      Amaryllis
      8:05
      
      zwoop: how can that ever happen?  a chunked empty response will never be cached
      zwoop
      8:05
      
      not talking empty responses
      8:05
      
      general case
      amc
      8:05
      
      I have many opinions on this. I do think it should be cacheable as an empty doc if it's chunked with a zero length payload.
      Amaryllis
      8:05
      
      okay, but it's not CL: 0 then
      zwoop
      8:05
      
      your case is deranged
      8:05
      
      Amaryllis  right, nothing to do with your case
      Amaryllis
      8:05
      
      amc: shall i submit a separate PR to fix that?
      amc
      8:06
      
      Yes.
      zwoop
      8:06
      
      amc yeah, that seems reasonable. But, If it was me, I'd still beat the Origin people over the head, cause they are crazy
      sudheerv
      8:06
      
      amc: yes, but, you have to ensure that you received the final chunk
      8:06
      
      if it already is the case, by the time you are checking for allow empty doc caching, then that's fine
      amc
      8:06
      
      Right. That is happening in the clips Amaryllis showed us.
      Amaryllis
      8:06
      
      zwoop: i think it's because the origin sends the headers before it's generated the response body.  so it doesn't know it's going to be empty
      sudheerv
      8:06
      
      but, i dont know if that's the case already
      zwoop
      8:07
      
      Amaryllis  it would know it's a redirect no?
      8:07
      
      Amaryllis and why would a redirect have a body, ever ?
      Amaryllis
      8:07
      
      yes, but redirects can still have bodies
      sudheerv
      8:07
      
      you may get empy responses without content-length when we support outbound spdy/h2 as well
      zwoop
      8:07
      
      true
      Amaryllis
      8:07
      
      most web servers include bodies in 3xx
      8:07
      
      i did have the origin fix it in one specific case but we need something more general
      zwoop
      8:07
      
      yeah, I forgot about that, you're right
      8:07
      
      niq left the room (quit: Ping timeout: 250 seconds).
      amc
      8:08
      
      If it's chunked and ATS recieves the final payload (0 length frame) it can reasonably presume it got valid (non-truncated) content.
      sudheerv
      8:08
      
      Amaryllis: I think the patch should ensure it can cover non-broken empty body cases for chunked-encoding/spdy/h2
      Amaryllis
      8:08
      
      sudheerv: does this even apply to spdy/h2?
      Amaryllis
      8:09
      
      actually, there is no H2 to origins at all...
      sudheerv
      8:09
      
      not right now - we don't support outbound spdy/h2 yet
      Amaryllis
      8:09
      
      right
      sudheerv
      8:09
      
      but, when we do, it will be a similar problem there
      Amaryllis
      8:09
      
      yes
      sudheerv
      8:09
      
      there's no chunked encoding or content-len with spdy/h2
      Amaryllis
      8:09
      
      but i can't really test code that's yet to be written 
      sudheerv
      8:09
      
      so, yet another case of empty body 
      8:09
      
      no, my point is - the code we write now must not break when we have spdy/h2
      8:10
      
      so, instead of "assuming" that the body is valid, it might make sense
      8:10
      
      to somehow indicate that it is
      Amaryllis
      8:10
      
      well, i think it will break anyway, in the sense that empty H2 responses won't be cached... since they have no CL?
      sudheerv
      8:10
      
      well, but you are fixing that 
      Amaryllis
      8:10
      
      but H2 doesn't do TE either, does it?  i thought it replaces that entire thing
      sudheerv
      8:10
      
      if it's consistently broken (or not supported), that's alright
      8:10
      
      yes, it doesn't
      8:10
      
      that's my point
      Amaryllis
      8:11
      
      all i'm adding (for now) is to make it accept CL _or_ TE
      sudheerv
      8:11
      
      so, we need to have a generic way of notifying cache layer
      8:11
      
      that the response is valid
      Amaryllis
      8:11
      
      and test that empty TE is properly checked for the empty chunk
      amc
      8:12
      
      Hmmm. Maybe a virtual on CacheVConnection "setAllowEmptyContent(bool)".
      sudheerv
      8:12
      
      yeah, something like that
      amc
      8:12
      
      Then an implementation in CacheVC.
      sudheerv
      8:12
      
      amc: i also want to make this setting overridable
      amc
      8:12
      
      Then the higher layer can say "I know this was a valid zero length content, cache monkey"
      sudheerv
      8:12
      
      (but, that's for later)
      8:12
      
      yes, that makes sense
      Amaryllis
      8:13
      
      amc: CacheVC already has allow_empty_doc flag, i think that's enough
      sudheerv
      8:13
      
      Amaryllis: yeah, it does - I poked around that a bit recently, didn't seem very straightfwd
      amc
      8:13
      
      Just bang on that directly?
      sudheerv
      8:13
      
      but, pls go ahead and do it if you find it easy 
      Amaryllis
      8:13
      
      amc: that was the plan, yes 
      8:14
      
      amc: https://github.com/apache/trafficserver/pull/310/files#diff-7fada9a23fa1d12e90ca6bec876ecf8fL477
      8:14
      
      amc: ... that check just needs another check for chunked as well.
      sudheerv
      8:15
      
      Amaryllis: if we did something like what amc suggested above
      amc
      8:15
      
      No, I meant that it would be set from (for instance) the chunked decoder or just above that.
      sudheerv
      8:15
      
      where the higher layer can notify cache that it can be cached
      8:15
      
      then we don't have to check any headers
      8:15
      
      that already seems hacky enough
      amc
      8:15
      
      So there wouldn't be an addiitional value for the config, it would just work as epexted.
      8:15
      
      expected.
      sudheerv
      8:15
      
      yes
      jpeach
      8:15
      
      there's already a function that checks whether the whole document was received
      amc
      8:16
      
      Hmmm. Interesting.
      Amaryllis
      8:16
      
      i think some things are getting mixed up here - i'm not going to add an additional config value
      8:16
      
      that's what the rejected PR did
      jpeach
      8:16
      
      it is used when figuring what to do when getting an EOF
      amc
      8:16
      
      What I would want is a way for the chunk decoder or its parent logic to pass on to the CacheVC the validity of the document.
      8:17
      
      Amaryllis-  I thought your PR added the value '2' to the config variable.
      jpeach
      8:17
      
      afaict as long as you can know you have the entire doc, it could be cacheable
      sudheerv
      8:17
      
      jpeach: that makes sense
      Amaryllis
      8:17
      
      amc: yes, but people disliked that, so instead i suggested fixing it to check for chucked encoding
      8:17
      
      amc: which doesn't need an additional config setting
      sudheerv
      8:17
      
      but, i guess the complication is that, at what ppint to do you know that you received the whole doc
      amc
      8:17
      
      OK, I must be reading the wrong PR.
      sudheerv
      8:17
      
      is it before checking allow empty doc or after
      Amaryllis
      8:18
      
      amc: there's only one PR, i didn't write the chunked version yet
      sudheerv
      8:18
      
      if it's just a matter of changing the order to check for whole doc, without breaking anything , then that seems like a cleaner solution
      amc
      8:18
      
      Ah, OK.
      8:19
      
      Yes, I agree. I think we're all on the same page - check for getting the whole document and if it's zero length and the config value is 1, cache it.
      Amaryllis
      8:19
      
      (we need something for 6.0 fairly quickly, so we'll be using this patch in production, but i'm happy to put more effort into fixing it properly)
      sudheerv
      8:19
      
      +1
      amc
      8:19
      
      The brokeness is that complete chunked stuff isn't being checked correctly.
      8:19
      
      Amaryllis - coool.
      Amaryllis
      8:19
      
      yes - that's the only bit that actually needs to be fixed, but it might also make sense to refactor how the check is done
      amc
      8:20
      
      Allright.
      Amaryllis
      8:20
      
      i'm not sure i know enough about the TS internals to actually implement the second bit though
      amc
      8:20
      
      I have to go get my Macbook fixed - update took out my wireless.
      8:21
      
      I'll see if I have some time this weekend to take a look.
      sudheerv
      8:21
      
      Amaryllis: if it's okay with everyone else, adding the check for TE header just to fix the TE part alone for now is fine by me as well
      8:21
      
      (we can open a separate jira to improve the code for future)
      8:22
      
      (was just saying, that instead of adding case by case, it'd be nicer to have a cleaner solution - but, by no means, it's immediately required)
      Amaryllis
      8:23
      
      sudheerv: sure
      jpeach
      8:23
      
      found it ... HttpSM::is_http_server_eos_truncation
      8:23
      
      dustywusty_ is now known as dustywusty
      sudheerv
      8:23
      
      jpeach: that might be a bit late?
      8:24
      
      it seems to be after tunnel completes
      jpeach
      8:24
      
      the place it is called from might be too late, but that's the logic for knowing if we have the whole response
      sudheerv
      8:24
      
      whereas, the tunnel is writing to UA and Cache in parallel
      zwoop
      8:24
      
      sorry, had to reboot.
      sudheerv
      8:24
      
      jpeach: right, but, the point is
      zwoop
      8:24
      
      amc sudheerv  I'm thinking, maybe we should just deprecate this setting completely ? And always allow caching empty docs when we safely can ?
      sudheerv
      8:24
      
      you may get an EOS after some body is recieved already
      8:25
      
      zwoop: +1 to that
      zwoop
      8:25
      
      it was added for compatibility reasons eons ago, but we changed the default to "1"  in 5.x I think
      sudheerv
      8:25
      
      we just need to know when is it safe 
      jpeach
      8:25
      
      sudheerv: I'm just saying that we have logic to know whether we have the whole response; not that the logic is already used in all the right places
      zwoop
      8:25
      
      we can deprecate it now (marking it as "don't change / use this unless you really, really have to" and remove it for 7.0.0
      sudheerv
      8:27
      
      +1
      8:27
      
      reveller left the room.
      zwoop
      8:28
      
      sudheerv  file a Jira on it maybe ? Or two. We should change the docs now (marking it deprecated) and a bug for 7.0.0 for removal
      sudheerv
      8:29
      
      sure..
      8:30

      Attachments

        Issue Links

          Activity

            People

              sudheerv Sudheer Vinukonda
              sudheerv Sudheer Vinukonda
              Votes:
              0 Vote for this issue
              Watchers:
              3 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: