When ATS is used as a delivery server for a live video streaming event, a very large number of concurrent requests can arrive for the same object. Depending on the type of object requested, the cache lookup can return either a stale copy of the object (e.g. manifest files) or a complete cache miss (e.g. segment files). ATS currently supports several forms of connection collapsing (e.g. read-while-writer, https://docs.trafficserver.apache.org/en/latest/admin/http-proxy-caching.en.html#read-while-writer, stale-while-revalidate (swr), etc.), but for read-while-writer (rww) to kick in, ATS requires that the complete response headers for the object be received and validated. Until that happens, every incoming request for the same object that results in a cache miss or a stale hit is forwarded to the origin. For a live event, this leaves a significant window during which hundreds of requests for the same object can be forwarded to the origin. In production, this has been observed to significantly increase latency for objects waiting in the read-while-writer state.
Note that there are also two settings, proxy.config.http.cache.open_read_retry_time and proxy.config.http.cache.max_open_read_retries (https://docs.trafficserver.apache.org/en/latest/admin/http-proxy-caching.en.html#open-read-retry-timeout), that can alleviate the thundering herd to some extent by retrying the read lock for the object as configured. With these set, ATS keeps retrying the read lock for the configured number of attempts. If the lock is still unavailable because the write lock is held by the first request that was forwarded to the origin (e.g. the response headers have not been received yet), all the waiting requests are simply forwarded to the origin, with caching disabled for each of them.
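To make this limitation concrete, here is a rough Python sketch of the retry window. It is a simplified model, not ATS code: RetryConfig, max_wait_ms and requests_forwarded_to_origin are hypothetical names, the retry time is assumed to be in milliseconds, and each waiting request is assumed to retry at a fixed interval until its retry budget runs out.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class RetryConfig:
    # Stand-ins for proxy.config.http.cache.max_open_read_retries and
    # proxy.config.http.cache.open_read_retry_time (assumed milliseconds).
    max_open_read_retries: int
    open_read_retry_time_ms: int

    def max_wait_ms(self) -> int:
        # Longest time one request keeps retrying the read lock.
        return self.max_open_read_retries * self.open_read_retry_time_ms


def requests_forwarded_to_origin(
    cfg: RetryConfig,
    header_latency_ms: int,
    arrival_offsets_ms: Sequence[int],
) -> int:
    """Count waiting requests that exhaust their retries and go to the origin.

    header_latency_ms: time until the first request's response headers are
        received and validated, measured from when it took the write lock.
    arrival_offsets_ms: arrival time of each later request on the same clock.
    """
    forwarded = 0
    for arrival in arrival_offsets_ms:
        last_retry_at = arrival + cfg.max_wait_ms()
        # If the headers are still not available at the last retry,
        # the request gives up on the cache and is sent to the origin.
        if last_retry_at < header_latency_ms:
            forwarded += 1
    return forwarded


if __name__ == "__main__":
    cfg = RetryConfig(max_open_read_retries=5, open_read_retry_time_ms=10)
    print(requests_forwarded_to_origin(cfg, 200, [0, 100, 160, 190]))
```

In this model, any request whose retry budget ends before the origin's headers arrive still reaches the origin, which is why no single pair of values covers every combination of origin latency and arrival pattern.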
It is almost impossible to tune the above settings so that they help in all possible situations (traffic, concurrent connections, network conditions, etc.). For this reason, a configurable workaround is proposed below that avoids the thundering herd completely. The patch below is mainly from jlaue and psudaemon, with some additional cleanup, configuration control, debug headers, etc.
When configured, on failing to obtain the write lock for an object (which means another parallel request for the same object has already been forwarded to the origin), ATS does one of two things. On a cache refresh miss, it serves the stale copy of the object. On a complete cache miss, it returns a 502 error so that the client (e.g. a player) can retry. The 502 response also includes a special internal ATS header named @ats-internal-messages, with an appropriate value, to allow custom logging or to let plugins take appropriate action (e.g. to prevent a fail-over, if a plugin is in place that fails over on a regular 502 error).
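The decision described above can be sketched in Python as follows. This is only an illustration of the control flow, not the patch itself: Lookup, Action, decide and the workaround_enabled flag are hypothetical names, and the header value is a placeholder because the description does not state the actual value.

```python
from enum import Enum
from typing import Dict, Tuple

INTERNAL_HEADER = "@ats-internal-messages"  # header name from the description
# Placeholder only: the real header value is not given in the description.
PLACEHOLDER_VALUE = "<value set by ATS>"


class Lookup(Enum):
    FRESH = 1   # usable cached copy
    STALE = 2   # cached copy needs revalidation (cache refresh miss)
    MISS = 3    # nothing in cache (complete cache miss)


class Action(Enum):
    SERVE_CACHED = 1
    FORWARD_TO_ORIGIN = 2
    SERVE_STALE = 3
    RETURN_502 = 4


def decide(
    lookup: Lookup, got_write_lock: bool, workaround_enabled: bool
) -> Tuple[Action, Dict[str, str]]:
    """Return the action for one request plus any extra response headers."""
    if lookup is Lookup.FRESH:
        return Action.SERVE_CACHED, {}
    # The request holding the write lock is the one that goes to the origin.
    # With the workaround off, lock losers also go to the origin (the herd).
    if got_write_lock or not workaround_enabled:
        return Action.FORWARD_TO_ORIGIN, {}
    # Workaround on and another request already holds the write lock.
    if lookup is Lookup.STALE:
        return Action.SERVE_STALE, {}
    return Action.RETURN_502, {INTERNAL_HEADER: PLACEHOLDER_VALUE}


if __name__ == "__main__":
    print(decide(Lookup.STALE, got_write_lock=False, workaround_enabled=True))
    print(decide(Lookup.MISS, got_write_lock=False, workaround_enabled=True))
```

With the workaround enabled, only the request that holds the write lock reaches the origin. Every other request for the same object is answered locally, either with the stale copy or with the tagged 502.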