We had a pair of ATS 3.2.0 boxes that stopped passing traffic simultaneously. Here are the traffic.out messages we saw on both boxes:
Those messages went on for a couple of minutes, then traffic apparently ceased: from then on, our monitoring system saw connection refused on port 80 for ATS. The connection refused state lasted many hours, until ATS was restarted. There were no traffic_cop messages in /var/log/messages indicating that the heartbeat had failed.
Here are the relevant ATS settings/stats:
We previously derived our proxy.config.cache.min_average_object_size value by waiting for the cache to fill and dividing proxy.process.cache.bytes_used by proxy.process.cache.direntries.used, which came to about 34KB.
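For anyone wanting to repeat that calculation, here is a minimal Python sketch of the division. The function name and the sample figures are made up for illustration (chosen so the result lands on 34KB); they are not our real stat values, which you would read from your own ATS instance once the cache is full.

```python
def average_object_size(bytes_used: int, direntries_used: int) -> float:
    """Average bytes per used directory entry.

    bytes_used      -> value of proxy.process.cache.bytes_used
    direntries_used -> value of proxy.process.cache.direntries.used
    """
    if direntries_used <= 0:
        raise ValueError("direntries_used must be positive")
    return bytes_used / direntries_used


# Hypothetical figures, NOT the real stats from our boxes.
bytes_used = 340 * 1024**3       # 340 GiB of cache in use
direntries_used = 10 * 1024**2   # about 10.5 million directory entries in use

avg = average_object_size(bytes_used, direntries_used)
print(f"average object size: {avg / 1024:.1f} KB")
```

The number is only meaningful once the cache has filled, since a half-empty cache has not yet settled on a representative object mix.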
We're assuming ATS ran out of direntries and didn't handle the situation gracefully. As a possible workaround, I'm going to lower proxy.config.cache.min_average_object_size to 24KB, which should make ATS allocate more direntries for the same cache size.
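To get a rough feel for how much headroom the lower setting buys, here is a small Python sketch. It assumes the directory is sized at roughly cache size divided by proxy.config.cache.min_average_object_size; the real allocation in ATS rounds to its own internal units, so treat the output as an estimate. The function name and the 1 TiB cache size are hypothetical.

```python
def estimated_direntries(cache_bytes: int, min_average_object_size: int) -> int:
    """Rough directory entry count: cache size / min_average_object_size.

    This is an approximation; ATS rounds its actual allocation internally.
    """
    if min_average_object_size <= 0:
        raise ValueError("min_average_object_size must be positive")
    return cache_bytes // min_average_object_size


cache_bytes = 1024**4  # hypothetical 1 TiB cache, not our real size

at_34k = estimated_direntries(cache_bytes, 34 * 1024)
at_24k = estimated_direntries(cache_bytes, 24 * 1024)

print(f"direntries at 34KB: {at_34k}")
print(f"direntries at 24KB: {at_24k}")
print(f"increase: {at_24k / at_34k:.2f}x")
```

Whatever the cache size, the ratio works out to about 34/24, so going from 34KB to 24KB gives roughly 40% more direntries, at the cost of a larger directory in memory.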
Thanks to Bryan Call for helping me troubleshoot this!