I'm running some tests and I've noticed that the hit ratio drops with more than 300 simultaneous HTTP connections.
The cache is on a 500 GB raw disk and it is not full, so there is no eviction. The RAM cache is disabled.
The test is done with web-polygraph. Content sizes vary uniformly from 5 KB to 20 KB, the expected hit ratio is 60%, there are 2000 HTTP connections, and documents expire after months. There's no Vary.
Then I thought it could be a problem with polygraph, so I wrote my own client/server test code, which also works fine with squid, varnish and nginx. I register a hit if I get either cR or cH in the headers.
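A minimal sketch of that hit check, not the actual test client. It assumes the proxy reports its cache-lookup result as a bracketed code string in the Via response header (for example `[cHs f ]`), and that `cH` or `cR` anywhere inside the brackets means a hit. The function names `is_cache_hit` and `hit_ratio` are made up for illustration.

```python
import re

# Assumption: cache codes appear inside square brackets in the Via header,
# e.g. "http/1.1 proxy (ApacheTrafficServer [cHs f ])".
_VIA_CODES = re.compile(r"\[([^\]]*)\]")


def is_cache_hit(headers):
    """Return True if the Via header carries a cH or cR code."""
    via = ""
    for name, value in headers.items():
        # Header names are case-insensitive.
        if name.lower() == "via":
            via = value
            break
    for codes in _VIA_CODES.findall(via):
        if "cH" in codes or "cR" in codes:
            return True
    return False


def hit_ratio(responses):
    """Percentage of responses (header mappings) classified as hits."""
    responses = list(responses)
    if not responses:
        return 0.0
    hits = sum(1 for headers in responses if is_cache_hit(headers))
    return 100.0 * hits / len(responses)
```

Looking only inside the brackets avoids counting a host name that happens to contain "cH" as a hit.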
The results are similar: 37.20% on average. Then I thought it could be a problem with how I'm testing, so I tried the nginx cache. It achieves a 60% hit ratio, but its request rate is very slow compared to ATS, for obvious reasons.
Then I wanted to check whether the hit ratio also drops with 200 connections and a longer test time, but no, it's fine:
So I guess it's not a problem with my tests.
Then, while debugging the test server, I realized that the same URL was being requested more than once.
Out of 1000000 requests, 78600 URLs were requested at least twice. One URL was even requested 9 times. The repeated requests for a URL are not close to each other: more than 30 seconds can pass between one request and the next for the same URL.
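The duplicate count above can be reproduced from the origin server's request log with something like the sketch below. It assumes the log is available as `(timestamp_seconds, url)` pairs; that input format and the name `duplicate_stats` are hypothetical, not taken from the real test server.

```python
from collections import defaultdict


def duplicate_stats(log):
    """Summarise repeated URLs in an iterable of (timestamp_seconds, url)."""
    seen = defaultdict(list)
    for timestamp, url in log:
        seen[url].append(timestamp)

    # URLs the origin saw two or more times, with their sorted timestamps.
    repeated = {u: sorted(t) for u, t in seen.items() if len(t) >= 2}

    # Largest gap between two consecutive requests for the same URL.
    max_gap = max(
        (later - earlier
         for times in repeated.values()
         for earlier, later in zip(times, times[1:])),
        default=0.0,
    )

    return {
        "total_requests": sum(len(t) for t in seen.values()),
        "distinct_urls": len(seen),
        "urls_requested_twice_or_more": len(repeated),
        "max_requests_for_one_url": max(
            (len(t) for t in seen.values()), default=0
        ),
        "max_gap_seconds": max_gap,
    }
```

With documents that expire after months and no eviction, every URL the origin sees more than once is a page the proxy could have served from cache, so a large gap between repeats points away from simple request collapsing and towards objects that never made it into the cache.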
I also tweaked the following parameters:
And this is the result with polygraph, which is similar:
I also tweaked the read-while-writer option and still got similar results.
Then I enabled 1 GB of RAM cache. The hit ratio is slightly better at the beginning, but then it drops:
traffic_top says 25% RAM hit, 37% fresh, 63% cold.
Given that it doesn't seem to be a concurrency problem when requesting the URL from the origin server, could it be a problem of concurrent write access to the cache, so that some pages are not cached at all? The fresh percentage in traffic_top also makes me think the problem is in writing to the cache.
I'm not sure I explained the problem clearly, so ask me for further information if needed. In summary: the hit ratio drops with a high number of connections, and the problem seems related to pages that are not written to the cache.