Traffic Server / TS-3395

Hit ratio drops with high concurrency



    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: 7.1.0
    • Component/s: Cache
    • Labels:


      While running some tests, I noticed that the hit ratio drops once there are more than 300 simultaneous HTTP connections.

      The cache is on a 500 GB raw disk that is not full, so there is no eviction. The RAM cache is disabled.

      The test uses web-polygraph. Content size varies uniformly from 5 KB to 20 KB, the expected hit ratio is 60%, there are 2000 HTTP connections, and documents expire after months. There is no Vary header.

      At first I thought it could be a problem with polygraph, so I wrote my own client/server test code, which also works fine with squid, varnish and nginx. I register a hit if I get either cR or cH in the headers.
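
      A minimal sketch of that hit check, not the actual test client. It assumes the proxy adds a Via header whose bracketed field carries the ATS transaction codes, where cH marks a fresh cache hit and cR a fresh RAM cache hit. The function name and the sample header strings are hypothetical.

```python
import re

# Assumption: the codes sit in a bracketed field of the Via header, e.g.
# "http/1.1 proxy (ApacheTrafficServer/5.2.0 [cHs f ])".
_VIA_CODES = re.compile(r"\[([^\]]*)\]")


def is_cache_hit(via_header):
    """Return True if any bracketed Via code field contains cH or cR."""
    for codes in _VIA_CODES.findall(via_header or ""):
        if "cH" in codes or "cR" in codes:
            return True
    return False
```

      A plain substring match is enough for the short code form shown here; a stricter parser would be needed if other fields could contain the same letter pairs.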

      2015/02/19 12:38:28 Starting 1000000 requests
      2015/02/19 12:37:58 Elapsed: 3m51.23552164s
      2015/02/19 12:37:58 Total average: 231.235µs/req, 4324.60req/s
      2015/02/19 12:37:58 Average size: 12.50kb/req
      2015/02/19 12:37:58 Bytes read: 12498412.45kb, 54050.57kb/s
      2015/02/19 12:37:58 Errors: 0
      2015/02/19 12:37:58 Offered Hit ratio: 59.95%
      2015/02/19 12:37:58 Measured Hit ratio: 37.20%
      2015/02/19 12:37:58 Hit bytes: 4649000609
      2015/02/19 12:37:58 Hit success: 599476/599476 (100.00%), 469.840902ms/req
      2015/02/19 12:37:58 Miss success: 400524/400524 (100.00%), 336.301464ms/req
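
      The figures in this log can be cross-checked with a few lines of arithmetic. This is only a sketch: the variable names are made up, and treating 1 kb as 1000 bytes is an assumption the log does not state.

```python
# Figures copied from the log above.
requests = 1_000_000
elapsed_s = 3 * 60 + 51.23552164      # "Elapsed: 3m51.23552164s"
offered_hits = 599_476                # numerator of the "Hit success" line
hit_bytes = 4_649_000_609             # "Hit bytes"
bytes_read_kb = 12_498_412.45         # "Bytes read"

BYTES_PER_KB = 1000                   # assumption, not stated in the log

offered_ratio = offered_hits / requests
req_per_s = requests / elapsed_s
byte_hit_ratio = hit_bytes / (bytes_read_kb * BYTES_PER_KB)

print(f"offered hit ratio: {offered_ratio:.2%}")
print(f"request rate:      {req_per_s:.2f} req/s")
print(f"byte hit ratio:    {byte_hit_ratio:.2%}")
```

      Under that assumption the ratio of hit bytes to bytes read comes out at about 37.20%, matching the measured hit ratio, while the "Hit success" count matches the offered 59.95%.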

      So the results are similar: 37.20% on average. Then I thought it could be a problem with how I'm testing, so I tried the nginx cache. It achieves a 60% hit ratio, but its request rate is much lower than that of ATS, for obvious reasons.

      Then I wanted to check whether the hit ratio also drops with 200 connections over a longer test time, but no, it's fine:

      So I guess it's not a problem with my tests.

      Then, while debugging the test server, I realized that the same URL was being requested more than once.
      Out of 1000000 requests, 78600 URLs were requested at least twice, and one URL was requested 9 times. These repeated requests are not close together: more than 30 seconds can pass between one request and the next for the same URL.
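
      Counts like these can be gathered on the origin side by tallying every URL the test server receives. A minimal sketch, assuming the server keeps a list of requested URLs; the function name and the sample paths in the tests are hypothetical.

```python
from collections import Counter


def summarize_repeats(requested_urls):
    """Return (number of URLs requested at least twice, highest request count)."""
    counts = Counter(requested_urls)
    repeated = sum(1 for n in counts.values() if n >= 2)
    most = max(counts.values(), default=0)
    return repeated, most
```

      Logging a timestamp alongside each URL would also show how far apart the repeated requests are.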

      I also tweaked the following parameters:

      CONFIG proxy.config.http.cache.fuzz.time INT 0
      CONFIG proxy.config.http.cache.fuzz.min_time INT 0
      CONFIG proxy.config.http.cache.fuzz.probability FLOAT 0.000000
      CONFIG proxy.config.http.cache.max_open_read_retries INT 4
      CONFIG proxy.config.http.cache.open_read_retry_time INT 500

      And this is the result with polygraph, which is similar:

      I also tweaked the read-while-writer option, with similar results.

      Then I enabled 1 GB of RAM cache. It is slightly better at the beginning, but then it drops:

      traffic_top reports 25% RAM hit, 37% fresh, 63% cold.

      So, given that it doesn't seem to be a concurrency problem when requesting the URL from the origin server, could it be a problem of concurrent write access to the cache, so that some pages are not cached at all? The traffic_top fresh percentage also makes me think it could be a problem with writing to the cache.

      I'm not sure I explained the problem clearly; ask me for further information if needed. In summary: the hit ratio drops with a high number of connections, and the problem seems related to pages that are not written to the cache.

      This is a related issue: http://mail-archives.apache.org/mod_mbox/trafficserver-users/201301.mbox/%3CCD28CB1F.1F44A%25peter.walsh@email.disney.com%3E

      Also this: http://apache-traffic-server.24303.n7.nabble.com/why-my-proxy-node-cache-hit-ratio-drops-td928.html


              • Assignee:
                amc Alan M. Carroll
                lethalman Luca Bruno