MESOS-8731: Mesos master APIs become latent

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.4.0, 1.5.0
    • Fix Version/s: 1.8.0
    • Component/s: master
    • Labels:
      None

      Description

      Over time, one of the API calls the UI makes to the master becomes latent: a request that normally completes in under a second takes up to 20 seconds at peak. Many members of the dev team use the UI to view logs, so this affects them directly.

      Below are my observations:

      With Mesos 0.28.1 (package "0.28.1-2.0.20.ubuntu1404"):

      ################################################################

      $ ab -n 1000 -c 10 "http://mesos-master1.mesos.bla.net:5050/metrics/snapshot?jsonp=angular.callbacks._4g"
        This is ApacheBench, Version 2.3 <$Revision: 1528965 $>
        Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
        Licensed to The Apache Software Foundation, http://www.apache.org/

      Benchmarking mesos-master1.mesos.bla.net (be patient)
      Completed 100 requests
      Completed 200 requests
      Completed 300 requests
      Completed 400 requests
      Completed 500 requests
      Completed 600 requests
      Completed 700 requests
      Completed 800 requests
      Completed 900 requests
      Completed 1000 requests
      Finished 1000 requests

      Server Software:
      Server Hostname: mesos-master1.mesos.bla.net
      Server Port: 5050

      Document Path: /metrics/snapshot?jsonp=angular.callbacks._4g
      Document Length: 3197 bytes

      Concurrency Level: 10
      Time taken for tests: 501.010 seconds
      Complete requests: 1000
      Failed requests: 954
      (Connect: 0, Receive: 0, Length: 954, Exceptions: 0)
      Total transferred: 3304510 bytes
      HTML transferred: 3195510 bytes
      Requests per second: 2.00 /sec (mean)
      Time per request: 5010.104 [ms] (mean)
      Time per request: 501.010 [ms] (mean, across all concurrent requests)
      Transfer rate: 6.44 [Kbytes/sec] received

      Connection Times (ms)
                    min  mean[+/-sd] median    max
      Connect:        0     0    0.0      0      0
      Processing:   321  4987  286.4   5007   5508
      Waiting:      321  4987  286.4   5007   5508
      Total:        321  4988  286.4   5007   5508

      Percentage of the requests served within a certain time (ms)
      50% 5007
      66% 5007
      75% 5008
      80% 5008
      90% 5008
      95% 5009
      98% 5010
      99% 5506
      100% 5508 (longest request)

      ################################################################
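Note that the "Failed requests" count in this run is made up entirely of "Length" failures (954 of 954). ApacheBench takes the body length of the first response as the reference and counts every later response with a different length as failed, even when it returned successfully; for an endpoint like /metrics/snapshot, whose JSON values change between calls, that is expected. According to the ab manual, httpd 2.4.7 and later builds accept -l to skip this check for dynamic pages. A minimal Python sketch of the counting rule, assuming it matches ab's behaviour as described (the function name and sample lengths are hypothetical):

```python
from typing import Iterable, Optional


def count_length_failures(body_lengths: Iterable[int]) -> int:
    """Count responses whose body length differs from the first response.

    Sketch of ApacheBench's "Length" failure rule: the first response fixes
    the expected document length, and any later response with a different
    length is counted as a failure regardless of its HTTP status.
    """
    expected: Optional[int] = None
    failures = 0
    for length in body_lengths:
        if expected is None:
            expected = length  # first response sets the reference length
        elif length != expected:
            failures += 1      # size drifted, so ab would flag it
    return failures


if __name__ == "__main__":
    # Hypothetical body sizes for five /metrics/snapshot responses.
    print(count_length_failures([3197, 3197, 3195, 3196, 3197]))
```

In other words, these "failures" say nothing about latency; the 5-second response times are the relevant signal.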

       

      With Mesos 1.4 and 1.5 (packages 1.4.0-2.0.1 and 1.5.0-2.0.1), the response times of these APIs are considerably higher:

      ################################################################

      $ ab -n 1000 -c 10 "http://mesos-master3.stage.bla.net:5050/metrics/snapshot?jsonp=angular.callbacks._4g"
        This is ApacheBench, Version 2.3 <$Revision: 1706008 $>
        Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
        Licensed to The Apache Software Foundation, http://www.apache.org/

      Benchmarking mesos-master3.stage.bla.net (be patient)
      Completed 100 requests
      Completed 200 requests
      Completed 300 requests
      Completed 400 requests
      Completed 500 requests
      ^C

      Server Software:
      Server Hostname: mesos-master3.stage.bla.net
      Server Port: 5050

      Document Path: /metrics/snapshot?jsonp=angular.callbacks._4g
      Document Length: 6596 bytes

      Concurrency Level: 10
      Time taken for tests: 1405.182 seconds
      Complete requests: 582
      Failed requests: 580
      (Connect: 0, Receive: 0, Length: 580, Exceptions: 0)
      Total transferred: 3909986 bytes
      HTML transferred: 3846548 bytes
      Requests per second: 0.41 /sec (mean)
      Time per request: 24144.024 [ms] (mean)
      Time per request: 2414.402 [ms] (mean, across all concurrent requests)
      Transfer rate: 2.72 [Kbytes/sec] received

      Connection Times (ms)
                    min  mean[+/-sd] median    max
      Connect:        0     0    0.0      0      0
      Processing: 15284 24058 2600.7  23937  31740
      Waiting:    15284 24058 2600.7  23937  31740
      Total:      15284 24059 2600.7  23938  31740

      Percentage of the requests served within a certain time (ms)
      50% 23938
      66% 25074
      75% 25729
      80% 26465
      90% 27605
      95% 28215
      98% 29685
      99% 30595
      100% 31740 (longest request)

      ################################################################
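The two runs can be compared with Little's law (average requests in flight = throughput x mean latency). ab's own "Time per request (mean)" is consistent with being computed as concurrency x time taken / completed requests (10 x 501.010 s / 1000 = 5010 ms), so it matches Little's law by construction; the measured "Total" mean under "Connection Times" is a more independent check. The sketch below uses only figures copied from the two outputs; the `AbRun` helper and the run labels are illustrative, and the runs were against different hosts (mesos-master1 and mesos-master3.stage), so the ratio is only a rough comparison:

```python
from dataclasses import dataclass


@dataclass
class AbRun:
    """Figures copied from one ApacheBench summary (illustrative helper)."""
    label: str
    concurrency: int
    completed: int
    time_taken_s: float     # "Time taken for tests"
    mean_total_ms: float    # "Total" mean under "Connection Times (ms)"
    median_total_ms: float  # "Total" median under "Connection Times (ms)"

    @property
    def throughput(self) -> float:
        """Completed requests per second over the whole run."""
        return self.completed / self.time_taken_s

    def implied_in_flight(self) -> float:
        """Little's law: requests in flight = throughput * mean latency (s)."""
        return self.throughput * (self.mean_total_ms / 1000.0)


old = AbRun("0.28.1, mesos-master1", 10, 1000, 501.010, 4988, 5007)
new = AbRun("1.4/1.5, mesos-master3.stage", 10, 582, 1405.182, 24059, 23938)

for run in (old, new):
    print(f"{run.label}: {run.throughput:.2f} req/s, "
          f"~{run.implied_in_flight():.1f} requests in flight "
          f"(ab concurrency {run.concurrency})")

print(f"median latency ratio: {new.median_total_ms / old.median_total_ms:.1f}x")
print(f"throughput ratio: {old.throughput / new.throughput:.1f}x")
```

Both runs come out at roughly 10 requests in flight, matching -c 10, so the throughput drop and the latency increase (both about 4.8x) describe the same slowdown rather than a client-side artifact.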

      I think this is causing other APIs, such as "/master/slaves/" and "/metrics", to become latent as well.
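One way to test whether these endpoints slow down together is to time them side by side from the same client, one request at a time, and repeat the measurement periodically. A minimal Python sketch, assuming the master is reachable at the host and port used in the benchmarks above; `MASTER`, `ENDPOINTS`, the 60-second timeout, the sample count and the injectable `fetch`/`clock` hooks are all illustrative choices, not Mesos settings:

```python
import time
import urllib.request
from statistics import median
from typing import Callable, Dict, List

# Assumed master address, taken from the benchmark above; adjust as needed.
MASTER = "http://mesos-master3.stage.bla.net:5050"
ENDPOINTS = ["/metrics/snapshot", "/master/slaves"]


def http_get(url: str) -> bytes:
    """Fetch a URL and return the body (60 s timeout is an arbitrary choice)."""
    with urllib.request.urlopen(url, timeout=60) as resp:
        return resp.read()


def time_endpoints(base: str, paths: List[str], samples: int,
                   fetch: Callable[[str], bytes] = http_get,
                   clock: Callable[[], float] = time.monotonic,
                   ) -> Dict[str, float]:
    """Return the median wall-clock latency in seconds for each path.

    Requests are issued sequentially, so each measurement reflects the
    master's response time rather than queueing inside the client.
    """
    latencies: Dict[str, List[float]] = {p: [] for p in paths}
    for _ in range(samples):
        for path in paths:
            start = clock()
            fetch(base + path)
            latencies[path].append(clock() - start)
    return {path: median(values) for path, values in latencies.items()}
```

Calling `time_endpoints(MASTER, ENDPOINTS, samples=5)` from a cron job or loop and logging the result would show whether /master/slaves degrades at the same times as /metrics/snapshot.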

      For now, we are forcing a re-election of the master to bring the response times back down. What can I do to keep these times down? The load on the box is low: the load average does not exceed 2 on an 8-core machine.

      Let me know if any further info is required. 

              People

              • Assignee:
                Unassigned
                Reporter:
                krishnaghatti sri krishna
              • Votes:
                0
                Watchers:
                4
