Details
Improvement
Status: Resolved
Major
Resolution: Fixed
0.20.0
Twitter Mesos Q1 Sprint 2, Twitter Mesos Q1 Sprint 3, Twitter Mesos Q1 Sprint 4, Twitter Mesos Q1 Sprint 5
2
Description
The CFS CPU statistics (cpus_nr_throttled, cpus_nr_periods, cpus_throttled_time) are difficult to interpret:
1) nr_throttled is the number of intervals where any throttling occurred
2) throttled_time is the aggregate time across all runnable tasks (tasks in the Linux sense).
For example, in a typical 60 second sampling interval with 100ms periods, nr_periods = 600. nr_throttled could be 60, i.e., 10% of intervals, but throttled_time could be much higher than (60/600) * 60 = 6 seconds if more than one task is runnable but throttled, because each throttled task contributes to the total throttled time.
Small test to demonstrate throttled_time > nr_periods * quota_interval:
5 x 'openssl speed' running with quota=100ms:
cat cpu.stat && sleep 1 && cat cpu.stat
nr_periods 3228
nr_throttled 1276
throttled_time 528843772540
nr_periods 3238
nr_throttled 1286
throttled_time 531668964667
All 10 intervals were throttled (100%), for a total throttled time of about 2.8 seconds within 1 second of wall-clock time ("more than 100%" of the time interval).
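The arithmetic behind that reading can be sketched in a few lines of Python. This is only an illustration, not part of Mesos: the helper names parse_cpu_stat and throttle_summary are hypothetical, and the sketch assumes throttled_time is reported in nanoseconds, as in the sample above.

```python
def parse_cpu_stat(text):
    """Parse the 'key value' lines of a cgroup cpu.stat file into a dict of ints."""
    stats = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2:
            stats[parts[0]] = int(parts[1])
    return stats


def throttle_summary(before, after):
    """Compare two cpu.stat snapshots (dicts as returned by parse_cpu_stat)."""
    periods = after["nr_periods"] - before["nr_periods"]
    throttled = after["nr_throttled"] - before["nr_throttled"]
    # Assumption: throttled_time is in nanoseconds.
    throttled_ns = after["throttled_time"] - before["throttled_time"]
    return {
        "periods": periods,
        "throttled_periods": throttled,
        # Share of enforcement periods in which any throttling occurred.
        "throttled_fraction": throttled / periods if periods else 0.0,
        # Aggregate throttled time; this can exceed the wall-clock interval.
        "throttled_seconds": throttled_ns / 1e9,
    }


# The two snapshots from the 'openssl speed' test, taken 1 second apart.
BEFORE = "nr_periods 3228\nnr_throttled 1276\nthrottled_time 528843772540\n"
AFTER = "nr_periods 3238\nnr_throttled 1286\nthrottled_time 531668964667\n"

summary = throttle_summary(parse_cpu_stat(BEFORE), parse_cpu_stat(AFTER))
print(summary)
```

For the snapshots above, this reports 10 periods, all of them throttled, and roughly 2.83 seconds of throttled time accumulated in 1 second of wall-clock time.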
It would be helpful to expose the number of processes and tasks in the container cgroup. This would be at a very coarse granularity but would give some guidance.
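One way such counts could be obtained is sketched below. It assumes a cgroup v1 hierarchy, where each cgroup directory has a cgroup.procs file (one process ID per line) and a tasks file (one thread ID per line). The function names and the example path are hypothetical, and this is not necessarily how Mesos implements it.

```python
import os


def count_ids(path):
    """Count the non-empty lines (one ID per line) in a cgroup membership file."""
    with open(path) as f:
        return sum(1 for line in f if line.strip())


def cgroup_counts(cgroup_dir):
    """Return the number of processes and tasks (threads) in a cgroup v1 directory."""
    return {
        # cgroup.procs lists thread-group IDs, i.e. one entry per process.
        "processes": count_ids(os.path.join(cgroup_dir, "cgroup.procs")),
        # tasks lists thread IDs, i.e. one entry per Linux task.
        "tasks": count_ids(os.path.join(cgroup_dir, "tasks")),
    }


if __name__ == "__main__":
    # Hypothetical path; the real location depends on how the cgroup is mounted.
    example_dir = "/sys/fs/cgroup/cpu/mesos/some-container-id"
    if os.path.isdir(example_dir):
        print(cgroup_counts(example_dir))
```

The counts are a point-in-time snapshot, so they are coarse in the same way the description anticipates: they show how many tasks exist in the cgroup, not how many were runnable or throttled during the sampling interval.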
Issue Links
- relates to MESOS-2365 Expose the states of processes and threads in a container (Open)