Mesos / MESOS-1862

Performance regression in the Master's http metrics.

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: 0.21.0
    • Fix Version/s: 0.21.0
    • Component/s: master
    • Labels: None
    • Sprint: Twitter Q4 Sprint 1
    • Story Points: 3

    Description

      As part of the change to hold on to terminal unacknowledged tasks in the master, we introduced a performance regression in the following patch:

      https://github.com/apache/mesos/commit/0760b007ad65bc91e8cea377339978c78d36d247

      commit 0760b007ad65bc91e8cea377339978c78d36d247
      Author: Benjamin Mahler <bmahler@twitter.com>
      Date:   Thu Sep 11 10:48:20 2014 -0700
      
          Minor cleanups to the Master code.
      
          Review: https://reviews.apache.org/r/25566
      

      Rather than keeping a running count of allocated resources, we now compute resources on demand. This was done in order to ignore terminal tasks' resources.

      As a result of this change, the /stats.json and /metrics/snapshot endpoints on the master have slowed down substantially on large clusters.
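
      The trade-off between the two approaches can be sketched as follows. This is a minimal illustration only, not the Mesos code: the Task and Slave types, the single cpus value standing in for a full Resources object, and all method names are hypothetical simplifications. It shows why an on-demand sum costs time proportional to the number of tasks on every call, while a running count answers in constant time but has to be adjusted whenever a task turns terminal.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical, simplified stand-ins for illustration; not the Mesos types.
struct Task {
  double cpus;    // Single scalar standing in for a full resource set.
  bool terminal;  // True once the task is terminal but still held
                  // (for example, awaiting acknowledgement).
};

struct Slave {
  std::map<std::string, Task> tasks;
  double runningCpus = 0.0;  // Running count, maintained incrementally.

  void addTask(const std::string& id, double cpus) {
    assert(tasks.count(id) == 0);
    tasks[id] = Task{cpus, false};
    runningCpus += cpus;
  }

  // The task stays in the map, but its resources stop counting as used.
  void markTerminal(const std::string& id) {
    Task& task = tasks.at(id);
    if (!task.terminal) {
      task.terminal = true;
      runningCpus -= task.cpus;
    }
  }

  // On-demand computation: walks every task on each call, so the cost of
  // a metrics request grows with the total number of tasks.
  double usedOnDemand() const {
    double total = 0.0;
    for (const auto& entry : tasks) {
      if (!entry.second.terminal) {
        total += entry.second.cpus;
      }
    }
    return total;
  }

  // Running count: constant time per call.
  double usedRunning() const { return runningCpus; }
};
```

      Both accessors return the same value as long as every state change goes through addTask and markTerminal. The on-demand version is harder to get wrong, which is presumably why it was attractive for ignoring terminal tasks, but it is the expensive one when an endpoint calls it for every slave.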

      $ time curl localhost:5050/health
      real	0m0.004s
      user	0m0.001s
      sys	0m0.002s
      
      $ time curl localhost:5050/stats.json > /dev/null
      real	0m15.402s
      user	0m0.001s
      sys	0m0.003s
      
      $ time curl localhost:5050/metrics/snapshot > /dev/null
      real	0m6.059s
      user	0m0.002s
      sys	0m0.002s
      

      Running perf top during a request to /stats.json shows some of the resource computation:

      Events: 36K cycles
       10.53%  libc-2.5.so             [.] _int_free
        9.90%  libc-2.5.so             [.] malloc
        8.56%  libmesos-0.21.0.so  [.] std::_Rb_tree<process::ProcessBase*, process::ProcessBase*, std::_Identity<process::ProcessBase*>, std::less<process::ProcessBase*>, std::allocator<process::ProcessBase*> >::
        8.23%  libc-2.5.so             [.] _int_malloc
        5.80%  libstdc++.so.6.0.8      [.] std::_Rb_tree_increment(std::_Rb_tree_node_base*)
        5.33%  [kernel]                [k] _raw_spin_lock
        3.13%  libstdc++.so.6.0.8      [.] std::string::assign(std::string const&)
        2.95%  libmesos-0.21.0.so  [.] process::SocketManager::exited(process::ProcessBase*)
        2.43%  libmesos-0.21.0.so  [.] mesos::Resource::MergeFrom(mesos::Resource const&)
        1.88%  libmesos-0.21.0.so  [.] mesos::internal::master::Slave::used() const
        1.48%  libstdc++.so.6.0.8      [.] __gnu_cxx::__atomic_add(int volatile*, int)
        1.45%  [kernel]                [k] find_busiest_group
        1.41%  libc-2.5.so             [.] free
        1.38%  libmesos-0.21.0.so  [.] mesos::Value_Range::MergeFrom(mesos::Value_Range const&)
        1.13%  libmesos-0.21.0.so  [.] mesos::Value_Scalar::MergeFrom(mesos::Value_Scalar const&)
        1.12%  libmesos-0.21.0.so  [.] mesos::Resource::SharedDtor()
        1.07%  libstdc++.so.6.0.8      [.] __gnu_cxx::__exchange_and_add(int volatile*, int)
        0.94%  libmesos-0.21.0.so  [.] google::protobuf::UnknownFieldSet::MergeFrom(google::protobuf::UnknownFieldSet const&)
        0.92%  libstdc++.so.6.0.8      [.] operator new(unsigned long)
        0.88%  libmesos-0.21.0.so  [.] mesos::Value_Ranges::MergeFrom(mesos::Value_Ranges const&)
        0.75%  libmesos-0.21.0.so  [.] mesos::matches(mesos::Resource const&, mesos::Resource const&)
      

          People

            bmahler Benjamin Mahler
            bmahler Benjamin Mahler
            Vinod Kone Vinod Kone