[MESOS-2507] Performance issue in the master when a large number of slaves are registering. - ASF JIRA

Details

Type: Improvement
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: None
Fix Version/s: 0.23.0
Component/s: master
Labels:
- scalability
- twitter

Target Version/s: 0.23.0
Sprint: Twitter Q2 Sprint 3 - 5/11
Story Points: 5

Description

In large clusters, when many slaves are registering at once, the master becomes backlogged processing registration requests. A perf profile of the master showed the following:

Events: 14K cycles
 25.44%  libmesos-0.22.0-x.so  [.] mesos::internal::master::Master::registerSlave(process::UPID const&, mesos::SlaveInfo const&, std::vector<mesos::Resource, std::allocator<mesos::Resource> > cons
 11.18%  libmesos-0.22.0-x.so  [.] pipecb
  5.88%  libc-2.5.so             [.] malloc_consolidate
  5.33%  libc-2.5.so             [.] _int_free
  5.25%  libc-2.5.so             [.] malloc
  5.23%  libc-2.5.so             [.] _int_malloc
  4.11%  libstdc++.so.6.0.8      [.] std::string::assign(std::string const&)
  3.22%  libmesos-0.22.0-x.so  [.] mesos::Resource::SharedDtor()
  3.10%  [kernel]                [k] _raw_spin_lock
  1.97%  libmesos-0.22.0-x.so  [.] mesos::Attribute::SharedDtor()
  1.28%  libc-2.5.so             [.] memcmp
  1.08%  libc-2.5.so             [.] free

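This is likely because registerSlave() loops over all registered slaves on every registration request to detect retries, so each registration costs O(n) in the number of registered slaves and n registrations cost O(n^2) in total: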

void Master::registerSlave(
    const UPID& from,
    const SlaveInfo& slaveInfo,
    const vector<Resource>& checkpointedResources,
    const string& version)
{
  // ...

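  // Check if this slave is already registered (because it retries).
  // NOTE: This is a linear scan over every registered slave, i.e. O(n)
  // work per registration request.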
  foreachvalue (Slave* slave, slaves.registered) {
    if (slave->pid == from) {
      // ...
    }
  }
  // ...
}
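One way to remove the per-registration scan is to index registered slaves by pid as well as by id, so the retry check becomes a hash lookup. The sketch below is illustrative only, not the committed fix: `Slave`, `RegisteredSlaves`, `findByPidLinear`, `add`, `getByPid` and `removeByPid` are hypothetical names, and plain `std::string` values stand in for Mesos's `process::UPID` and `SlaveID`. It contrasts the linear scan from the excerpt above with a pair of `std::unordered_map` indexes.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for the master's per-slave state. In Mesos the pid
// is a process::UPID and the id a SlaveID; plain strings are used here.
struct Slave {
  std::string id;
  std::string pid;
};

// The pattern from the ticket: scan every registered slave to see whether
// `pid` is already registered. `comparisons` counts the pid comparisons made.
const Slave* findByPidLinear(
    const std::vector<Slave>& registered,
    const std::string& pid,
    std::size_t& comparisons) {
  for (const Slave& slave : registered) {
    ++comparisons;
    if (slave.pid == pid) {
      return &slave;
    }
  }
  return nullptr;
}

// Hypothetical index of registered slaves keyed by both id and pid, so the
// retry check is an average O(1) hash lookup rather than an O(n) scan.
class RegisteredSlaves {
 public:
  // Adds `slave`; returns false and changes nothing if its id or pid is
  // already registered (e.g. the slave retried its registration).
  bool add(const Slave& slave) {
    if (ids_.count(slave.id) > 0 || pids_.count(slave.pid) > 0) {
      return false;
    }
    ids_.emplace(slave.id, slave);
    pids_.emplace(slave.pid, slave.id);
    assert(ids_.size() == pids_.size());  // Both indexes stay in sync.
    return true;
  }

  // Returns the slave registered from `pid`, or nullptr if there is none.
  const Slave* getByPid(const std::string& pid) const {
    auto it = pids_.find(pid);
    if (it == pids_.end()) {
      return nullptr;
    }
    return &ids_.at(it->second);
  }

  // Removes the slave registered from `pid`; returns false if there is none.
  bool removeByPid(const std::string& pid) {
    auto it = pids_.find(pid);
    if (it == pids_.end()) {
      return false;
    }
    ids_.erase(it->second);
    pids_.erase(it);
    return true;
  }

  std::size_t size() const { return ids_.size(); }

 private:
  std::unordered_map<std::string, Slave> ids_;         // SlaveID -> slave
  std::unordered_map<std::string, std::string> pids_;  // pid -> SlaveID
};
```

Registering n distinct slaves through the linear check costs 0 + 1 + ... + (n - 1) = n(n - 1)/2 pid comparisons (49,995,000 for 10,000 slaves), while the indexed version does one average O(1) lookup per registration. The price of this design is keeping the two maps consistent on every add and remove; the sketch simply rejects any id or pid collision rather than modeling how the master handles re-registration.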

People

Assignee: Benjamin Mahler

Reporter: Benjamin Mahler

Votes: 0

Watchers: 3

Dates

Created: 17/Mar/15 01:13

Updated: 19/May/15 19:25

Resolved: 19/May/15 19:25