Uploaded image for project: 'Mesos'
  1. Mesos
  2. MESOS-2182

Performance issue in libprocess SocketManager.

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: 0.21.0
    • Fix Version/s: 0.21.1
    • Component/s: libprocess
    • Labels:
      None
    • Sprint:
      Twitter Mesos Q4 Sprint 5
    • Story Points:
      3

      Description

      Noticed an issue in production under which the master is slow to respond after failover for ~15 minutes.

      After looking at some perf data, the top offender is:

          12.02%  mesos-master  libmesos-0.21.0-rc3.so  [.] std::_Rb_tree<process::ProcessBase*, process::ProcessBase*, std::_Identity<process::ProcessBase*>, std::less<process::ProcessBase*>, std::allocator<process::ProcessBase*> >::erase(process::ProcessBase* const&)
      ...
           3.29%  mesos-master  libmesos-0.21.0-rc3.so  [.] process::SocketManager::exited(process::ProcessBase*)
      

      It appears that in the SocketManager, whenever an internal Process exits, we loop over all the links unnecessarily:

      void SocketManager::exited(ProcessBase* process)
      {
        // An exited event is enough to cause the process to get deleted
        // (e.g., by the garbage collector), which means we can't
        // dereference process (or even use the address) after we enqueue at
        // least one exited event. Thus, we save the process pid.
        const UPID pid = process->pid;
      
        // Likewise, we need to save the current time of the process so we
        // can update the clocks of linked processes as appropriate.
        const Time time = Clock::now(process);
      
        synchronized (this) {
          // Iterate through the links, removing any links the process might
          // have had and creating exited events for any linked processes.
          foreachpair (const UPID& linkee, set<ProcessBase*>& processes, links) {
            processes.erase(process);
      
            if (linkee == pid) {
              foreach (ProcessBase* linker, processes) {
                CHECK(linker != process) << "Process linked with itself";
                synchronized (timeouts) {
                  if (Clock::paused()) {
                    Clock::update(linker, time);
                  }
                }
                linker->enqueue(new ExitedEvent(linkee));
              }
            }
          }
      
          links.erase(pid);
        }
      }
      

      On clusters with 10,000s of slaves, this means we hold the socket manager lock for a very expensive loop erasing nothing from a set! This is because, the master contains links from the Master Process to each slave. However, when a random ephemeral Process terminates, we don't need to loop over each slave link.

      While we hold this lock, the following calls will block:

      class SocketManager
      {
      public:
        Socket accepted(int s);
        void link(ProcessBase* process, const UPID& to);
        PID<HttpProxy> proxy(const Socket& socket);
        void send(Encoder* encoder, bool persist);
        void send(const Response& response,
                  const Request& request,
                  const Socket& socket);
        void send(Message* message);
        Encoder* next(int s);
        void close(int s);
        void exited(const Node& node);
        void exited(ProcessBase* process);
      ...
      

      As a result, the slave observers and the master can block calling send()!

      Short term, we will try to fix this issue by removing the unnecessary looping. Longer term, it would be nice to avoid all this locking when sending on independent sockets.

        Activity

        Hide
        jieyu Jie Yu added a comment -

        Since we call `socket_manager->exited(process)` while holding `processes` lock inside `ProcessManager::cleanup(ProcessBase* process)`, all functions that involve `processes` lock will be affected too, like process::spawn.

        Show
        jieyu Jie Yu added a comment - Since we call `socket_manager->exited(process)` while holding `processes` lock inside `ProcessManager::cleanup(ProcessBase* process)`, all functions that involve `processes` lock will be affected too, like process::spawn.
        Hide
        jieyu Jie Yu added a comment -

        Digging a little more,

        ProcessReference ProcessManager::use(const UPID& pid) 
        {
          if (pid.ip == __ip__ && pid.port == __port__) {
            synchronized (processes) {
              if (processes.count(pid.id) > 0) { 
                // Note that the ProcessReference constructor _must_ get
                // called while holding the lock on processes so that waiting
                // for references is atomic (i.e., race free).
                return ProcessReference(processes[pid.id]);
              }    
            }    
          }
        
          return ProcessReference(NULL);
        }
        

        That means every time we do `dispatch(self(), ...)`, we will end up contending on lock `processes`. So if there is a really big critical section (like the one in SocketManager::exited(..)), the performance will be hurt pretty badly.

        Show
        jieyu Jie Yu added a comment - Digging a little more, ProcessReference ProcessManager::use( const UPID& pid) { if (pid.ip == __ip__ && pid.port == __port__) { synchronized (processes) { if (processes.count(pid.id) > 0) { // Note that the ProcessReference constructor _must_ get // called while holding the lock on processes so that waiting // for references is atomic (i.e., race free). return ProcessReference(processes[pid.id]); } } } return ProcessReference(NULL); } That means every time we do `dispatch(self(), ...)`, we will end up contending on lock `processes`. So if there is a really big critical section (like the one in SocketManager::exited(..)), the performance will be hurt pretty badly.
        Hide
        jieyu Jie Yu added a comment -

        And we have plenty of ephemeral libprocess processes (that generate SocketManager::exited). For example, every replicated log write will generate an ephemeral libprocess process: `internal::log::WriteProcess in src/log/consensus.cpp`.

        Show
        jieyu Jie Yu added a comment - And we have plenty of ephemeral libprocess processes (that generate SocketManager::exited). For example, every replicated log write will generate an ephemeral libprocess process: `internal::log::WriteProcess in src/log/consensus.cpp`.
        Hide
        bmahler Benjamin Mahler added a comment -
        Show
        bmahler Benjamin Mahler added a comment - Here is a fix: https://reviews.apache.org/r/28838/
        Hide
        bmahler Benjamin Mahler added a comment -
        commit f1537180747774bbb2e02062f89614be208327d7
        Author: Benjamin Mahler <benjamin.mahler@gmail.com>
        Date:   Tue Dec 9 12:41:33 2014 -0800
        
            Added a hash function for Node.
        
            Review: https://reviews.apache.org/r/28869
        
        commit f8fc9d0a901777ecf648c0a4991b847104a63b0d
        Author: Benjamin Mahler <benjamin.mahler@gmail.com>
        Date:   Mon Dec 8 22:18:25 2014 -0800
        
            Fixed a long-standing performance issue in libprocess' SocketManager.
        
            Review: https://reviews.apache.org/r/28838
        
        Show
        bmahler Benjamin Mahler added a comment - commit f1537180747774bbb2e02062f89614be208327d7 Author: Benjamin Mahler <benjamin.mahler@gmail.com> Date: Tue Dec 9 12:41:33 2014 -0800 Added a hash function for Node. Review: https://reviews.apache.org/r/28869 commit f8fc9d0a901777ecf648c0a4991b847104a63b0d Author: Benjamin Mahler <benjamin.mahler@gmail.com> Date: Mon Dec 8 22:18:25 2014 -0800 Fixed a long-standing performance issue in libprocess' SocketManager. Review: https://reviews.apache.org/r/28838
        Hide
        bmahler Benjamin Mahler added a comment -

        Jie posted a benchmark that we used to test this:
        https://reviews.apache.org/r/28871/

        Show
        bmahler Benjamin Mahler added a comment - Jie posted a benchmark that we used to test this: https://reviews.apache.org/r/28871/
        Hide
        jieyu Jie Yu added a comment -

        commit 7ee432adf123cede5e3e471a48354ffb8ae14586
        Author: Jie Yu <yujie.jay@gmail.com>
        Date: Tue Dec 9 18:38:11 2014 -0800

        Added a benchmark to test large number of links for MESOS-2182.

        Review: https://reviews.apache.org/r/28871

        Show
        jieyu Jie Yu added a comment - commit 7ee432adf123cede5e3e471a48354ffb8ae14586 Author: Jie Yu <yujie.jay@gmail.com> Date: Tue Dec 9 18:38:11 2014 -0800 Added a benchmark to test large number of links for MESOS-2182 . Review: https://reviews.apache.org/r/28871
        Hide
        bmahler Benjamin Mahler added a comment -

        Re-opening to capture a fix for a regression here:
        https://reviews.apache.org/r/28887/

        Show
        bmahler Benjamin Mahler added a comment - Re-opening to capture a fix for a regression here: https://reviews.apache.org/r/28887/
        Hide
        bmahler Benjamin Mahler added a comment -

        Follow up fix:

        commit 7a98b3de1c26affb96f09c12d833d38dee72a325
        Author: Benjamin Mahler <benjamin.mahler@gmail.com>
        Date:   Tue Dec 9 19:14:07 2014 -0800
        
            Fixed a regression within libprocess link management.
        
            Review: https://reviews.apache.org/r/28887
        
        Show
        bmahler Benjamin Mahler added a comment - Follow up fix: commit 7a98b3de1c26affb96f09c12d833d38dee72a325 Author: Benjamin Mahler <benjamin.mahler@gmail.com> Date: Tue Dec 9 19:14:07 2014 -0800 Fixed a regression within libprocess link management. Review: https://reviews.apache.org/r/28887

          People

          • Assignee:
            bmahler Benjamin Mahler
            Reporter:
            bmahler Benjamin Mahler
          • Votes:
            0 Vote for this issue
            Watchers:
            2 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development

                Agile