MESOS-6420

Mesos Agent leaking sockets when port mapping network isolator is ON

Details

    Description

      The Mesos Agent leaks one socket per task launched and eventually runs out of sockets. We tracked the leak down to the port mapping network isolator (port_mapping.cpp): with the isolator turned off, no file descriptors were leaked. The leaked fd is a SOCK_STREAM socket.

      Leaked Sockets:
      $ sudo lsof -p $(pgrep -u root -o -f /usr/local/sbin/mesos-slave) -nP | grep "can't"
      [sudo] password for sshanmugham:
      mesos-sla 57688 root 19u sock 0,6 0t0 2993216948 can't identify protocol
      mesos-sla 57688 root 27u sock 0,6 0t0 2993216468 can't identify protocol
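
To watch the leak grow as tasks are launched, the socket fds of the agent can also be read straight from /proc instead of going through lsof. The following is only a sketch, assuming a Linux host; the helper name socket_fds is made up for illustration and is not part of Mesos. Reading the fd table of a root-owned agent needs root.

```python
import os


def socket_fds(pid):
    """Map each socket fd of `pid` to its inode, read from /proc/<pid>/fd."""
    fd_dir = f"/proc/{pid}/fd"
    sockets = {}
    for name in os.listdir(fd_dir):
        try:
            target = os.readlink(os.path.join(fd_dir, name))
        except OSError:
            # The fd was closed between listdir() and readlink().
            continue
        # Socket fds show up as symlinks of the form "socket:[<inode>]".
        if target.startswith("socket:["):
            sockets[int(name)] = int(target[len("socket:["):-1])
    return sockets


if __name__ == "__main__":
    # Demo on the current process; pass the agent's pid to inspect it.
    for fd, inode in sorted(socket_fds(os.getpid()).items()):
        print(fd, inode)
```

Calling this before and after a task launch and comparing the two results should show one extra fd per task if the leak is present. The inode printed for a leaked fd corresponds to the number lsof shows in its NODE column (for example 2993216948 above).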

      Extract from strace:

      ...
      [pid 57701] 19:14:02.493718 socket(PF_NETLINK, SOCK_RAW, 0) = 19
      [pid 57701] 19:14:02.494395 close(19) = 0
      [pid 57701] 19:14:02.494448 socket(PF_NETLINK, SOCK_RAW, 0) = 19
      [pid 57701] 19:14:02.494844 close(19) = 0
      [pid 57701] 19:14:02.494913 socket(PF_NETLINK, SOCK_RAW, 0) = 19
      [pid 57701] 19:14:02.495565 close(19) = 0
      [pid 57701] 19:14:02.495617 socket(PF_NETLINK, SOCK_RAW, 0) = 19
      [pid 57701] 19:14:02.496072 close(19) = 0
      [pid 57701] 19:14:02.496128 socket(PF_NETLINK, SOCK_RAW, 0) = 19
      [pid 57701] 19:14:02.496758 close(19) = 0
      [pid 57701] 19:14:02.496812 socket(PF_NETLINK, SOCK_RAW, 0) = 19
      [pid 57701] 19:14:02.497270 close(19) = 0
      [pid 57701] 19:14:02.497319 socket(PF_NETLINK, SOCK_RAW, 0) = 19
      [pid 57701] 19:14:02.497698 close(19) = 0
      [pid 57701] 19:14:02.497750 socket(PF_NETLINK, SOCK_RAW, 0) = 19
      [pid 57701] 19:14:02.498407 close(19) = 0
      [pid 57701] 19:14:02.498456 socket(PF_NETLINK, SOCK_RAW, 0) = 19
      [pid 57701] 19:14:02.498899 close(19) = 0
      [pid 57701] 19:14:02.498963 socket(PF_NETLINK, SOCK_RAW, 0) = 19
      [pid 63682] 19:14:02.499091 close(18 <unfinished ...>
      [pid 57701] 19:14:02.499634 close(19) = 0
      [pid 57701] 19:14:02.499689 socket(PF_NETLINK, SOCK_RAW, 0) = 19
      [pid 57701] 19:14:02.500044 close(19) = 0
      [pid 57701] 19:14:02.500093 socket(PF_NETLINK, SOCK_RAW, 0) = 19
      [pid 57701] 19:14:02.500734 close(19) = 0
      [pid 57701] 19:14:02.500782 socket(PF_NETLINK, SOCK_RAW, 0) = 19
      [pid 57701] 19:14:02.501271 close(19) = 0
      [pid 57701] 19:14:02.501339 socket(PF_NETLINK, SOCK_RAW, 0) = 19
      [pid 57701] 19:14:02.502030 close(19) = 0
      [pid 57701] 19:14:02.502101 socket(PF_INET, SOCK_STREAM, IPPROTO_IP) = 19
      ...

      ...
      [pid 57691] 19:18:03.461022 socket(PF_NETLINK, SOCK_RAW, 0) = 27
      [pid 6138] 19:18:03.461345 open("/etc/selinux/config", O_RDONLY <unfinished ...>
      [pid 57691] 19:18:03.461460 close(27) = 0
      [pid 57691] 19:18:03.461520 socket(PF_NETLINK, SOCK_RAW, 0) = 27
      [pid 6138] 19:18:03.461632 close(3 <unfinished ...>
      [pid 6138] 19:18:03.461781 open("/proc/mounts", O_RDONLY <unfinished ...>
      [pid 6138] 19:18:03.462190 close(3 <unfinished ...>
      [pid 57691] 19:18:03.462374 close(27) = 0
      [pid 57691] 19:18:03.462430 socket(PF_NETLINK, SOCK_RAW, 0 <unfinished ...>
      [pid 6138] 19:18:03.462456 open("/proc/net/psched", O_RDONLY <unfinished ...>
      [pid 6138] 19:18:03.462678 close(3 <unfinished ...>
      [pid 6138] 19:18:03.462915 open("/etc/libnl/classid", O_RDONLY <unfinished ...>
      [pid 57691] 19:18:03.463046 close(27) = 0
      [pid 57691] 19:18:03.463111 socket(PF_NETLINK, SOCK_RAW, 0) = 27
      [pid 6138] 19:18:03.463225 close(3 <unfinished ...>
      [pid 57691] 19:18:03.463845 close(27) = 0
      [pid 57691] 19:18:03.463911 socket(PF_NETLINK, SOCK_RAW, 0) = 27
      [pid 57691] 19:18:03.464604 close(27) = 0
      [pid 57691] 19:18:03.464664 socket(PF_NETLINK, SOCK_RAW, 0) = 27
      [pid 57691] 19:18:03.465074 close(27) = 0
      [pid 57691] 19:18:03.465132 socket(PF_NETLINK, SOCK_RAW, 0) = 27
      [pid 57691] 19:18:03.465862 close(27) = 0
      [pid 57691] 19:18:03.465928 socket(PF_NETLINK, SOCK_RAW, 0) = 27
      [pid 57691] 19:18:03.466713 close(27) = 0
      [pid 57691] 19:18:03.466780 socket(PF_NETLINK, SOCK_RAW, 0) = 27
      [pid 57691] 19:18:03.467472 close(27) = 0
      [pid 57691] 19:18:03.467524 socket(PF_NETLINK, SOCK_RAW, 0) = 27
      [pid 57691] 19:18:03.468012 close(27) = 0
      [pid 57691] 19:18:03.468075 socket(PF_NETLINK, SOCK_RAW, 0) = 27
      [pid 57691] 19:18:03.468799 close(27) = 0
      [pid 57691] 19:18:03.468950 socket(PF_NETLINK, SOCK_RAW, 0) = 27
      [pid 57691] 19:18:03.469505 close(27) = 0
      [pid 57691] 19:18:03.469578 socket(PF_NETLINK, SOCK_RAW, 0) = 27
      [pid 57691] 19:18:03.470301 close(27) = 0
      [pid 57691] 19:18:03.470353 socket(PF_INET, SOCK_STREAM, IPPROTO_IP) = 27
      ...

      In both excerpts, the last socket that was created (PF_INET, SOCK_STREAM) never has a corresponding close().
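
The same check can be run over a full strace log rather than by eye. Below is a rough sketch that pairs socket() calls with close() calls by fd number and reports what is left open. The function name find_unclosed_sockets is hypothetical. It assumes every [pid N] line belongs to a thread of one process sharing a single fd table, and it skips socket() calls whose return value is missing from the line (those marked <unfinished ...>).

```python
import re

# Matches a completed call, e.g. "socket(PF_NETLINK, SOCK_RAW, 0) = 19".
SOCKET_RE = re.compile(r"socket\((?P<args>[^)]*)\)\s*=\s*(?P<fd>\d+)")
# Matches "close(19) = 0" as well as "close(18 <unfinished ...>".
CLOSE_RE = re.compile(r"close\((?P<fd>\d+)")


def find_unclosed_sockets(lines):
    """Return {fd: socket() arguments} for sockets never closed in `lines`."""
    open_sockets = {}
    for line in lines:
        opened = SOCKET_RE.search(line)
        if opened:
            open_sockets[int(opened.group("fd"))] = opened.group("args")
            continue
        closed = CLOSE_RE.search(line)
        if closed:
            # Closing an fd we never saw opened is ignored.
            open_sockets.pop(int(closed.group("fd")), None)
    return open_sockets


if __name__ == "__main__":
    sample = [
        "[pid 57701] 19:14:02.501339 socket(PF_NETLINK, SOCK_RAW, 0) = 19",
        "[pid 57701] 19:14:02.502030 close(19) = 0",
        "[pid 57701] 19:14:02.502101 socket(PF_INET, SOCK_STREAM, IPPROTO_IP) = 19",
    ]
    print(find_unclosed_sockets(sample))
    # {19: 'PF_INET, SOCK_STREAM, IPPROTO_IP'}
```

A socket reported here is only a leak candidate: it may simply be closed later than the captured window, so the result is worth cross-checking against the lsof output above.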

          People

            Unassigned
            Santhosh Shanmugham (santhoshkumar.s@gmail.com)
            Jie Yu