MESOS-9127: Port mapper CNI plugin might deadlock iptables on the agent.


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.4.1, 1.5.1, 1.6.1
    • Fix Version/s: 1.4.2, 1.5.2, 1.6.2, 1.7.0
    • Component/s: network
    • Labels: None

    Description

      Recently, we noticed that if one launches a lot of containers that use the port mapper CNI plugin, iptables on the agent can deadlock. The symptom looks like the following.

      If you run any iptables command on the box, it gets stuck acquiring the xtables lock:

      core@ip-10-0-2-99 ~ $ time iptables -w -t nat -S UCR-DEFAULT-BRIDGE 
      Another app is currently holding the xtables lock; waiting for it to exit...
      Another app is currently holding the xtables lock; waiting for it to exit...
      [the message above repeats until the command is interrupted]
      ^C
      
      real    0m41.349s
      user    0m0.001s
      sys    0m0.000s
      

      And you'll notice a lot of iptables and mesos-cni-port-mapper processes on the box:

      $ ps -fp $(pidof mesos-cni-port-mapper) | wc -l
      191
      $ ps -fp $(pidof iptables) | wc -l
      192
      

      Then, we looked into the process that is holding the xtables lock:

      $ sudo netstat -p -n | grep xtables
      unix  2      [ ]         STREAM                   225083   25048/iptables       @xtables
      $ ps aux | grep 25048
      root     25048  0.0  0.0  26184  2512 ?        S    22:41   0:00 iptables -w -t nat -S UCR-DEFAULT-BRIDGE
      core     31857  0.0  0.0   6760   976 pts/0    S+   23:18   0:00 grep --colour=auto 25048
      $ sudo strace -s 10000 -p 25048
      Process 25048 attached
      write(1, "-dport 13839 -m comment --comment \"container_id: a074efef-c119-4764-a584-87c57e6b7f68\" -j DNAT --to-destination 172.31.254.183:80\n-A UCR-DEFAULT-BRIDGE ! -i ucr-br0 -p tcp -m tcp --dport 17282 -m comment --comment \"container_id: 47880db5-90ad-4034-9c2b-2fd246d42342\" -j DNAT --to-destination 172.31.254.126:80\n-A UCR-DEFAULT-BRIDGE ! -i ucr-br0 -p tcp -m tcp --dport 28712 -m comment --comment \"container_id: 3293d5d0-772c-48d2-bafd-1c4b4d56247e\" -j DNAT --to-destination 172.31.254.130:80\n-A UCR-DEFAULT-BRIDGE ! -i ucr-br0 -p tcp -m tcp --dport 23893 -m comment --comment \"container_id: f57de8eb-a3b9-44cb-8dac-7ab261bc8aac\" -j DNAT --to-destination 172.31.254.149:80\n-A UCR-DEFAULT-BRIDGE ! -i ucr-br0 -p tcp -m tcp --dport 28449 -m comment --comment \"container_id: 9238dbf0-7b28-4fda-880a-bf0c8f40562a\" -j DNAT --to-destination 172.31.254.190:80\n-A UCR-DEFAULT-BRIDGE ! -i ucr-br0 -p tcp -m tcp --dport 26438 -m comment --comment \"container_id: d307cf58-8972-4de4-ad45-26c29786add0\" -j DNAT --to-destination 172.31.254.187:80\n-A UCR-DEFAULT-BRIDGE ! -i ucr-br0 -p tcp -m tcp --dport 7682 -m comment --comment \"container_id: 60f5a61b-f4c0-4846-b2cd-63cd7eb5a4e8\" -j DNAT --to-destination 172.31.254.177:80\n-A UCR-DEFAULT-BRIDGE ! -i ucr-br0 -p tcp -m tcp --dport 23904 -m comment --comment \"container_id: f203ff9e-7b81-4e54-ab44-d45e2a937f38\" -j DNAT --to-destination 172.31.254.157:80\n-A UCR-DEFAULT-BRIDGE ! -i ucr-br0 -p tcp -m tcp --dport 8359 -m comment --comment \"container_id: 578cc89c-83bf-46ba-9ae7-c7b89e40e739\" -j DNAT --to-destination 172.31.254.158:80\n-A UCR-DEFAULT-BRIDGE ! -i ucr-br0 -p tcp -m tcp --dport 28482 -m comment --comment \"container_id: 70721adb-cd6c-4158-8b11-4ef694999203\" -j DNAT --to-destination 172.31.254.163:80\n-A UCR-DEFAULT-BRIDGE ! -i ucr-br0 -p tcp -m tcp --dport 2564 -m comment --comment \"container_id: 70bc9735-8e0a-4b27-9a68-1b1151c9f6ec\" -j DNAT --to-destination 172.31.254.105:80\n-A UCR-DEFAULT-BRIDGE ! -i ucr-br0 -p tcp -m tcp --dport 6952 -m comment --comment \"container_id: a494e0fe-ce6e-4e95-b094-504c7603b6fb\" -j DNAT --to-destination 172.31.254.159:80\n-A UCR-DEFAULT-BRIDGE ! -i ucr-br0 -p tcp -m tcp --dport 29371 -m comment --comment \"container_id: 1dac0183-8939-4165-95e8-cf6503cc53ad\" -j DNAT --to-destination 172.31.254.174:80\n-A UCR-DEFAULT-BRIDGE ! -i ucr-br0 -p tcp -m tcp --dport 6551 -m comment --comment \"container_id: 148a197b-7247-4a3c-b528-1417cc8ccf3f\" -j DNAT --to-destination 172.31.254.144:80\n-A UCR-DEFAULT-BRIDGE ! -i ucr-br0 -p tcp -m tcp --dport 16568 -m comment --comment \"container_id: e68bb942-ab69-43a5-a308-40b33368d111\" -j DNAT --to-destination 172.31.254.184:80\n-A UCR-DEFAULT-BRIDGE ! -i ucr-br0 -p tcp -m tcp --dport 5915 -m comment --comment \"container_id: 08d53ef6-7d3c-4b44-8bb7-923b6a4a1780\" -j DNAT --to-destination 172.31.254.186:80\n-A UCR-DEFAULT-BRIDGE ! -i ucr-br0 -p tcp -m tcp --dport 11252 -m comment --comment \"container_id: 894e1455-6c77-4fc2-8414-53f3db7b492a\" -j DNAT --to-destination 172.31.254.223:80\n-A UCR-DEFAULT-BRIDGE ! -i ucr-br0 -p tcp -m tcp --dport 7583 -m comment --comment \"container_id: 917b1fc3-12ca-4198-a5f4-13144fbf9096\" -j DNAT --to-destination 172.31.254.164:80\n-A UCR-DEFAULT-BRIDGE ! -i ucr-br0 -p tcp -m tcp --dport 12695 -m comment --comment \"container_id: cfadeb29-3481-4e9e-8eba-e57ce6a4f693\" -j DNAT --to-destination 172.31.254.153:80\n-A UCR-DEFAULT-BRIDGE ! 
-i ucr-br0 -p tcp -m tcp --dport 1256 -m comment --comment \"container_id: 212649ee-a894-4201-bc38-798545ce9e1f\" -j DNAT --to-destination 172.31.254.132:80\n-A UCR-DEFAULT-BRIDGE ! -i ucr-br0 -p tcp -m tcp --dport 28581 -m comment --comment \"container_id: 38c46524-9023-450d-a134-24ad14eecbd5\" -j DNAT --to-destination 172.31.254.202:80\n-A UCR-DEFAULT-BRIDGE ! -i ucr-br0 -p tcp -m tcp --dport 5709 -m comment --comment \"container_id: 8a510a8e-df16-40b2-a4b7-36605a5f9984\" -j DNAT --to-destination 172.31.254.194:80\n-A UCR-DEFAULT-BRIDGE ! -i ucr-br0 -p tcp -m tcp --dport 6050 -m comment --comment \"container_id: 94757d96-b6ad-4a57-9e38-c956a2e24fd6\" -j DNAT --to-destination 172.31.254.152:80\n-A UCR-DEFAULT-B", 4096
      

      So it is stuck writing to stdout, which indicates that the pipe is full and the reader end hasn't drained it. Then we identified the reader:

      $ ps aux | grep a074efef-c119-4764-a584-87c57e6b7f68
      core      2355  0.0  0.0   6756  1020 pts/0    S+   23:39   0:00 grep --colour=auto a074efef-c119-4764-a584-87c57e6b7f68
      root     26486  0.0  0.0  11948  2852 ?        S    22:41   0:00 sh -c        #!/bin/sh       exec 1>&2       set -x        # The iptables command searches for the DNAT rules with tag       # "container_id: <CNI_CONTAINERID>", and if it exists goes ahead       # and deletes it.       iptables -w -t nat -S UCR-DEFAULT-BRIDGE | sed "/container_id: a074efef-c119-4764-a584-87c57e6b7f68/ s/-A/iptables -w -t nat -D/e"
      root     26490  0.0  0.0   4340   896 ?        S    22:41   0:00 sed /container_id: a074efef-c119-4764-a584-87c57e6b7f68/ s/-A/iptables -w -t nat -D/e
      $ sudo strace -s 10000 -p 26490
      Process 26490 attached
      read(0, ^CProcess 26490 detached
      

      The reader, which is `sed`, is reading from stdin and is somehow stuck as well.
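
      The writer side of this is ordinary Linux pipe behavior: once the pipe buffer (64 KiB by default) is full and the reader stops reading, the writer blocks in write(2). A standalone illustration (not Mesos code; `sleep` merely stands in for a reader that never reads its stdin):

      $ dd if=/dev/zero bs=1M count=1 | sleep 1000

      dd hangs without ever printing its transfer summary: it blocks in write(2) after filling the pipe buffer, exactly like the iptables -S process above.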

      Checking the agent log showed that the corresponding container was stuck during CNI detach.

      After checking the code, we pinpointed the problematic script in our CNI port mapper plugin:
      https://github.com/apache/mesos/blob/1.6.0/src/slave/containerizer/mesos/isolators/network/cni/plugins/port_mapper/port_mapper.cpp#L372

      iptables -w -t nat -S %s | sed "/%s/ s/-A/iptables -w -t nat -D/e"
      

      This command deadlocks when there are a lot of entries in iptables. `sed` won't read the next line of its input while it is executing `iptables -w -t nat -D ...` via the GNU sed `e` flag, and that delete command cannot proceed until the first command, `iptables -w -t nat -S %s`, finishes and releases the xtables lock. But the first command cannot finish if its output is large enough to fill the pipe that `sed` has stopped draining: the writer blocks while holding the lock, and the reader's child blocks waiting for it.
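
      To make the failure mode concrete, here is a minimal sketch of the same pattern with flock(1) standing in for the xtables lock (the lock file /tmp/xtables.lock, the line count, and the echoed payload are made up for illustration). When run, it is expected to hang just like the plugin's script and has to be interrupted:

      (
        # Producer: hold the lock (fd 9) while writing far more than the
        # 64 KiB pipe buffer's worth of matching lines.
        flock -x 9
        seq 20000 | sed 's/^/match line /'
      ) 9>/tmp/xtables.lock | sed '/match/ s|^|flock -x /tmp/xtables.lock echo |e'
      # The consumer sed runs `flock ... echo <line>` for the first matching
      # line and blocks on the lock before reading any further input, while
      # the producer blocks writing to the now-full pipe, still holding the
      # lock. Neither side can make progress.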

      The correct solution is to write the delete commands to a file and then execute that file, so that the `-S` listing runs to completion and releases the xtables lock before any `-D` command is executed.
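
      A sketch of that approach, assuming the same chain and container-ID tag as above (illustrative shell only, not the exact patch that landed):

      # Generate the delete commands into a temporary file instead of having
      # sed execute them, so the -S listing finishes and drops the xtables
      # lock first.
      rules=$(mktemp)
      iptables -w -t nat -S UCR-DEFAULT-BRIDGE \
        | sed -n "/container_id: ${CNI_CONTAINERID}/ s/-A/iptables -w -t nat -D/p" \
        > "${rules}"

      # Only now run the accumulated delete commands.
      sh "${rules}"
      rm -f "${rules}"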


          People

            Assignee: Jie Yu (jieyu)
            Reporter: Jie Yu (jieyu)
            Votes: 0
            Watchers: 1
