Details
Description
Recently, we noticed that if one launches a lot of containers that use the port mapper CNI plugin, the iptables will get deadlock. The symptom is like the following.
If you do any iptables command on the box, it'll get stuck on acquiring the xtables lock:
core@ip-10-0-2-99 ~ $ time iptables -w -t nat -S UCR-DEFAULT-BRIDGE Another app is currently holding the xtables lock; waiting for it to exit... Another app is currently holding the xtables lock; waiting for it to exit... Another app is currently holding the xtables lock; waiting for it to exit... Another app is currently holding the xtables lock; waiting for it to exit... Another app is currently holding the xtables lock; waiting for it to exit... Another app is currently holding the xtables lock; waiting for it to exit... Another app is currently holding the xtables lock; waiting for it to exit... Another app is currently holding the xtables lock; waiting for it to exit... Another app is currently holding the xtables lock; waiting for it to exit... Another app is currently holding the xtables lock; waiting for it to exit... Another app is currently holding the xtables lock; waiting for it to exit... Another app is currently holding the xtables lock; waiting for it to exit... Another app is currently holding the xtables lock; waiting for it to exit... Another app is currently holding the xtables lock; waiting for it to exit... Another app is currently holding the xtables lock; waiting for it to exit... Another app is currently holding the xtables lock; waiting for it to exit... Another app is currently holding the xtables lock; waiting for it to exit... Another app is currently holding the xtables lock; waiting for it to exit... Another app is currently holding the xtables lock; waiting for it to exit... Another app is currently holding the xtables lock; waiting for it to exit... Another app is currently holding the xtables lock; waiting for it to exit... ^C real 0m41.349s user 0m0.001s sys 0m0.000s
And you'll notice that a lot of iptables and mesos port mapper CNI plugin processes on the box
$ ps -fp $(pidof mesos-cni-port-mapper) | wc -l 191 $ ps -fp $(pidof iptables) | wc -l 192
Then, we look into the process that is holding the xtables lock:
$ sudo netstat -p -n | grep xtables unix 2 [ ] STREAM 225083 25048/iptables @xtables $ ps aux | grep 25048 root 25048 0.0 0.0 26184 2512 ? S 22:41 0:00 iptables -w -t nat -S UCR-DEFAULT-BRIDGE core 31857 0.0 0.0 6760 976 pts/0 S+ 23:18 0:00 grep --colour=auto 25048 $ sudo strace -s 10000 -p 25048 Process 25048 attached write(1, "-dport 13839 -m comment --comment \"container_id: a074efef-c119-4764-a584-87c57e6b7f68\" -j DNAT --to-destination 172.31.254.183:80\n-A UCR-DEFAULT-BRIDGE ! -i ucr-br0 -p tcp -m tcp --dport 17282 -m comment --comment \"container_id: 47880db5-90ad-4034-9c2b-2fd246d42342\" -j DNAT --to-destination 172.31.254.126:80\n-A UCR-DEFAULT-BRIDGE ! -i ucr-br0 -p tcp -m tcp --dport 28712 -m comment --comment \"container_id: 3293d5d0-772c-48d2-bafd-1c4b4d56247e\" -j DNAT --to-destination 172.31.254.130:80\n-A UCR-DEFAULT-BRIDGE ! -i ucr-br0 -p tcp -m tcp --dport 23893 -m comment --comment \"container_id: f57de8eb-a3b9-44cb-8dac-7ab261bc8aac\" -j DNAT --to-destination 172.31.254.149:80\n-A UCR-DEFAULT-BRIDGE ! -i ucr-br0 -p tcp -m tcp --dport 28449 -m comment --comment \"container_id: 9238dbf0-7b28-4fda-880a-bf0c8f40562a\" -j DNAT --to-destination 172.31.254.190:80\n-A UCR-DEFAULT-BRIDGE ! -i ucr-br0 -p tcp -m tcp --dport 26438 -m comment --comment \"container_id: d307cf58-8972-4de4-ad45-26c29786add0\" -j DNAT --to-destination 172.31.254.187:80\n-A UCR-DEFAULT-BRIDGE ! -i ucr-br0 -p tcp -m tcp --dport 7682 -m comment --comment \"container_id: 60f5a61b-f4c0-4846-b2cd-63cd7eb5a4e8\" -j DNAT --to-destination 172.31.254.177:80\n-A UCR-DEFAULT-BRIDGE ! -i ucr-br0 -p tcp -m tcp --dport 23904 -m comment --comment \"container_id: f203ff9e-7b81-4e54-ab44-d45e2a937f38\" -j DNAT --to-destination 172.31.254.157:80\n-A UCR-DEFAULT-BRIDGE ! -i ucr-br0 -p tcp -m tcp --dport 8359 -m comment --comment \"container_id: 578cc89c-83bf-46ba-9ae7-c7b89e40e739\" -j DNAT --to-destination 172.31.254.158:80\n-A UCR-DEFAULT-BRIDGE ! -i ucr-br0 -p tcp -m tcp --dport 28482 -m comment --comment \"container_id: 70721adb-cd6c-4158-8b11-4ef694999203\" -j DNAT --to-destination 172.31.254.163:80\n-A UCR-DEFAULT-BRIDGE ! -i ucr-br0 -p tcp -m tcp --dport 2564 -m comment --comment \"container_id: 70bc9735-8e0a-4b27-9a68-1b1151c9f6ec\" -j DNAT --to-destination 172.31.254.105:80\n-A UCR-DEFAULT-BRIDGE ! -i ucr-br0 -p tcp -m tcp --dport 6952 -m comment --comment \"container_id: a494e0fe-ce6e-4e95-b094-504c7603b6fb\" -j DNAT --to-destination 172.31.254.159:80\n-A UCR-DEFAULT-BRIDGE ! -i ucr-br0 -p tcp -m tcp --dport 29371 -m comment --comment \"container_id: 1dac0183-8939-4165-95e8-cf6503cc53ad\" -j DNAT --to-destination 172.31.254.174:80\n-A UCR-DEFAULT-BRIDGE ! -i ucr-br0 -p tcp -m tcp --dport 6551 -m comment --comment \"container_id: 148a197b-7247-4a3c-b528-1417cc8ccf3f\" -j DNAT --to-destination 172.31.254.144:80\n-A UCR-DEFAULT-BRIDGE ! -i ucr-br0 -p tcp -m tcp --dport 16568 -m comment --comment \"container_id: e68bb942-ab69-43a5-a308-40b33368d111\" -j DNAT --to-destination 172.31.254.184:80\n-A UCR-DEFAULT-BRIDGE ! -i ucr-br0 -p tcp -m tcp --dport 5915 -m comment --comment \"container_id: 08d53ef6-7d3c-4b44-8bb7-923b6a4a1780\" -j DNAT --to-destination 172.31.254.186:80\n-A UCR-DEFAULT-BRIDGE ! -i ucr-br0 -p tcp -m tcp --dport 11252 -m comment --comment \"container_id: 894e1455-6c77-4fc2-8414-53f3db7b492a\" -j DNAT --to-destination 172.31.254.223:80\n-A UCR-DEFAULT-BRIDGE ! -i ucr-br0 -p tcp -m tcp --dport 7583 -m comment --comment \"container_id: 917b1fc3-12ca-4198-a5f4-13144fbf9096\" -j DNAT --to-destination 172.31.254.164:80\n-A UCR-DEFAULT-BRIDGE ! -i ucr-br0 -p tcp -m tcp --dport 12695 -m comment --comment \"container_id: cfadeb29-3481-4e9e-8eba-e57ce6a4f693\" -j DNAT --to-destination 172.31.254.153:80\n-A UCR-DEFAULT-BRIDGE ! -i ucr-br0 -p tcp -m tcp --dport 1256 -m comment --comment \"container_id: 212649ee-a894-4201-bc38-798545ce9e1f\" -j DNAT --to-destination 172.31.254.132:80\n-A UCR-DEFAULT-BRIDGE ! -i ucr-br0 -p tcp -m tcp --dport 28581 -m comment --comment \"container_id: 38c46524-9023-450d-a134-24ad14eecbd5\" -j DNAT --to-destination 172.31.254.202:80\n-A UCR-DEFAULT-BRIDGE ! -i ucr-br0 -p tcp -m tcp --dport 5709 -m comment --comment \"container_id: 8a510a8e-df16-40b2-a4b7-36605a5f9984\" -j DNAT --to-destination 172.31.254.194:80\n-A UCR-DEFAULT-BRIDGE ! -i ucr-br0 -p tcp -m tcp --dport 6050 -m comment --comment \"container_id: 94757d96-b6ad-4a57-9e38-c956a2e24fd6\" -j DNAT --to-destination 172.31.254.152:80\n-A UCR-DEFAULT-B", 4096
So it gets stuck when writing to stdout. That indicates that the pipe is full, and the reader end hasn't read from the pipe. Then we identified the reader
$ ps aux | grep a074efef-c119-4764-a584-87c57e6b7f68 core 2355 0.0 0.0 6756 1020 pts/0 S+ 23:39 0:00 grep --colour=auto a074efef-c119-4764-a584-87c57e6b7f68 root 26486 0.0 0.0 11948 2852 ? S 22:41 0:00 sh -c #!/bin/sh exec 1>&2 set -x # The iptables command searches for the DNAT rules with tag # "container_id: <CNI_CONTAINERID>", and if it exists goes ahead # and deletes it. iptables -w -t nat -S UCR-DEFAULT-BRIDGE | sed "/container_id: a074efef-c119-4764-a584-87c57e6b7f68/ s/-A/iptables -w -t nat -D/e" root 26490 0.0 0.0 4340 896 ? S 22:41 0:00 sed /container_id: a074efef-c119-4764-a584-87c57e6b7f68/ s/-A/iptables -w -t nat -D/e $ sudo strace -s 10000 -p 26490 Process 26490 attached read(0, ^CProcess 26490 detached
The reader which is `sed` is reading from stdin, somehow gets stuck as well.
The corresponding container gets stuck during CNI detach after checking the agent log.
After checking the code, we pinpoint the problematic script in our CNI port mapper plugin:
https://github.com/apache/mesos/blob/1.6.0/src/slave/containerizer/mesos/isolators/network/cni/plugins/port_mapper/port_mapper.cpp#L372
iptables -w -t nat -S %s | sed "/%s/ s/-A/iptables -w -t nat -D/e"
This command will deadlock if there are a lot of entires in the iptables because the `sed` won't process the next line while executing `iptables -w -t nat -D ...`. But the executing of `iptables -w -t nat -D ... ` might get stuck if the first command `iptables -w -t nat -S %s` didn't finish, because the lock is not released. The first command might not finish if it has a lot of output, filling the pipe that `sed` hasn't had a chance to process yet.
The correct solution is to write the command to execute in a file, and execute the file.