[MESOS-5723] SSL-enabled libprocess will leak incoming links to forks - ASF JIRA

Details

Type: Bug
Status: Resolved
Priority: Blocker
Resolution: Fixed
Affects Version/s: 0.24.0, 0.25.0, 0.26.0, 0.27.0, 0.28.0
Fix Version/s: 0.28.3, 1.0.0
Component/s: libprocess
Labels:

Sprint: Mesosphere Sprint 38
Story Points: 2

Description

Encountered two different buggy behaviors that can be traced back to the same underlying problem.

Repro #1 (non-crashy):
(1) Start a master. Doesn't matter if SSL is enabled or not.
(2) Start an agent, with SSL enabled. Downgrade support has the same problem. The master/agent link to one another.
(3) Run a sleep task. Keep this alive. If you inspect FDs at this point, you'll notice the task has inherited the link FD (master -> agent).
(4) Restart the agent. Due to (3), the master's link stays open.
(5) Check master's logs for the agent's re-registration message.
(6) Check the agent's logs for re-registration. The message will not appear. The master is actually using the old link, which is no longer connected to the agent.
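
One way to do the FD inspection from step (3) is to read the task's `/proc/<pid>/fd` directory. The following is a minimal, Linux-only sketch; `listOpenFds` is a hypothetical helper written for illustration and is not part of Mesos or libprocess. Sockets appear with targets of the form `socket:[inode]`, so an inherited link would show up as an unexpected socket entry in the task's listing.

```cpp
#include <dirent.h>
#include <limits.h>
#include <unistd.h>

#include <cstdlib>
#include <map>
#include <string>

// Hypothetical helper (not a Mesos API): maps each open FD of `pid` to the
// target of its /proc/<pid>/fd symlink. Linux only. Returns an empty map if
// the directory cannot be opened (no such process, or no permission).
std::map<int, std::string> listOpenFds(pid_t pid) {
  std::map<int, std::string> fds;
  const std::string dir = "/proc/" + std::to_string(pid) + "/fd";

  DIR* d = opendir(dir.c_str());
  if (d == nullptr) {
    return fds;
  }

  while (dirent* entry = readdir(d)) {
    if (entry->d_name[0] == '.') {
      continue;  // Skip "." and "..".
    }

    char target[PATH_MAX];
    const std::string link = dir + "/" + entry->d_name;
    ssize_t n = readlink(link.c_str(), target, sizeof(target));
    if (n < 0) {
      continue;  // The FD may have been closed while we were scanning.
    }

    fds[std::atoi(entry->d_name)] = std::string(target, n);
  }

  closedir(d);
  return fds;
}
```

When a process inspects itself, the listing also includes the FD that `opendir` holds open during the scan.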

Repro #2 (crashy):
(1) Start a master. Doesn't matter if SSL is enabled or not.
(2) Start an agent, with SSL enabled. Downgrade support has the same problem.
(3) Run ~100 sleep tasks one after the other, and keep them all alive. Each task links back to the agent. Due to an FD leak, each task will inherit the incoming links from all other actors...
(4) At some point, the agent will run out of FDs and kernel panic.

It appears that the SSL socket accept call is missing os::nonblock and os::cloexec calls:
https://github.com/apache/mesos/blob/4b91d936f50885b6a66277e26ea3c32fe942cf1a/3rdparty/libprocess/src/libevent_ssl_socket.cpp#L794-L806

For reference, here's poll socket's accept:
https://github.com/apache/mesos/blob/4b91d936f50885b6a66277e26ea3c32fe942cf1a/3rdparty/libprocess/src/poll_socket.cpp#L53-L75
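
To illustrate the kind of fix implied above, here is a minimal sketch of an accept path that marks the new socket non-blocking and close-on-exec using plain POSIX `fcntl` calls. The helper names (`setCloexec`, `setNonblock`, `acceptWithFlags`) are hypothetical and stand in for what `os::cloexec` and `os::nonblock` are expected to do; this is not the actual libprocess patch.

```cpp
#include <fcntl.h>
#include <sys/socket.h>
#include <unistd.h>

// Hypothetical helper: set FD_CLOEXEC so that exec'd children (e.g. task
// executors) do not inherit `fd`.
bool setCloexec(int fd) {
  int flags = fcntl(fd, F_GETFD);
  if (flags == -1) {
    return false;
  }
  return fcntl(fd, F_SETFD, flags | FD_CLOEXEC) != -1;
}

// Hypothetical helper: set O_NONBLOCK so that reads and writes on `fd`
// return immediately instead of blocking the event loop.
bool setNonblock(int fd) {
  int flags = fcntl(fd, F_GETFL);
  if (flags == -1) {
    return false;
  }
  return fcntl(fd, F_SETFL, flags | O_NONBLOCK) != -1;
}

// Hypothetical accept wrapper: accepts a connection and applies both flags
// before handing the socket to the caller. Returns -1 on any failure.
int acceptWithFlags(int listenFd) {
  int fd = accept(listenFd, nullptr, nullptr);
  if (fd == -1) {
    return -1;
  }

  if (!setNonblock(fd) || !setCloexec(fd)) {
    close(fd);  // Do not leak a half-configured socket.
    return -1;
  }

  return fd;
}
```

Note that a fork and exec in another thread between `accept` and `fcntl` can still leak the FD. On Linux, `accept4` with `SOCK_NONBLOCK | SOCK_CLOEXEC` sets both flags atomically and would close that window.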

People

Assignee: Joseph Wu
Reporter: Joseph Wu
Shepherd: Joris Van Remoortere
Votes: 0
Watchers: 2

Dates

Created: 27/Jun/16 18:40
Updated: 22/Mar/19 16:40
Resolved: 28/Jun/16 20:05