[KAFKA-16768] SocketServer leaks accepted SocketChannel instances due to race condition - ASF JIRA

Details

Type: Bug
Status: Open
Priority: Major
Resolution: Unresolved
Affects Version/s: 3.8.0
Fix Version/s: None
Component/s: core
Labels:
- newbie

Description

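The SocketServer runs separate threads for its Acceptors and Processors. An Acceptor hands each accepted connection to a Processor by calling Processor#accept, which adds the connection to that Processor's `newConnections` queue. The Processor thread later picks it up in Processor#configureNewConnections.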
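As a rough illustration of that handoff, here is a minimal single-threaded model. It is not the Kafka source: `HandoffSketch`, `ModelProcessor`, `demo`, the queue capacity of 20, and the `configured` counter are all hypothetical names and values chosen for the sketch. Only `accept`, `configureNewConnections`, and `newConnections` come from the description above.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.channels.SocketChannel;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Hypothetical model of the Acceptor -> Processor handoff. Not the Kafka source.
public class HandoffSketch {
    static class ModelProcessor {
        // Queue filled by the Acceptor thread and drained by the Processor thread.
        // The capacity of 20 is an arbitrary value for this sketch.
        private final BlockingQueue<SocketChannel> newConnections = new ArrayBlockingQueue<>(20);
        private int configured = 0;

        // Acceptor side: hand a freshly accepted channel to this Processor.
        boolean accept(SocketChannel channel) {
            return newConnections.offer(channel);
        }

        // Processor side: take ownership of every channel queued so far.
        void configureNewConnections() {
            while (newConnections.poll() != null) {
                configured++; // a real server would register the channel with a selector here
            }
        }

        int pending() { return newConnections.size(); }
        int configured() { return configured; }
    }

    // Returns { pending after accept, pending after configureNewConnections }.
    static int[] demo() {
        ModelProcessor processor = new ModelProcessor();
        // An unconnected channel stands in for an accepted connection.
        try (SocketChannel channel = SocketChannel.open()) {
            processor.accept(channel);
            int afterAccept = processor.pending();
            processor.configureNewConnections();
            return new int[] { afterAccept, processor.pending() };
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        int[] pending = demo();
        System.out.println("pending after accept: " + pending[0]);
        System.out.println("pending after configure: " + pending[1]);
    }
}
```

The point of the model is ownership: between `accept` and `configureNewConnections`, the queue is the only thing holding a reference to the channel.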

During shutdown, the Acceptor and its Processors are each told to stop by setting shouldRun to false, after which all of these threads shut down concurrently with no ordering between them. This creates a race in which an Acceptor accepts a SocketChannel and queues it to a Processor that has already begun shutting down and has already drained its `newConnections` queue.

~~KAFKA-16765~~ is an analogous bug in NioEchoServer, which uses a completely different implementation but has the same flaw.

An example execution order that includes this leak:
1. Acceptor#accept() is called, and a new SocketChannel is accepted.
2. Acceptor#assignNewConnection() begins.
3. Acceptor#close() is called, which sets shouldRun to false in the Acceptor and attached Processor instances
4. Processor#run() checks the shouldRun variable, and exits the loop
5. Processor#closeAll() executes, and drains the `newConnections` queue
6. Processor#run() returns and the Processor thread terminates
7. Acceptor#assignNewConnection() calls Processor#accept(), which adds the SocketChannel to `newConnections`
8. Acceptor#assignNewConnection() returns
9. Acceptor#run() checks the shouldRun variable and exits the loop, and the Acceptor thread terminates.
10. Acceptor#close() joins all of the terminated threads, and returns.

At the end of this sequence, `newConnections` still holds open SocketChannel instances. Both threads have already terminated, so nothing remains to close these channels, and they are leaked.
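The ordering above can be replayed deterministically in a small model. The sketch below is hypothetical and is not the Kafka source: `LeakReplay`, `ModelProcessor`, `replay`, and the two latches are invented for illustration, and the latches exist only to force the acceptor thread to pause between steps 2 and 7. An unconnected channel, opened up front, stands in for the connection accepted in step 1.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.channels.SocketChannel;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CountDownLatch;

// Hypothetical model that replays steps 1-10 in a fixed order. Not the Kafka source.
public class LeakReplay {
    static class ModelProcessor implements Runnable {
        final BlockingQueue<SocketChannel> newConnections = new ArrayBlockingQueue<>(20);
        volatile boolean shouldRun = true;

        void accept(SocketChannel channel) { newConnections.offer(channel); }

        // Steps 4-6: leave the loop, drain and close queued channels, then return.
        @Override
        public void run() {
            while (shouldRun) { Thread.onSpinWait(); }
            closeAll();
        }

        void closeAll() {
            SocketChannel channel;
            while ((channel = newConnections.poll()) != null) {
                try { channel.close(); } catch (IOException e) { throw new UncheckedIOException(e); }
            }
        }
    }

    // Returns true if an open channel is still queued after both threads have terminated.
    static boolean replay() {
        try {
            ModelProcessor processor = new ModelProcessor();
            SocketChannel channel = SocketChannel.open();          // step 1 (stand-in for accept())
            CountDownLatch assignStarted = new CountDownLatch(1);
            CountDownLatch processorDone = new CountDownLatch(1);

            Thread processorThread = new Thread(processor, "model-processor");
            Thread acceptorThread = new Thread(() -> {
                assignStarted.countDown();                         // step 2
                try {
                    processorDone.await();                         // forced pause until step 6
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
                processor.accept(channel);                         // step 7
            }, "model-acceptor");                                  // steps 8-9: thread returns

            processorThread.start();
            acceptorThread.start();
            assignStarted.await();
            processor.shouldRun = false;                           // step 3
            processorThread.join();                                // steps 4-6 complete
            processorDone.countDown();
            acceptorThread.join();                                 // step 10

            SocketChannel queued = processor.newConnections.peek();
            boolean leaked = queued != null && queued.isOpen();
            channel.close();                                       // clean up after the demo
            return leaked;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("open channel left in newConnections: " + replay());
    }
}
```

In the model, `closeAll()` runs before step 7, so the channel queued by `accept` is never seen by the drain. One possible direction, which this ticket does not confirm, would be for `accept` to close the channel itself once the Processor has begun shutting down.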


Issue Links

Discovered while testing
- KAFKA-15845 Add Junit5 test extension which detects leaked Kafka clients and servers (In Progress)

is related to
- KAFKA-16765 NioEchoServer leaks accepted SocketChannel instances due to race condition (Resolved)


People

Assignee: Ela Bhattacharya
Reporter: Greg Harris
Votes: 0
Watchers: 3

Dates

Created: 15/May/24 01:50
Updated: 15/Jul/24 15:10