Description
The DataNode holds a large number of connections in the CLOSE_WAIT state and never releases them.
netstat -na | awk '/^tcp/ {++S[$NF]} END {for (s in S) print s, S[s]}'
LISTEN 20
CLOSE_WAIT 17707
ESTABLISHED 1450
TIME_WAIT 12
The number of connections in the CLOSE_WAIT state has reached about 17k and is still growing. These connections can be listed with lsof:
lsof -i tcp | grep -E 'CLOSE_WAIT|COMMAND'
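The lsof output shows that the cause is Socket#close() not being called correctly on connections where the DataNode acts as a client to other nodes. A connection remains in CLOSE_WAIT after the remote peer has closed its end until the local process closes the socket.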
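For illustration only, here is a minimal sketch of the general pattern that avoids this kind of leak: wrapping the client-side Socket in try-with-resources so that close() runs on every exit path. The class SocketCloseSketch and the helper sendAndClose are hypothetical names and are not taken from the DataNode code; the loopback ServerSocket merely stands in for a remote node.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;

public class SocketCloseSketch {

    // Hypothetical helper (not from HDFS): connects to a peer, sends one byte,
    // and reports whether the socket was closed once the try block exited.
    static boolean sendAndClose(InetAddress host, int port) throws IOException {
        Socket socket = new Socket(host, port);
        // try-with-resources calls close() on normal exit and on exceptions,
        // so the file descriptor is released even if the write fails.
        try (Socket s = socket) {
            OutputStream out = s.getOutputStream();
            out.write(1);
            out.flush();
        }
        return socket.isClosed();
    }

    public static void main(String[] args) throws IOException {
        // Loopback server standing in for the remote node; port 0 picks a free port.
        try (ServerSocket server =
                 new ServerSocket(0, 1, InetAddress.getLoopbackAddress())) {
            boolean closed =
                sendAndClose(server.getInetAddress(), server.getLocalPort());
            System.out.println("closed=" + closed);
        }
    }
}
```

Code that instead keeps the Socket in a field or returns early on an error without closing it would leave the descriptor open, and the connection would sit in CLOSE_WAIT once the peer disconnects.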
Attachments
Issue Links
- is related to
  - HDFS-15709 EC: Socket file descriptor leak in StripedBlockChecksumReconstructor (Resolved)