Details
Type: Bug
Status: Closed
Priority: Critical
Resolution: Fixed
Affects Version/s: 2.2.0
Fix Version/s: None
Hadoop Flags: Reviewed
Description
lsof -i TCP:1004 | grep -c CLOSE_WAIT
18235
When a client requests a file's block from the DataNode on port 1004 and the request fails with "java.io.IOException: Got error for OP_READ_BLOCK,Block token is expired.", the TCP socket the RegionServer was using is not closed.
I think the problem is in the method DatanodeInfo blockSeekTo(long target) of class DFSInputStream.
The connection the client uses is a BlockReader:
blockReader = getBlockReader(targetAddr, chosenNode, src, blk,
accessToken, offsetIntoBlock, blk.getNumBytes() - offsetIntoBlock,
buffersize, verifyChecksum, dfsClient.clientName);
At line 533, DFSInputStream.blockSeekTo() invokes getBlockReader(), which creates a peer with newTcpPeer(dnAddr) (line 1107). When BlockReaderFactory.newBlockReader() throws an IOException, that peer is never closed, which leaves the connection in CLOSE_WAIT.
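The leak can be reproduced in miniature. The following is a minimal Java sketch, not HDFS code: FakePeer, newBlockReader, getBlockReaderLeaky, getBlockReaderFixed and attempt are hypothetical stand-ins for the Peer, BlockReaderFactory.newBlockReader() and getBlockReader() logic described above. It counts how many peers remain open after a retry loop in which every attempt fails with the expired-token error. The fix shown (closing the peer in a finally block when no reader was created) is one possible approach, not necessarily the committed patch.

```java
import java.io.Closeable;
import java.io.IOException;

// Hypothetical stand-ins for HDFS's Peer / BlockReader types; not the real API.
public class PeerLeakSketch {

    /** Hypothetical peer that tracks how many instances are still open. */
    static final class FakePeer implements Closeable {
        static int open = 0;
        private boolean closed = false;

        FakePeer() { open++; }

        @Override
        public void close() {
            if (!closed) {          // idempotent close
                closed = true;
                open--;
            }
        }
    }

    /** Hypothetical factory that always fails, as with an expired block token. */
    static Closeable newBlockReader(FakePeer peer) throws IOException {
        throw new IOException("Got error for OP_READ_BLOCK,Block token is expired.");
    }

    /** Leaky pattern: the peer is abandoned when newBlockReader throws. */
    static Closeable getBlockReaderLeaky() throws IOException {
        FakePeer peer = new FakePeer();
        return newBlockReader(peer);
    }

    /** Fixed pattern: close the peer unless a reader was successfully created. */
    static Closeable getBlockReaderFixed() throws IOException {
        FakePeer peer = new FakePeer();
        boolean success = false;
        try {
            Closeable reader = newBlockReader(peer);
            success = true;
            return reader;
        } finally {
            if (!success) {
                peer.close();
            }
        }
    }

    /** Makes n failing attempts, swallowing the errors like a retry loop, and returns open peers. */
    static int attempt(int n, boolean fixed) {
        FakePeer.open = 0;
        for (int i = 0; i < n; i++) {
            try {
                if (fixed) getBlockReaderFixed(); else getBlockReaderLeaky();
            } catch (IOException expected) {
                // The caller catches the failure and tries again.
            }
        }
        return FakePeer.open;
    }

    public static void main(String[] args) {
        System.out.println("leaky: " + attempt(5, false)); // leaky: 5
        System.out.println("fixed: " + attempt(5, true));  // fixed: 0
    }
}
```

Using a success flag in finally, rather than a catch (IOException) block, also closes the peer if newBlockReader throws an unchecked exception.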
In our test, when the DataNode gets an InvalidToken exception in DataXceiver.checkAccess(), it closes the connection. On the RegionServer side, checkSuccess() in RemoteBlockReader2.newBlockReader() throws an InvalidBlockTokenException, and DFSInputStream.blockSeekTo() catches it, but the connection is NOT closed, so it stays in CLOSE_WAIT.
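The socket-level effect can be illustrated with plain java.net sockets on the loopback interface. In this sketch, which is not HDFS code, the accepting side stands in for the DataNode and simply closes the connection, as described for DataXceiver.checkAccess(); the class, method and field names are hypothetical. Java does not expose the TCP state directly, but while the client socket remains unclosed after read() returns -1, a tool such as lsof would report it in CLOSE_WAIT.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;

public class CloseWaitDemo {

    /** What the client observed (hypothetical result holder). */
    static final class Result {
        final int firstRead;
        final boolean openAfterEof;
        final boolean closedAfterClose;

        Result(int firstRead, boolean openAfterEof, boolean closedAfterClose) {
            this.firstRead = firstRead;
            this.openAfterEof = openAfterEof;
            this.closedAfterClose = closedAfterClose;
        }
    }

    static Result run() throws IOException {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        try (ServerSocket server = new ServerSocket(0, 1, loopback);
             Socket client = new Socket(loopback, server.getLocalPort())) {
            client.setSoTimeout(5000);

            // The accepting side plays the DataNode: it closes the connection
            // right away, as after a failed block-token check.
            Socket dataNodeSide = server.accept();
            dataNodeSide.close();

            // read() returns -1 once the remote FIN has arrived.
            int firstRead = client.getInputStream().read();

            // The remote side is gone, but our socket is still open, so the
            // kernel keeps it in CLOSE_WAIT until close() is called.
            boolean openAfterEof = !client.isClosed();

            client.close(); // the step missing from the leaky code path
            return new Result(firstRead, openAfterEof, client.isClosed());
        }
    }

    public static void main(String[] args) throws IOException {
        Result r = run();
        System.out.println("first read: " + r.firstRead);
        System.out.println("open after EOF: " + r.openAfterEof);
        System.out.println("closed after close(): " + r.closedAfterClose);
    }
}
```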
Attachments
Issue Links
- is duplicated by: HDFS-5697 connection leak in DFSInputStream (Resolved)