- Type: Bug
- Status: Closed
- Priority: Critical
- Resolution: Fixed
- Affects Version/s: 2.2.0
- Fix Version/s: 2.3.0
- Component/s: hdfs-client
- Labels: None
- Hadoop Flags: Reviewed
lsof -i TCP:1004 | grep -c CLOSE_WAIT
18235
When a client requests a file's block from DataNode:1004 and the request fails with "java.io.IOException: Got error for OP_READ_BLOCK,Block token is expired.", the TCP socket that the regionserver is using is not closed.
I think the problem is in DatanodeInfo blockSeekTo(long target) of class DFSInputStream.
The connection the client uses is held by a BlockReader:
blockReader = getBlockReader(targetAddr, chosenNode, src, blk,
accessToken, offsetIntoBlock, blk.getNumBytes() - offsetIntoBlock,
buffersize, verifyChecksum, dfsClient.clientName);
DFSInputStream.blockSeekTo() (line 533) invokes getBlockReader(), which creates a peer with newTcpPeer(dnAddr) (line 1107). When BlockReaderFactory.newBlockReader throws an IOException, the peer is not closed, which leaves a connection in CLOSE_WAIT.
In our test, when the datanode gets an InvalidToken exception in DataXceiver.checkAccess(), it closes the connection. On the regionserver side, in RemoteBlockReader2.newBlockReader(), checkSuccess() throws an InvalidBlockTokenException. DFSInputStream.blockSeekTo() catches the exception, but the connection is NOT closed, so it ends up in CLOSE_WAIT.
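To illustrate the leak described above, here is a minimal, self-contained sketch of one way the peer could be closed when reader construction fails. It is not the actual HDFS code or the committed patch: FakePeer, FakeReader, newReader() and getReader() are hypothetical stand-ins for the peer, the BlockReader, BlockReaderFactory.newBlockReader and getBlockReader(), and the expired-token failure is simulated with a boolean flag.

```java
import java.io.Closeable;
import java.io.IOException;

public class PeerCloseSketch {
    /** Hypothetical stand-in for the peer (the wrapped TCP socket). */
    static class FakePeer implements Closeable {
        boolean closed = false;

        @Override
        public void close() {
            closed = true;
        }
    }

    /** Hypothetical stand-in for a block reader that takes ownership of a peer. */
    static class FakeReader {
        final FakePeer peer;

        FakeReader(FakePeer peer) {
            this.peer = peer;
        }
    }

    /** Simulates reader construction; fails the way an expired block token would. */
    static FakeReader newReader(FakePeer peer, boolean tokenExpired) throws IOException {
        if (tokenExpired) {
            throw new IOException("Block token is expired.");
        }
        return new FakeReader(peer);
    }

    /**
     * Builds a reader on top of the peer. If construction throws, the peer is
     * closed before the exception propagates, so the socket is not left open.
     */
    static FakeReader getReader(FakePeer peer, boolean tokenExpired) throws IOException {
        boolean success = false;
        try {
            FakeReader reader = newReader(peer, tokenExpired);
            success = true;
            return reader;
        } finally {
            if (!success) {
                peer.close();
            }
        }
    }

    public static void main(String[] args) {
        FakePeer peer = new FakePeer();
        try {
            getReader(peer, true);
        } catch (IOException e) {
            System.out.println("caught: " + e.getMessage());
        }
        System.out.println("peer closed after failure: " + peer.closed);
    }
}
```

With this pattern the caller can still catch the exception and retry with a refreshed token, but the failed attempt no longer holds on to its socket. On success the peer stays open, because the reader now owns it.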