This is why TestFileConcurrentReaders has been failing so often. Reproducing a comment from
The test has a thread that continually re-opens the file while it is being written. Because the file is still in the middle of being written, each open makes an RPC to the DataNode to determine the file's visible length. This RPC is authenticated using the block token that came back in the LocatedBlocks object as the security ticket.
When this RPC hits the IPC layer, the layer looks at its existing connections and finds none that can be reused, since the block token differs between the two requesters. Hence it opens a new connection, and we end up with hundreds or thousands of IPC connections to the DataNode.
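
To make the mechanism concrete, here is a minimal toy model of a connection cache keyed on both the remote address and the security ticket. It is only a sketch and not Hadoop's actual IPC client code: the class names ConnectionCacheSketch, ConnectionId and Connection, the address string, and the token strings are all hypothetical, and it assumes that each re-open presents a distinct block token.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

/** Toy model of an IPC client connection cache (hypothetical, not Hadoop code). */
public class ConnectionCacheSketch {

    /** Hypothetical cache key: remote address plus the security ticket. */
    public static final class ConnectionId {
        final String address;
        final String ticket;

        ConnectionId(String address, String ticket) {
            this.address = address;
            this.ticket = ticket;
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof ConnectionId)) {
                return false;
            }
            ConnectionId other = (ConnectionId) o;
            return address.equals(other.address) && ticket.equals(other.ticket);
        }

        @Override
        public int hashCode() {
            return Objects.hash(address, ticket);
        }
    }

    /** Stand-in for an open socket to the remote side. */
    public static final class Connection {
        final ConnectionId id;

        Connection(ConnectionId id) {
            this.id = id;
        }
    }

    private final Map<ConnectionId, Connection> connections = new HashMap<>();

    /** Reuses a connection only when both the address and the ticket match. */
    public Connection getConnection(String address, String ticket) {
        return connections.computeIfAbsent(new ConnectionId(address, ticket), Connection::new);
    }

    public int openConnections() {
        return connections.size();
    }

    public static void main(String[] args) {
        // Assumption: every re-open of the file carries a different block token.
        ConnectionCacheSketch perToken = new ConnectionCacheSketch();
        for (int i = 0; i < 1000; i++) {
            perToken.getConnection("datanode-1:50020", "block-token-" + i);
        }
        System.out.println(perToken.openConnections()); // prints 1000

        // With an identical ticket, the single cached connection is reused.
        ConnectionCacheSketch sameTicket = new ConnectionCacheSketch();
        for (int i = 0; i < 1000; i++) {
            sameTicket.getConnection("datanode-1:50020", "same-ticket");
        }
        System.out.println(sameTicket.openConnections()); // prints 1
    }
}
```

In this model the number of cached connections grows with the number of distinct tickets rather than with the number of remote hosts, which matches the pile-up of connections to a single DataNode described above.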
- is broken by
HADOOP-7317 RPC.stopProxy doesn't actually close proxy
- is related to
HDFS-12737 Thousands of sockets lingering in TIME_WAIT state due to frequent file open operations