While running a test, we found that WebHdfsFileSystem can create several thousand connections when doing a positional read of a 200MB file. For each connection, the client connects to the DataNode again, and the DataNode creates a new DFSClient instance to handle the read request. This also leads to several thousand getBlockLocations calls to the NameNode.
The cause of the issue is that in FSInputStream#read(long, byte[], int, int), each time the input stream reads some data, it seeks back to the old position and resets its state to SEEK. As a result, the next read has to establish a new connection.
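The pattern can be reproduced with a small self-contained sketch. It does not use the real Hadoop classes: CountingStream, its connected flag, and connectionsForPositionRead are hypothetical stand-ins, and the assumption that every seek drops the connection is a simplification of the SEEK state described above. The positional read follows the seek, read, seek-back sequence, so each call pays for one new connection.

```java
class PositionReadSketch {
    /** Hypothetical stand-in for a WebHDFS-style input stream (not the Hadoop class). */
    static class CountingStream {
        private final byte[] data;
        private long pos = 0;
        private boolean connected = false; // false models the SEEK state
        int connections = 0;               // how many times a connection was opened

        CountingStream(int size) { data = new byte[size]; }

        long getPos() { return pos; }

        // Assumption: any seek discards the open connection.
        void seek(long newPos) {
            pos = newPos;
            connected = false;
        }

        // Sequential read; opens a connection only if none is open.
        int read(byte[] buf, int off, int len) {
            if (pos >= data.length) return -1;
            if (!connected) { connections++; connected = true; }
            int n = (int) Math.min(len, data.length - pos);
            System.arraycopy(data, (int) pos, buf, off, n);
            pos += n;
            return n;
        }

        // Positional read: seek to the target, read, then seek back.
        int read(long position, byte[] buf, int off, int len) {
            long oldPos = getPos();
            try {
                seek(position);
                return read(buf, off, len);
            } finally {
                seek(oldPos); // resets the state, so the next read reconnects
            }
        }
    }

    /** Reads the whole file with positional reads and returns the connection count. */
    static int connectionsForPositionRead(int fileSize, int chunk) {
        CountingStream in = new CountingStream(fileSize);
        byte[] buf = new byte[chunk];
        long position = 0;
        int n;
        while ((n = in.read(position, buf, 0, chunk)) > 0) {
            position += n;
        }
        return in.connections;
    }

    public static void main(String[] args) {
        // 1 MB read in 4 KB chunks: one connection per positional read.
        System.out.println(connectionsForPositionRead(1 << 20, 4096)); // prints 256
    }
}
```

In this model the connection count grows linearly with the number of positional reads, which is consistent with the several thousand connections observed for the 200MB file.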