I looked into this issue. I found a few things:
The HDFS socket cache is too small by default and its entries expire too quickly. The default size is 16, but HBase seems to open many more connections to the DNs than that. In that situation, sockets inevitably get opened, used, and then discarded, which leaves them sitting in CLOSE_WAIT.
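For reference, the cache is controlled on the client side. Here's a minimal sketch of bumping it, assuming the Hadoop 2.x era key names dfs.client.socketcache.capacity and dfs.client.socketcache.expiryMsec; the names and the values chosen are illustrative and may differ in your release:

```java
// A minimal sketch, not a drop-in fix: the key names below are assumptions
// based on the Hadoop 2.x DFSClient settings and may differ in other releases.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;

public class SocketCacheTuning {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // Default is 16 cached sockets per client; a regionserver talking to many
    // DataNodes can churn through far more than that.
    conf.setInt("dfs.client.socketcache.capacity", 256);
    // Keep idle cached sockets around longer before they are expired and closed.
    conf.setLong("dfs.client.socketcache.expiryMsec", 60 * 1000L);
    // Any DFSClient created from this Configuration picks up the larger cache.
    FileSystem fs = FileSystem.get(conf);
    System.out.println("Connected to " + fs.getUri());
  }
}
```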
When you use positional read (aka pread), we grab a socket from the cache, read from it, and then immediately put it back. When you seek and then call read, we don't put the socket back at the end. The assumption behind the normal read method is that you are probably going to call read again, so it holds on to the socket until something else comes up (such as closing the stream). In many scenarios, this can lead to seek+read generating more sockets in CLOSE_WAIT than pread.
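To make the distinction concrete, here's a minimal sketch of the two read paths on FSDataInputStream; the file path and offsets are invented for illustration:

```java
// A minimal sketch contrasting pread with seek+read; the path and offsets
// are made up for illustration.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ReadPaths {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());
    byte[] buf = new byte[4096];
    try (FSDataInputStream in = fs.open(new Path("/some/hfile"))) {
      // pread: grabs a socket from the cache, reads, and returns the socket
      // as soon as the call completes. The stream's own position is untouched.
      in.read(1000000L, buf, 0, buf.length);

      // seek + read: the stream keeps its block reader (and the underlying
      // socket) open afterwards, expecting more sequential reads to follow.
      in.seek(2000000L);
      in.read(buf, 0, buf.length);
    } // closing the stream finally releases whatever the seek+read path held
  }
}
```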
I don't think we want to alter this HDFS behavior, since it's helpful when you're reading through an entire file from start to finish, which many HDFS clients do. It lets us make certain optimizations, such as reading a few kilobytes at a time even if the user only asks for a few bytes at a time. Those optimizations aren't available with pread, because it creates a new BlockReader for each call.
So as far as recommendations for HBase go:
- use short-circuit reads whenever possible, since in many cases you can avoid needing a socket at all and just reuse the same file descriptor (see the sketch after this list)
- set the socket cache to a bigger size and raise its expiry timeout (I may explore changing the defaults in HDFS...)
- if you are going to keep files open for a while and do random reads against them, use pread, never seek+read.
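For the first recommendation, here's a minimal sketch of the client side only, assuming the HDFS-347 style keys dfs.client.read.shortcircuit and dfs.domain.socket.path; older releases use a different mechanism, the DataNodes must be configured to match, and the domain socket path below is hypothetical:

```java
// A minimal client-side sketch of enabling short-circuit reads, assuming the
// HDFS-347 style keys; the domain socket path is hypothetical and the
// DataNodes need matching configuration.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;

public class ShortCircuitClient {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // Let the client read local replicas directly through a file descriptor
    // instead of going through a DataNode socket.
    conf.setBoolean("dfs.client.read.shortcircuit", true);
    conf.set("dfs.domain.socket.path", "/var/lib/hadoop-hdfs/dn_socket");
    FileSystem fs = FileSystem.get(conf);
    System.out.println("Connected to " + fs.getUri());
  }
}
```

The same properties can of course go in hdfs-site.xml / hbase-site.xml rather than being set in code, and the socket cache settings from the sketch further up cover the second recommendation.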