Details
-
Bug
-
Status: Closed
-
Critical
-
Resolution: Fixed
-
2.5.0, 3.0.0-alpha1
-
None
-
Reviewed
Description
Input streams lost their timeout. The problem appears to be that DFSClient#newConnectedPeer does not set the read timeout. During a temporary network interruption the server will close the socket, unbeknownst to the client host, which then blocks on a read forever.
The results are dire. Services such as the RM, JHS, NMs, Oozie servers, etc. all need to be restarted to recover, unless you want to wait many hours for the TCP stack keepalive to detect the broken socket.
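The effect of a missing read timeout can be illustrated outside Hadoop with plain java.net sockets. The following is a minimal sketch, not the DFSClient code or the committed patch: the class and method names (ReadTimeoutSketch, readTimesOut) are hypothetical, and a loopback server that never writes stands in for a peer whose connection has silently gone away. With a read timeout set, the blocked read gives up with SocketTimeoutException; with the default of 0, the same read would wait indefinitely.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

// Hypothetical illustration only; none of these names come from Hadoop.
public class ReadTimeoutSketch {

    /**
     * Connects to a local peer that never sends data and attempts one read.
     * Returns true if the read gave up with SocketTimeoutException.
     * Do not pass 0: a timeout of 0 means "wait forever", which is the hang
     * described in this issue.
     */
    static boolean readTimesOut(int timeoutMs) {
        try (ServerSocket silentServer =
                 new ServerSocket(0, 1, InetAddress.getLoopbackAddress());
             Socket client = new Socket(silentServer.getInetAddress(),
                                        silentServer.getLocalPort());
             // Accept the connection but never write to it.
             Socket accepted = silentServer.accept()) {

            // The read timeout must be set explicitly on the client socket.
            client.setSoTimeout(timeoutMs);

            InputStream in = client.getInputStream();
            try {
                in.read();      // blocks: the peer sends nothing
                return false;   // only reached if data or EOF arrived
            } catch (SocketTimeoutException e) {
                return true;    // the read was abandoned after timeoutMs
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("read timed out: " + readTimesOut(200));
    }
}
```

A caller that sees the timeout can close the socket and retry against another peer, rather than waiting for TCP keepalive to notice the dead connection.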
Attachments
Issue Links
- is broken by: HDFS-5810 Unify mmap cache and short-circuit file descriptor cache (Closed)
- is duplicated by: ACCUMULO-3396 HDFS reads are hanging (Resolved)
- is related to: HDFS-7608 hdfs dfsclient newConnectedPeer has no write timeout (Resolved)
- relates to: HBASE-15900 RS stuck in get lock of HStore (Resolved)