Details
- Type: Bug
- Status: Closed
- Priority: Major
- Resolution: Fixed
- 2.7.0
- None
Description
The DataNode can sometimes allocate a ShortCircuitShm slot and then fail to tell the DFSClient about it because of a network error. In DataXceiver#requestShortCircuitFds, the DataNode can succeed at the first part (marking the slot as used) and fail at the second part (telling the DFSClient what it did). The "try" block for unregistering the slot only covers a failure in the first part, not the second. As a result, the DFSClient and the DataNode can diverge in their views of which slots are allocated.
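The failure mode can be illustrated with a minimal sketch. This is not the actual DataXceiver code or the committed patch: SlotRegistry, ResponseSender, and requestFds are hypothetical stand-ins invented for illustration. The idea is only that the cleanup guard has to cover both marking the slot as used and sending the response, so that a network error in the second part still releases the slot.

```java
import java.io.IOException;
import java.util.HashSet;
import java.util.Set;

public class SlotCleanupSketch {
    /** Hypothetical stand-in for the DataNode's record of allocated slots. */
    static final class SlotRegistry {
        private final Set<Integer> used = new HashSet<>();

        synchronized void markUsed(int slotId) { used.add(slotId); }
        synchronized void release(int slotId) { used.remove(slotId); }
        synchronized boolean isUsed(int slotId) { return used.contains(slotId); }
    }

    /** Hypothetical stand-in for writing the response back to the client. */
    interface ResponseSender {
        void send(int slotId) throws IOException;
    }

    /**
     * Marks the slot as used and then notifies the client. The try block
     * spans both parts, so a failure in either one releases the slot.
     */
    static void requestFds(SlotRegistry registry, int slotId, ResponseSender sender)
            throws IOException {
        boolean success = false;
        try {
            registry.markUsed(slotId); // first part: mark the slot as used
            sender.send(slotId);       // second part: tell the client
            success = true;
        } finally {
            if (!success) {
                // Undo the allocation so both sides agree the slot is free.
                registry.release(slotId);
            }
        }
    }
}
```

If the guard ended after markUsed, an IOException thrown by send would leave the slot marked as used on the DataNode while the client never learned of it, which is the divergence described above.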
Attachments
Issue Links
- breaks: HDFS-8070 Pre-HDFS-7915 DFSClient cannot use short circuit on post-HDFS-7915 DataNode (Closed)
- is broken by: HDFS-9466 TestShortCircuitCache#testDataXceiverCleansUpSlotsOnFailure is flaky (Resolved)
- is related to: HADOOP-11802 DomainSocketWatcher thread terminates sometimes after there is an I/O error during requestShortCircuitShm (Closed)