Details
Type: Sub-task
Status: Resolved
Priority: Major
Resolution: Cannot Reproduce
Description
I tested the current code on a 5-node cluster using RS(3,2). When a datanode is corrupt, the client succeeds in writing a file smaller than a block group but fails to write a larger one. TestDFSStripeOutputStreamWithFailure only tests files smaller than a block group; this JIRA will add more test cases.
A streamer may encounter bad datanodes while writing the blocks allocated to it. When it fails to connect to a datanode or to send a packet, the streamer needs to prepare for the next block. First, it removes the packets of the current block from its data queue. If the first packet of the next block is already in the data queue, the streamer resets its state and waits for the next block to be allocated to it; otherwise, it simply waits for the first packet of the next block. While waiting, the streamer periodically checks whether it has been asked to terminate.
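The recovery steps above can be sketched roughly as follows. This is a simplified, single-threaded illustration and not the actual HDFS streamer code: the class `StreamerRecoverySketch`, the `Packet` type, the `NextStep` enum, and all method names are hypothetical, and the periodic termination check is reduced to a single flag test.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical sketch of a streamer's failure handling; not the real HDFS classes.
class StreamerRecoverySketch {

    // Hypothetical packet: only its block index and position in the block matter here.
    static final class Packet {
        final int blockIndex;
        final boolean firstInBlock;

        Packet(int blockIndex, boolean firstInBlock) {
            this.blockIndex = blockIndex;
            this.firstInBlock = firstInBlock;
        }
    }

    // What the streamer does after dropping the failed block's packets.
    enum NextStep { WAIT_FOR_BLOCK_ALLOCATION, WAIT_FOR_FIRST_PACKET, TERMINATED }

    private final Deque<Packet> dataQueue = new ArrayDeque<>();
    private boolean terminateRequested = false;

    void enqueue(Packet p) {
        dataQueue.addLast(p);
    }

    void requestTerminate() {
        terminateRequested = true;
    }

    int queuedPackets() {
        return dataQueue.size();
    }

    // Called when connecting to a datanode or sending a packet fails for failedBlock.
    NextStep handleFailure(int failedBlock) {
        // Step 1: remove every queued packet that belongs to the failed block.
        dataQueue.removeIf(p -> p.blockIndex == failedBlock);

        // Assumption: a real streamer would re-check this flag periodically while waiting.
        if (terminateRequested) {
            return NextStep.TERMINATED;
        }

        // Step 2: is the first packet of the next block already queued?
        Packet head = dataQueue.peekFirst();
        boolean nextBlockStarted = head != null
                && head.blockIndex == failedBlock + 1
                && head.firstInBlock;

        // If so, reset and wait for the next block allocation;
        // otherwise wait for the first packet of the next block to arrive.
        return nextBlockStarted
                ? NextStep.WAIT_FOR_BLOCK_ALLOCATION
                : NextStep.WAIT_FOR_FIRST_PACKET;
    }

    public static void main(String[] args) {
        StreamerRecoverySketch streamer = new StreamerRecoverySketch();
        streamer.enqueue(new Packet(0, false)); // leftover packet of the failed block
        streamer.enqueue(new Packet(1, true));  // first packet of the next block
        System.out.println(streamer.handleFailure(0));
    }
}
```

A file larger than a block group is the case that reaches the `WAIT_FOR_BLOCK_ALLOCATION` branch, since only then can packets of a following block already be queued when the failure occurs.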