A non-striped DataStreamer goes through the following steps in error handling:
With multiple streamer threads running in parallel, we need to correctly handle a large number of possible interleavings of thread events. For example, streamer_B may start step 2 between events streamer_A.2 and streamer_A.3.
- We can preallocate generation stamps (GS) when the NN creates a new striped block group (FSN#createNewBlock). For each new striped block group we can reserve NUM_PARITY_BLOCKS GSs. If more than NUM_PARITY_BLOCKS errors have happened, we shouldn't try to recover further anyway.
- We can use a dedicated event processor to offload the error handling logic from DFSStripedOutputStream, which is not a long-running daemon.
- We can limit the lifespan of a streamer to a single block. A streamer ends either after finishing the current block or when encountering a DN failure.
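The GS preallocation idea in the first bullet can be sketched roughly as below. This is only an illustration, not existing HDFS code: the class name ReservedGenerationStamps, the method nextForFailure, and the assumption that the reserved stamps are consecutive values following the block group's initial GS are all hypothetical. The point it shows is that each failure consumes one reserved GS without an NN round trip, and that running out of reserved stamps coincides with exceeding NUM_PARITY_BLOCKS failures, at which point recovery should stop.

```java
import java.util.OptionalLong;

/**
 * Hypothetical holder for the generation stamps reserved up front
 * for one striped block group. Not part of HDFS.
 */
final class ReservedGenerationStamps {
    private final long firstReserved; // first GS reserved for error handling
    private final int reserved;       // number reserved, i.e. NUM_PARITY_BLOCKS
    private int used = 0;             // how many have been handed out so far

    /**
     * Assumption: the NN reserves numParityBlocks consecutive stamps
     * immediately after the GS it assigned to the new block group.
     */
    ReservedGenerationStamps(long initialGenStamp, int numParityBlocks) {
        if (numParityBlocks < 0) {
            throw new IllegalArgumentException("numParityBlocks must be >= 0");
        }
        this.firstReserved = initialGenStamp + 1;
        this.reserved = numParityBlocks;
    }

    /**
     * Called once per streamer failure. Returns the next reserved GS, or
     * empty once more failures have occurred than there are parity blocks,
     * in which case the caller should give up on recovery.
     * Synchronized because several streamer threads may fail concurrently.
     */
    synchronized OptionalLong nextForFailure() {
        if (used >= reserved) {
            return OptionalLong.empty();
        }
        return OptionalLong.of(firstReserved + used++);
    }
}
```

With an initial GS of 1000 and three parity blocks (example values only), the first three failures would receive 1001, 1002 and 1003, and a fourth failure would receive an empty result. Because the stamps are handed out under a single lock, concurrent failures from different streamers each get a distinct GS regardless of how their events interleave.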
With the proposed change, a StripedDataStreamer's flow becomes: