Details
Type: Improvement
Status: Patch Available
Priority: Major
Resolution: Unresolved
Affects Version/s: HDFS-7285
Fix Version/s: None
Description
A non-striped DataStreamer goes through the following steps in error handling:
1) Finds error => 2) Asks NN for new GS => 3) Gets new GS from NN => 4) Applies new GS to DN (createBlockOutputStream) => 5) Ack from DN => 6) Updates block on NN
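For reference, the six steps can be written down as an ordered enum that also records which of them HDFS-9040 moves from the streamer into DFSStripedOutputStream (steps 1, 2, 3 and 6, as stated in this description). This is a minimal bookkeeping sketch; the enum and its member names are hypothetical and do not exist in HDFS.

```java
import java.util.EnumSet;
import java.util.Set;

/** The six error-handling steps of a non-striped DataStreamer, in order (hypothetical names). */
public enum ErrorHandlingStep {
    FIND_ERROR(1, true),
    REQUEST_NEW_GS_FROM_NN(2, true),
    RECEIVE_NEW_GS_FROM_NN(3, true),
    APPLY_NEW_GS_TO_DN(4, false), // done via createBlockOutputStream
    ACK_FROM_DN(5, false),
    UPDATE_BLOCK_ON_NN(6, true);

    /** Step number as used in the description (1-6). */
    public final int number;
    /** True if HDFS-9040 moves this step from the streamer to DFSStripedOutputStream. */
    public final boolean movedByHdfs9040;

    ErrorHandlingStep(int number, boolean movedByHdfs9040) {
        this.number = number;
        this.movedByHdfs9040 = movedByHdfs9040;
    }

    /** Steps that a striped streamer still performs itself after HDFS-9040. */
    public static Set<ErrorHandlingStep> remainingInStreamer() {
        Set<ErrorHandlingStep> remaining = EnumSet.noneOf(ErrorHandlingStep.class);
        for (ErrorHandlingStep step : values()) {
            if (!step.movedByHdfs9040) {
                remaining.add(step);
            }
        }
        return remaining;
    }
}
```

After HDFS-9040 only steps 4 and 5 remain in the streamer, which matches the middle of the external-error flow proposed at the end of this description.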
With multiple streamer threads running in parallel, we need to correctly handle a large number of possible interleavings of thread events. For example, streamer_B may start step 2 between events streamer_A.2 and streamer_A.3.
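To put a number on "a large number of possible interleavings": if each of k streamers runs the six steps above exactly once and in order, the number of distinct global orderings is the multinomial coefficient (6k)! / (6!)^k. The sketch below computes it; the class name Interleavings is illustrative, and the model deliberately ignores all other streamer events (data packets, acks), so the real state space is larger still.

```java
import java.math.BigInteger;

public final class Interleavings {
    /** n! as an exact BigInteger. */
    static BigInteger factorial(int n) {
        BigInteger result = BigInteger.ONE;
        for (int i = 2; i <= n; i++) {
            result = result.multiply(BigInteger.valueOf(i));
        }
        return result;
    }

    /**
     * Number of ways to interleave `streamers` threads that each execute
     * `steps` ordered steps: (streamers * steps)! / (steps!)^streamers.
     * Each intermediate division is exact, so integer division is safe.
     */
    static BigInteger count(int streamers, int steps) {
        BigInteger result = factorial(streamers * steps);
        BigInteger perThread = factorial(steps);
        for (int i = 0; i < streamers; i++) {
            result = result.divide(perThread);
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(count(2, 6)); // 924
        System.out.println(count(3, 6)); // 17153136
    }
}
```

Two streamers already give 924 orderings of just these steps, and three give 17,153,136; a striped block group with nine internal blocks is far beyond what can be reasoned about case by case.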
HDFS-9040 moves steps 1, 2, 3 and 6 from the streamer to DFSStripedOutputStream. This JIRA proposes some further optimizations on top of HDFS-9040:
- We can preallocate GSs when the NN creates a new striped block group (FSN#createNewBlock): for each new striped block group we can reserve NUM_PARITY_BLOCKS GSs. If more than NUM_PARITY_BLOCKS errors have happened, we should not attempt further recovery anyway, because a striped block group cannot tolerate more than NUM_PARITY_BLOCKS failed internal blocks.
- We can use a dedicated event processor to offload the error handling logic from DFSStripedOutputStream, since DFSStripedOutputStream is not a long-running daemon.
- We can limit the lifespan of a streamer to a single block: a streamer ends either after finishing the current block or when it encounters a DN failure.
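A minimal sketch of the GS preallocation idea, assuming (this is not specified above) that the NN gives a new block group a contiguous range: the group's own GS g plus the next NUM_PARITY_BLOCKS values g+1 through g+NUM_PARITY_BLOCKS. ReservedGenStamps and its methods are hypothetical names, not HDFS APIs.

```java
import java.util.OptionalLong;

/** Hypothetical per-block-group reservation of NUM_PARITY_BLOCKS generation stamps. */
public final class ReservedGenStamps {
    private final long blockGroupGenStamp; // GS assigned when the block group was created
    private final int numParityBlocks;     // e.g. 3 for an RS-6-3 policy (assumption)
    private int used = 0;                  // reserved GSs handed out so far

    public ReservedGenStamps(long blockGroupGenStamp, int numParityBlocks) {
        if (numParityBlocks < 0) {
            throw new IllegalArgumentException("numParityBlocks must be >= 0");
        }
        this.blockGroupGenStamp = blockGroupGenStamp;
        this.numParityBlocks = numParityBlocks;
    }

    /**
     * Returns the next reserved GS without contacting the NN, or empty once all
     * reservations are spent: beyond NUM_PARITY_BLOCKS failures the group is
     * unrecoverable, so there is no point in asking for more.
     */
    public synchronized OptionalLong next() {
        if (used >= numParityBlocks) {
            return OptionalLong.empty();
        }
        used++;
        return OptionalLong.of(blockGroupGenStamp + used);
    }

    /** Highest GS this group may ever use; the NN would have to skip past it. */
    public long lastReserved() {
        return blockGroupGenStamp + numParityBlocks;
    }
}
```

With g = 1000 and NUM_PARITY_BLOCKS = 3, next() yields 1001, 1002 and 1003 and then returns empty. Under this assumption, steps 2 and 3 of the original flow (the NN round-trip for a new GS) disappear from the failure path.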
With the proposed change, a StripedDataStreamer's flow becomes:
- On a DN error: 1) Finds DN error => 2) Notifies coordinator (async, not waiting for response) => terminates
- On an external error: 1) Finds external error => 2) Applies new GS to DN (createBlockOutputStream) => 3) Ack from DN => 4) Notifies coordinator (async, not waiting for response)
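A minimal sketch of this flow, assuming the dedicated event processor is a single-threaded executor: streamers call notifyAsync(), which enqueues the event and returns immediately, and every handler runs on the one processor thread, so events from different streamers are handled one at a time in submission order. StripedCoordinator, Event, Kind, notifyAsync and applyGenStampToDatanode are hypothetical names; the actual coordinator introduced by HDFS-9040 may be structured differently.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public final class StripedCoordinator {
    public enum Kind { DN_ERROR, NEW_GS_APPLIED }

    public static final class Event {
        public final int streamerIndex;
        public final Kind kind;
        public final long genStamp; // -1 when the event carries no GS

        public Event(int streamerIndex, Kind kind, long genStamp) {
            this.streamerIndex = streamerIndex;
            this.kind = kind;
            this.genStamp = genStamp;
        }
    }

    // One dedicated thread handles every event, so handlers never run concurrently.
    private final ExecutorService processor = Executors.newSingleThreadExecutor();
    private final List<Event> handled = Collections.synchronizedList(new ArrayList<>());
    private final Set<String> handlerThreads = ConcurrentHashMap.newKeySet();

    /** Called from streamer threads; enqueues the event and returns without waiting. */
    public void notifyAsync(Event e) {
        processor.execute(() -> handle(e));
    }

    private void handle(Event e) {
        handlerThreads.add(Thread.currentThread().getName());
        handled.add(e);
        // A real coordinator would act here, e.g. pick a preallocated GS on DN_ERROR,
        // or update the block group on the NN once all healthy streamers report NEW_GS_APPLIED.
    }

    /** Stops accepting events, waits for queued ones, and returns what was handled. */
    public List<Event> shutdownAndDrain() throws InterruptedException {
        processor.shutdown();
        if (!processor.awaitTermination(10, TimeUnit.SECONDS)) {
            throw new IllegalStateException("event processor did not drain in time");
        }
        synchronized (handled) {
            return new ArrayList<>(handled);
        }
    }

    public int handlerThreadCount() {
        return handlerThreads.size();
    }

    /** DN error flow: report the error; the streamer then simply returns and terminates. */
    public static void onDatanodeError(StripedCoordinator c, int streamerIndex) {
        c.notifyAsync(new Event(streamerIndex, Kind.DN_ERROR, -1));
    }

    /** External error flow: apply the new GS, wait for the DN ack, then report asynchronously. */
    public static void onExternalError(StripedCoordinator c, int streamerIndex, long newGenStamp) {
        if (applyGenStampToDatanode(streamerIndex, newGenStamp)) {
            c.notifyAsync(new Event(streamerIndex, Kind.NEW_GS_APPLIED, newGenStamp));
        } else {
            // Assumption: a missing ack is treated as a DN error of this streamer.
            onDatanodeError(c, streamerIndex);
        }
    }

    /** Hypothetical stand-in for createBlockOutputStream plus the DN ack; always succeeds here. */
    private static boolean applyGenStampToDatanode(int streamerIndex, long genStamp) {
        return true;
    }
}
```

Because every decision is made on the processor thread, the coordinator never has to reason about cases like streamer_B starting a GS request between streamer_A.2 and streamer_A.3; streamers only enqueue events and then either continue with the current block or exit, which fits the single-block streamer lifespan.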
Issue Links
depends upon
- HDFS-9040 Erasure coding: Refactor DFSStripedOutputStream (Move Namenode RPC Requests to Coordinator) (Resolved)