Details
Sub-task
Status: Resolved
Major
Resolution: Won't Fix
2.2.0
None
None
Description
Rebuilding write pipelines is expensive, and it can happen many times during a rolling restart of datanodes (e.g. during a rolling upgrade). It might help if datanodes could be told to drain their current work while rejecting new requests, possibly with a new response indicating that the node is temporarily unavailable (it is not broken; it is going through a maintenance phase in which it should not accept new work).
Waiting just a few seconds is normally enough to let a good percentage of the open requests complete without error, which reduces the overhead of restarting many datanodes in rapid succession.
This would obviously need a timeout so that the datanode does not wait forever.
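The drain-then-restart idea can be sketched as a small admission gate: once draining starts, new write requests get a "temporarily unavailable" answer instead of an error, and the node waits for in-flight writes to finish, but never longer than a timeout. This is a minimal illustration in Python under stated assumptions, not an HDFS API: `DrainableDataNode`, `begin_write`, `end_write`, `drain`, `timeout_s` and the `Status.TEMPORARILY_UNAVAILABLE` response are all hypothetical names.

```python
import threading
from enum import Enum


class Status(Enum):
    # Hypothetical response codes; not part of the real HDFS data transfer protocol.
    SUCCESS = "SUCCESS"
    TEMPORARILY_UNAVAILABLE = "TEMPORARILY_UNAVAILABLE"


class DrainableDataNode:
    """Hypothetical model of a datanode that drains writes before a restart."""

    def __init__(self):
        self._cond = threading.Condition()
        self._draining = False
        self._in_flight = 0

    def begin_write(self):
        """Admit a new write pipeline unless the node is draining."""
        with self._cond:
            if self._draining:
                # Not a failure: the client should pick another node for this pipeline.
                return Status.TEMPORARILY_UNAVAILABLE
            self._in_flight += 1
            return Status.SUCCESS

    def end_write(self):
        """Mark an admitted write as finished and wake a waiting drain()."""
        with self._cond:
            self._in_flight -= 1
            self._cond.notify_all()

    def drain(self, timeout_s):
        """Stop admitting writes, then wait up to timeout_s for open ones to finish.

        Returns how many writes were still open when the wait ended:
        0 means a clean drain, anything else means the timeout expired.
        """
        with self._cond:
            self._draining = True
            # wait_for releases the lock while waiting and rechecks the predicate
            # on every notify_all() from end_write(), bounded by timeout_s overall.
            self._cond.wait_for(lambda: self._in_flight == 0, timeout=timeout_s)
            return self._in_flight
```

A restart script would call `drain(timeout_s=...)` before stopping the process; a non-zero return value means some pipelines will still have to be rebuilt, which is the fallback the timeout accepts in exchange for bounding the wait.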