Details
Type: Sub-task
Status: Open
Priority: Major
Resolution: Unresolved
0.9.0
None
None
Description
SAMZA-516 discusses supporting two delays in samza-standalone: a delay in partition assignment, and a delay in an orphaned container's shutdown. Both configs should be added to samza-standalone so that operators can tune their job to either:
- Continue running even in the face of ZK failure.
- Delay partition shifting to prevent duplicate messages.
These two knobs are equivalent to YARN's RM/NM timeouts: how long an NM should stay running when it can't talk to the RM, and how long an RM should allow a container to be dead before it notifies the AM for partition reassignment.
In samza-standalone we should add the concept of tunable "pausing", which will pause a SamzaContainer once it has been orphaned (presumably, unable to reach ZK) for N milliseconds. We should also add a tunable in the leader that lets it delay partition reassignment when a container is lost.
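
To make the two knobs concrete, here is a rough sketch of how they might be read and checked. Everything in it is an assumption for illustration only: the class name, the config keys (standalone.container.pause.ms and standalone.leader.reassignment.delay.ms), and the defaults of 0 are hypothetical and are not existing Samza configuration or API.

```java
import java.util.HashMap;
import java.util.Map;

/** Sketch of the two proposed tunables. All names here are hypothetical. */
public class StandaloneTimeouts {
    // Hypothetical config keys; neither exists in Samza today.
    static final String CONTAINER_PAUSE_MS = "standalone.container.pause.ms";
    static final String LEADER_REASSIGNMENT_DELAY_MS = "standalone.leader.reassignment.delay.ms";

    final long containerPauseMs;
    final long reassignmentDelayMs;

    StandaloneTimeouts(Map<String, String> config) {
        // A default of 0 means "act immediately" (an assumption, not a spec).
        this.containerPauseMs = Long.parseLong(config.getOrDefault(CONTAINER_PAUSE_MS, "0"));
        this.reassignmentDelayMs = Long.parseLong(config.getOrDefault(LEADER_REASSIGNMENT_DELAY_MS, "0"));
    }

    /** Container side: pause once ZK has been unreachable for at least containerPauseMs. */
    boolean shouldPauseContainer(long lastZkContactMs, long nowMs) {
        return nowMs - lastZkContactMs >= containerPauseMs;
    }

    /** Leader side: reassign partitions once a container has been lost for at least reassignmentDelayMs. */
    boolean shouldReassignPartitions(long containerLostAtMs, long nowMs) {
        return nowMs - containerLostAtMs >= reassignmentDelayMs;
    }

    public static void main(String[] args) {
        Map<String, String> config = new HashMap<>();
        config.put(CONTAINER_PAUSE_MS, "5000");
        config.put(LEADER_REASSIGNMENT_DELAY_MS, "10000");
        StandaloneTimeouts timeouts = new StandaloneTimeouts(config);

        // Container lost contact at t=1000 ms; evaluate both checks at t=7000 ms.
        System.out.println(timeouts.shouldPauseContainer(1000L, 7000L));
        System.out.println(timeouts.shouldReassignPartitions(1000L, 7000L));
    }
}
```

With these example values the orphaned container pauses before the leader moves its partitions. That ordering seems to be what the "prevent duplicate messages" goal needs, though the right relationship between the two values is an operator decision that this ticket does not specify.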