Details
- Type: Bug
- Status: Resolved
- Priority: Major
- Resolution: Fixed
- 2.0.0
- None
Description
We conflate "being activated" with "all workers are ready" in WorkerState, by making isWorkerActivated a part of isTopologyActivated.
The problem is that isTopologyActivated is used to communicate activation/deactivation to the executors, and it is only refreshed on a timer (by default, every 10 seconds). isWorkerActivated, by contrast, is really meant to be a one-way switch that lets us delay executor initialization until all other workers in the topology have also started.
Because the two are mixed together, if a worker starts and not all of its connections are ready immediately (which happens on every topology deployment, since some workers boot faster than others), the worker may have to wait up to 10 seconds for the next timer refresh before it can start.
We should make the wait for isWorkerActivated happen via a CountDownLatch instead, so the executors start as soon as the connections are ready.
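To illustrate the proposal, here is a minimal sketch of how the one-way "worker is ready" signal could be kept apart from the two-way topology activation flag. This is not Storm's actual code: the class WorkerActivationSketch and all of its method names are hypothetical, and only CountDownLatch and AtomicBoolean come from the JDK. The demo method assumes a 5 second upper bound on the wait purely to keep the example bounded.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical stand-in for the relevant part of WorkerState; not Storm's real class.
final class WorkerActivationSketch {
    // One-way switch: counted down once, when all connections are ready.
    private final CountDownLatch workerActive = new CountDownLatch(1);
    // Two-way flag: may flip on every activate/deactivate of the topology.
    private final AtomicBoolean topologyActivated = new AtomicBoolean(false);

    // Called by whatever detects that all worker connections are ready.
    void markConnectionsReady() {
        workerActive.countDown();
    }

    // Blocks until the latch opens or the timeout elapses; true if it opened.
    boolean awaitWorkerActive(long timeout, TimeUnit unit) {
        try {
            return workerActive.await(timeout, unit);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    void setTopologyActivated(boolean active) {
        topologyActivated.set(active);
    }

    boolean isTopologyActivated() {
        return topologyActivated.get();
    }

    // Demo: an "executor" thread waits on the latch while the connections
    // become ready after readyAfterMillis. Returns how long the executor
    // waited after readiness was signalled, or -1 if it never started.
    static long demoStartupDelayMillis(long readyAfterMillis) {
        WorkerActivationSketch state = new WorkerActivationSketch();
        boolean[] started = {false};
        long[] startedAt = {0L};
        Thread executor = new Thread(() -> {
            if (state.awaitWorkerActive(5, TimeUnit.SECONDS)) {
                startedAt[0] = System.nanoTime();
                started[0] = true;
            }
        });
        executor.start();
        long readyAt = 0L;
        try {
            Thread.sleep(readyAfterMillis);
            readyAt = System.nanoTime();
            state.markConnectionsReady();
            executor.join(); // join makes the executor's writes visible here
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return -1L;
        }
        return started[0] ? TimeUnit.NANOSECONDS.toMillis(startedAt[0] - readyAt) : -1L;
    }
}
```

In this sketch the executor thread is released by the countDown() call itself rather than by a periodic refresh, so its startup delay does not depend on the timer interval. The topology flag can still be refreshed on its own schedule without affecting the latch.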