Details
Type: Bug
Status: Resolved
Priority: Major
Resolution: Fixed
Description
Ideally, when a singa process crashes, its registered (ephemeral) file in ZooKeeper is deleted automatically.
However, the deletion only takes effect after a TIME_OUT period has elapsed. Hence, if we rerun singa immediately, the server sees a phantom worker left over from the crashed run. If this phantom is the only registered worker when its file is finally deleted, the server concludes that all workers have left. It may then terminate its service before the real worker starts to execute.
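
The race described above can be sketched with a toy in-memory model. This is only an illustration under stated assumptions: `FakeRegistry`, `simulate`, the path `/singa/worker-old`, and the numeric timeout are hypothetical names and values, and the code uses neither the real ZooKeeper API nor singa's actual implementation.

```python
class FakeRegistry:
    """Toy stand-in for ZooKeeper ephemeral files (hypothetical, not the real API)."""

    def __init__(self, timeout):
        self.timeout = timeout  # plays the role of TIME_OUT
        self.nodes = {}         # path -> expiry time, or None while the owner is alive

    def register(self, path):
        self.nodes[path] = None

    def crash(self, path, now):
        # The owner died at `now`; its file lingers until now + timeout.
        self.nodes[path] = now + self.timeout

    def live_children(self, now):
        # Drop files whose expiry time has passed, then list what remains.
        self.nodes = {p: e for p, e in self.nodes.items() if e is None or e > now}
        return sorted(self.nodes)


def simulate(timeout=10):
    reg = FakeRegistry(timeout)

    # Previous run: a worker registers and then crashes at t=0.
    reg.register("/singa/worker-old")
    reg.crash("/singa/worker-old", now=0)

    # Immediate rerun: at t=1 the server still sees the stale file (phantom worker).
    seen_at_start = reg.live_children(now=1)

    # At t=timeout the stale file disappears, before the new worker has registered.
    seen_after_timeout = reg.live_children(now=timeout)

    # A server that exits when its worker count drops to zero would stop here.
    server_would_exit = bool(seen_at_start) and not seen_after_timeout
    return seen_at_start, seen_after_timeout, server_would_exit


if __name__ == "__main__":
    print(simulate(timeout=10))
```

In this model the server observes one worker at startup and zero workers once the timeout elapses, which is the condition under which it may shut down too early.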