  1. Singa
  2. SINGA-3

Use Zookeeper to check stopping (finish) time of the system

Details

    • Type: New Feature
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • None
    • None
    • Environment: Linux, gcc>4.8

    Description

      To stop a process (node), both its local workers and its local servers must stop. Worker threads exit once they finish all training steps. Server threads can exit only after all workers connected to them have stopped.

      We use Zookeeper to track worker state. Specifically, the main thread of each process first registers all local servers with Zookeeper. It then registers each worker under a dedicated server group, which is where that worker's parameters are maintained. When a worker finishes execution, it deregisters from its server group (a folder in Zookeeper) and reports its state to the main thread. When all workers registered in a server group have finished, the callback function registered for that server group sends a stop message to that group's server. On receiving this message, the server reports its state to the main thread and stops. Once all local workers and local servers have finished, the main thread exits.
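
The shutdown order described above can be illustrated with a small in-memory simulation. This is only a sketch under stated assumptions: `Registry` is a hypothetical stand-in for the Zookeeper folders and is not a Zookeeper API, `run_process` and the worker and group names are made up for illustration, and threads within one Python process play the roles of workers, servers, and the main thread. It is not the SINGA implementation.

```python
import threading
from collections import defaultdict


class Registry:
    """Hypothetical in-memory stand-in for the Zookeeper server-group folders."""

    def __init__(self):
        self._lock = threading.Lock()
        self._workers = defaultdict(set)  # server group -> registered worker ids
        self._callbacks = {}              # server group -> callback for "group is empty"

    def register_servers(self, group, on_all_workers_done):
        with self._lock:
            self._callbacks[group] = on_all_workers_done

    def register_worker(self, group, worker_id):
        with self._lock:
            self._workers[group].add(worker_id)

    def deregister_worker(self, group, worker_id):
        with self._lock:
            self._workers[group].discard(worker_id)
            group_is_empty = not self._workers[group]
        if group_is_empty:
            # Last worker of the group left: deliver the stop message.
            self._callbacks[group]()


def run_process(assignment, steps=3):
    """assignment maps worker id -> server group; returns the ordered event log."""
    registry = Registry()
    events, events_lock = [], threading.Lock()

    def log(event):
        with events_lock:
            events.append(event)

    groups = sorted(set(assignment.values()))
    stop_signals = {g: threading.Event() for g in groups}

    def server(group):
        stop_signals[group].wait()        # block until the stop message arrives
        log(("server_stopped", group))

    def worker(worker_id, group):
        for _ in range(steps):
            pass                          # placeholder for one training step
        log(("worker_stopped", worker_id))
        registry.deregister_worker(group, worker_id)

    # Main thread: register servers first, then each worker under its group.
    for g in groups:
        registry.register_servers(g, stop_signals[g].set)
    for w, g in assignment.items():
        registry.register_worker(g, w)

    threads = [threading.Thread(target=server, args=(g,)) for g in groups]
    threads += [threading.Thread(target=worker, args=(w, g))
                for w, g in assignment.items()]
    for t in threads:
        t.start()
    for t in threads:
        t.join()                          # main thread waits for all workers and servers
    log(("main_exit", None))
    return events
```

Calling `run_process({"w0": "g0", "w1": "g0", "w2": "g1"})` returns an event log in which each server's stop event appears after the stop events of all workers in its group, and the main thread's exit comes last. The relative order of events across different groups depends on thread scheduling.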

          People

            Assignee: Unassigned
            Reporter: wangwei (wangwei.cs)
            Votes: 0
            Watchers: 2
