SINGA-32: Implement AllReduce training framework


Details

    • Type: New Feature
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed

    Description

      The AllReduce training framework runs in synchronous mode: a worker starts the next iteration only after all workers have finished the previous one. Baidu's Deep Image system uses this training framework.

      To implement it in SINGA, we launch one worker group and one server group. The model is partitioned among all workers (e.g., on dimension 0), and the Params are sliced and distributed among all servers.
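
      As an illustration of the slicing, here is a minimal C++ sketch (all names are hypothetical, not SINGA's actual API) that cuts one Param of param_len values into near-equal slices and assigns slice i to server i:

      // Sketch: cut a Param into near-equal slices, one per server.
      // All names here are hypothetical, for illustration only.
      #include <cstdio>
      #include <vector>

      struct Slice {
        int param_id;   // the Param this slice belongs to
        int offset;     // start index within the Param's values
        int length;     // number of values in this slice
        int server_id;  // server responsible for updating this slice
      };

      std::vector<Slice> SliceParam(int param_id, int param_len, int num_servers) {
        std::vector<Slice> slices;
        int base = param_len / num_servers, rem = param_len % num_servers;
        int offset = 0;
        for (int s = 0; s < num_servers; ++s) {
          int len = base + (s < rem ? 1 : 0);  // spread the remainder evenly
          slices.push_back({param_id, offset, len, s});
          offset += len;
        }
        return slices;
      }

      int main() {
        for (const Slice& s : SliceParam(0, 10, 3))
          std::printf("param %d [%d, %d) -> server %d\n",
                      s.param_id, s.offset, s.offset + s.length, s.server_id);
        return 0;
      }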

      At the beginning, each Param slice is put into the server shard, together with the number of workers that compute gradients for it.

      In each iteration, the local stub aggregates all gradients for the same Param and sends them to the corresponding server, together with the number of local workers that computed them. The server buffers these update requests and does not update a Param slice until it has received gradients from all workers; it then sends the updated Param slices back to the corresponding processes (stubs).
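
      The buffered update on the server side could look like the following sketch (again with hypothetical names, not SINGA's actual classes; the plain-SGD step on the averaged gradient is an assumption, since the issue does not specify the update rule). Gradients for a slice are accumulated, and the update is applied only once the count of contributing workers reaches the expected total:

      #include <cassert>
      #include <cstdio>
      #include <vector>

      struct SliceState {
        std::vector<float> values;    // current values of the Param slice
        std::vector<float> grad_sum;  // gradients buffered for this iteration
        int expected_workers = 0;     // total workers computing this gradient
        int received = 0;             // worker gradients received so far
      };

      // Called for each update request; num_workers is the number of workers
      // whose gradients the sending stub already folded into grad. Returns
      // true once the slice is updated and should be sent back to the stubs.
      bool HandleUpdate(SliceState* s, const std::vector<float>& grad,
                        int num_workers, float lr) {
        assert(grad.size() == s->values.size());
        for (size_t i = 0; i < grad.size(); ++i)
          s->grad_sum[i] += grad[i];
        s->received += num_workers;
        if (s->received < s->expected_workers)
          return false;  // keep buffering until every worker has contributed
        // All gradients are in: apply one update step (plain SGD on the
        // averaged gradient, as an example) and reset the buffer.
        for (size_t i = 0; i < s->values.size(); ++i) {
          s->values[i] -= lr * s->grad_sum[i] / s->expected_workers;
          s->grad_sum[i] = 0.f;
        }
        s->received = 0;
        return true;  // caller now sends the updated slice back to each stub
      }

      int main() {
        SliceState s;
        s.values = {1.f, 2.f};
        s.grad_sum = {0.f, 0.f};
        s.expected_workers = 2;
        HandleUpdate(&s, {0.5f, 0.5f}, 1, 0.1f);       // buffered, no update yet
        if (HandleUpdate(&s, {0.5f, 0.5f}, 1, 0.1f))   // second worker arrives
          std::printf("updated: %.2f %.2f\n", s.values[0], s.values[1]);
        return 0;
      }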



          People

            Assignee: wangwei.cs wangwei
            Reporter: wangwei.cs wangwei