Ignite / IGNITE-6783

Create common mechanism for group training.


Details

    • Type: Task
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.4
    • Component/s: ml
    • Labels: None

    Description

  In distributed ML it is a common task to train several models in parallel, with the ability for them to communicate with each other during training. A simple example is training a neural network with SGD on different chunks of data located on several nodes. In such training we do the following in a loop: on each node we perform one or several SGD steps, then send the gradient to a central node, which averages the gradients from all worker nodes and sends the averaged gradient back. There is a pattern in this procedure that can be applied to other ML algorithms, and it would be useful to extract this pattern.
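
  The loop described above can be sketched roughly as follows. This is an illustrative sketch only, not the Ignite ML API: the class and method names (GroupTrainingSketch, averageGradients, trainingRound) are hypothetical, and the worker gradients are passed in directly instead of being computed on remote nodes.

```java
import java.util.Arrays;
import java.util.List;

/**
 * Hypothetical sketch of one synchronous round of group training:
 * each worker produces a local gradient, a coordinator averages them,
 * and every worker applies the same averaged gradient.
 */
public class GroupTrainingSketch {

    /** Coordinator step: component-wise average of the workers' gradients. */
    static double[] averageGradients(List<double[]> workerGradients) {
        int dim = workerGradients.get(0).length;
        double[] avg = new double[dim];
        for (double[] g : workerGradients)
            for (int i = 0; i < dim; i++)
                avg[i] += g[i] / workerGradients.size();
        return avg;
    }

    /** One round: average the local gradients, then apply the SGD update. */
    static double[] trainingRound(double[] weights, List<double[]> localGradients, double learningRate) {
        double[] avgGrad = averageGradients(localGradients);
        double[] updated = weights.clone();
        for (int i = 0; i < updated.length; i++)
            updated[i] -= learningRate * avgGrad[i]; // same update on all workers
        return updated;
    }

    public static void main(String[] args) {
        double[] weights = {1.0, 1.0};
        List<double[]> grads = Arrays.asList(
            new double[]{2.0, 0.0},   // gradient from worker 1
            new double[]{0.0, 2.0});  // gradient from worker 2
        // Averaged gradient is {1.0, 1.0}; with lr = 0.5 the weights become {0.5, 0.5}.
        System.out.println(Arrays.toString(trainingRound(weights, grads, 0.5)));
    }
}
```

  The pattern to extract is the pair of roles: a per-worker local step and a coordinator-side aggregation step, repeated until convergence; gradient averaging for SGD is just one instance of it.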


          People

            Assignee: amalykh Artem Malykh
            Reporter: amalykh Artem Malykh
            Votes: 0
            Watchers: 4
