Details
Type: Bug
Status: Resolved
Priority: Major
Resolution: Fixed
Description
The Updater's Update function takes an argument grad_scale (default value 1.0) that scales the parameter gradients before they are applied. For instance, when n workers compute over one mini-batch (each worker is assigned 1/n of its records), their gradients should be averaged; passing grad_scale=1/n performs this averaging.
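As a minimal sketch of the averaging described above, assuming the n workers' gradients are first summed and then passed to a single update with grad_scale=1/n, the hypothetical helpers ScaleGrad and AverageWorkerGrads below (they are not part of updater.cc) show that this yields the element-wise mean:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: scale a gradient in place, mirroring the
// "if (grad_scale != 1) grad *= grad_scale;" check from this issue.
void ScaleGrad(std::vector<float>& grad, float grad_scale) {
  if (grad_scale != 1.0f) {
    for (float& g : grad) g *= grad_scale;
  }
}

// Hypothetical aggregation: sum the gradients of all n workers, then scale
// the sum by grad_scale = 1/n, which gives the element-wise average.
std::vector<float> AverageWorkerGrads(
    const std::vector<std::vector<float>>& worker_grads) {
  assert(!worker_grads.empty());
  const std::size_t n = worker_grads.size();
  std::vector<float> total(worker_grads[0].size(), 0.0f);
  for (const auto& grad : worker_grads) {
    assert(grad.size() == total.size());
    for (std::size_t i = 0; i < grad.size(); ++i) total[i] += grad[i];
  }
  ScaleGrad(total, 1.0f / static_cast<float>(n));
  return total;
}
```

For two workers with gradients {2, 4} and {4, 8}, the sum {6, 12} is scaled by 0.5 to give {3, 6}. If an updater skips the scaling step, it instead applies the full sum, i.e. an update n times larger than intended.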
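Some updaters in updater.cc, e.g., AdaGradUpdater, never apply grad_scale to the gradients, so the averaging described above has no effect for them. The bug can be fixed by adding the following check before the gradient is used: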
// Scale the gradient (e.g. by 1/n to average over n workers) before the update uses it.
if (grad_scale != 1)
  grad *= grad_scale;
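To illustrate where the check belongs in an updater, here is a minimal sketch of a textbook AdaGrad step on plain std::vector<float> data. The class AdaGradSketch and its lr/delta parameters are hypothetical and do not reproduce SINGA's AdaGradUpdater; the point is only that the scaled gradient, not the raw one, must feed both the squared-gradient history and the parameter update.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical AdaGrad updater (not SINGA's actual class) showing where
// grad_scale is applied: before the gradient is used anywhere.
class AdaGradSketch {
 public:
  AdaGradSketch(std::size_t dim, float lr, float delta)
      : history_(dim, 0.0f), lr_(lr), delta_(delta) {}

  // Updates `data` in place. `grad` is taken by value, so scaling it here
  // does not modify the caller's gradient.
  void Update(std::vector<float>& data, std::vector<float> grad,
              float grad_scale = 1.0f) {
    assert(data.size() == history_.size() && grad.size() == history_.size());
    // The fix from this issue: scale first.
    if (grad_scale != 1.0f) {
      for (float& g : grad) g *= grad_scale;
    }
    for (std::size_t i = 0; i < data.size(); ++i) {
      history_[i] += grad[i] * grad[i];  // accumulate squared scaled gradients
      data[i] -= lr_ * grad[i] / (std::sqrt(history_[i]) + delta_);
    }
  }

  const std::vector<float>& history() const { return history_; }

 private:
  std::vector<float> history_;
  float lr_;
  float delta_;
};
```

With grad_scale=0.5 and a gradient of {4}, the history accumulates 2 * 2 = 4; without the check it would accumulate 16, as if the unaveraged gradient had been applied.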