Details
-
Bug
-
Status: Closed
-
Minor
-
Resolution: Fixed
Description
Momentum updates are complicated because Nesterov momentum requires an initial (look-ahead) parameter update before the gradients are calculated. There is, however, a different form of the Nesterov update that can be performed cleanly after the regular momentum update, which simplifies the code. With that form, the gradient calculations can always be performed before any update, with or without Nesterov.
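
As a rough illustration of the reformulation described above, here is a minimal sketch in plain Python. It is not the project's actual code: the function names (momentum_step, classic_nesterov_step), the scalar parameters, and the default values for lr and mu are all assumptions made for the example. The sketch relies on the common trick of storing the look-ahead point (theta + mu * velocity) as the parameter, so the Nesterov term becomes a correction applied after the regular update.

```python
def momentum_step(param, velocity, grad, lr=0.1, mu=0.9, nesterov=False):
    """One momentum step. The caller computes `grad` at `param` beforehand.

    Hypothetical helper for illustration only.
    """
    new_velocity = mu * velocity - lr * grad
    # Regular (classical) momentum update.
    new_param = param + new_velocity
    if nesterov:
        # Nesterov correction, applied after the regular update.
        # In this form `param` is interpreted as the look-ahead point
        # theta + mu * velocity rather than theta itself.
        new_param += mu * (new_velocity - velocity)
    return new_param, new_velocity


def classic_nesterov_step(theta, velocity, grad_fn, lr=0.1, mu=0.9):
    """Textbook form for comparison: look-ahead update first, then gradient."""
    lookahead = theta + mu * velocity
    new_velocity = mu * velocity - lr * grad_fn(lookahead)
    return theta + new_velocity, new_velocity
```

Under these assumptions the two forms track each other: if phi is the parameter used by momentum_step with nesterov=True and theta is the parameter used by classic_nesterov_step, then phi equals theta + mu * velocity after every step, and the velocities are identical. The only difference is that momentum_step never needs to touch the parameters before the gradient is computed.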