FLINK-2162

Implement adaptive learning rate strategies for SGD


Details

    • Type: Improvement
    • Status: Closed
    • Priority: Minor
    • Resolution: Won't Do

    Description

      At the moment, the SGD implementation uses a simple adaptive learning rate strategy, adaptedLearningRate = initialLearningRate / sqrt(iterationNumber), which makes the optimization algorithm sensitive to the choice of initialLearningRate. If this value is chosen poorly, the SGD can become unstable.
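
      A minimal sketch of the decay rule described above (the names are illustrative and not Flink's actual API):

        // Current strategy: the effective step size depends only on the iteration count,
        // so a poorly chosen initialLearningRate is never corrected during the run.
        def adaptedLearningRate(initialLearningRate: Double, iterationNumber: Int): Double =
          initialLearningRate / math.sqrt(iterationNumber.toDouble)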

      There are better ways to calculate the learning rate [1], such as Adagrad [3], Adadelta [4], SGD with momentum [5], and others [2]. They promise more stable optimization with less hyperparameter tuning. It might be worthwhile to investigate these approaches.
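
      For illustration, a rough sketch of an Adagrad-style [3] per-coordinate update (plain Scala arrays rather than Flink's vector types; the epsilon term and accumulator handling are assumptions, not a proposed implementation):

        // Adagrad keeps a running sum of squared gradients per coordinate and divides the
        // base learning rate by its square root, so frequently updated coordinates take
        // smaller steps while rarely updated ones keep larger steps.
        def adagradStep(
            weights: Array[Double],
            gradient: Array[Double],
            gradSquareSum: Array[Double], // accumulated squared gradients, updated in place
            learningRate: Double,
            epsilon: Double = 1e-8): Unit = {
          var i = 0
          while (i < weights.length) {
            gradSquareSum(i) += gradient(i) * gradient(i)
            weights(i) -= learningRate * gradient(i) / (math.sqrt(gradSquareSum(i)) + epsilon)
            i += 1
          }
        }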

      It might also be interesting to look at the implementation of Vowpal Wabbit [6].

      Resources:
      [1] http://imgur.com/a/Hqolp
      [2] http://cs.stanford.edu/people/karpathy/convnetjs/demo/trainers.html
      [3] http://www.jmlr.org/papers/volume12/duchi11a/duchi11a.pdf
      [4] http://www.matthewzeiler.com/pubs/googleTR2012/googleTR2012.pdf
      [5] http://www.willamette.edu/~gorr/classes/cs449/momrate.html
      [6] https://github.com/JohnLangford/vowpal_wabbit


People

    Assignee: Ventura Del Monte
    Reporter: Till Rohrmann
    Votes: 0
    Watchers: 3
