Apache MADlib / MADLIB-1210

Add momentum methods to MLP

Details

    • Type: New Feature
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • None
    • Fix Version/s: v1.15
    • None

    Description

      Story

      As a data scientist,
      I want to use momentum methods in MLP,
      so that I get significantly better convergence behavior.

      Details

      Adding momentum will get the MADlib MLP algorithm closer to state of the art.

      1) Implement momentum term, default value ~0.9

      Ref [1]:
      "Momentum update is another approach that almost always enjoys better converge rates on deep networks."

      2) Implement Nesterov momentum, default TRUE

      Ref [1]:
      "Nesterov Momentum is a slightly different version of the momentum update that has recently been gaining popularity. It enjoys stronger theoretical converge guarantees for convex functions and in practice it also consistently works slightly better than standard momentum."

      Ref [2]:
      "Nesterov’s accelerated gradient (abbrv. NAG; Nesterov, 1983) is a first-order optimization method which is proven to have a better convergence rate guarantee than gradient descent for general convex functions with Lipshitz-continuous derivatives (O(1/T^2) versus O(1/T))"

      Interface

      There are 2 new optimization params for momentum ('momentum' and
      'nesterov', the last two entries below), which apply to both
      classification and regression:

      'learning_rate_init = <value>,
      learning_rate_policy = <value>,
      gamma = <value>,
      power = <value>,
      iterations_per_step = <value>,
      n_iterations = <value>,
      n_tries = <value>,
      lambda = <value>,
      tolerance = <value>,
      batch_size = <value>,
      n_epochs = <value>,
      momentum = <value>,
      nesterov = <value>'
      
      momentum
      FLOAT8, default: 0.9. Momentum can help accelerate learning and 
      avoid local minima when using gradient descent. Value must be in the 
      range 0 to 1, where 0 means no momentum.
      
      nesterov
      BOOLEAN, default: TRUE. Nesterov momentum can provide better results than
      classical momentum alone because of its look-ahead characteristic.
      In classical momentum you first correct the velocity and then step with
      that velocity, whereas in Nesterov momentum you first step in the
      velocity direction and then correct the velocity vector based on the
      new location.
      
      Nesterov momentum is only used when the 'momentum' parameter is > 0.
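
      The difference between the two update rules can be sketched in a few lines of Python. This is only an illustration of the textbook formulation (as in ref [2]), not the MADlib implementation; the function name momentum_step and its signature are hypothetical.

```python
from typing import Callable, List, Tuple

Vector = List[float]


def momentum_step(
    theta: Vector,
    velocity: Vector,
    grad_fn: Callable[[Vector], Vector],
    learning_rate: float,
    momentum: float = 0.9,
    nesterov: bool = True,
) -> Tuple[Vector, Vector]:
    """Return (new_theta, new_velocity) after one gradient-descent update.

    Hypothetical helper for illustration; not part of the MADlib API.
    """
    if not 0.0 <= momentum <= 1.0:
        raise ValueError("momentum must be in the range 0 to 1")

    if nesterov and momentum > 0.0:
        # Nesterov: step in the velocity direction first, then evaluate the
        # gradient at that look-ahead location.
        eval_point = [t + momentum * v for t, v in zip(theta, velocity)]
    else:
        # Classical momentum (or plain gradient descent when momentum == 0):
        # evaluate the gradient at the current location.
        eval_point = theta

    grad = grad_fn(eval_point)
    new_velocity = [momentum * v - learning_rate * g
                    for v, g in zip(velocity, grad)]
    new_theta = [t + v for t, v in zip(theta, new_velocity)]
    return new_theta, new_velocity


if __name__ == "__main__":
    # Toy objective f(x) = x^2, whose gradient is 2x.
    quadratic_grad = lambda p: [2.0 * x for x in p]
    theta, velocity = [1.0], [0.0]
    for _ in range(5):
        theta, velocity = momentum_step(theta, velocity, quadratic_grad, 0.1)
    print(theta, velocity)
```

      With momentum = 0 both branches evaluate the gradient at the current location, which matches the rule that Nesterov momentum is only used when 'momentum' is > 0.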
      

      Open questions

      1) Do momentum and Nesterov momentum work equally well with and without mini-batching?
      Is there any guidance we need to give to users on this?

      Acceptance

      [1] Compare momentum with Nesterov, momentum without Nesterov, and plain SGD (i.e., 3 comparisons). Use a 2D Rosenbrock function and compare loss by iteration number, in a similar way to test ref [100] in the comment further down. Maybe try a few different 2D slices (starting points).

      [2] Test with MNIST. Please generate characteristic curves of loss vs. iteration number, similar to what was done for Rosenbrock.

      [3] Report the momentum value and the Nesterov setting in the output summary table.
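
      As a rough illustration of acceptance item [1], the comparison could be prototyped outside the database along these lines. This is a sketch under stated assumptions: the starting point, learning rate, iteration count, and all function names are hypothetical choices for illustration, and full-batch gradient descent stands in for SGD because the Rosenbrock function has no data to sample.

```python
import math
from typing import Dict, List, Tuple


def rosenbrock(x: float, y: float) -> float:
    """2D Rosenbrock function; global minimum of 0 at (1, 1)."""
    return (1.0 - x) ** 2 + 100.0 * (y - x * x) ** 2


def rosenbrock_grad(x: float, y: float) -> Tuple[float, float]:
    """Analytic gradient of the 2D Rosenbrock function."""
    dx = -2.0 * (1.0 - x) - 400.0 * x * (y - x * x)
    dy = 200.0 * (y - x * x)
    return dx, dy


def loss_curve(start: Tuple[float, float], learning_rate: float,
               momentum: float, nesterov: bool,
               n_iterations: int) -> List[float]:
    """Return the loss at each iteration, starting with the initial loss."""
    x, y = start
    vx = vy = 0.0
    losses = [rosenbrock(x, y)]
    for _ in range(n_iterations):
        if nesterov and momentum > 0.0:
            # Look-ahead gradient (Nesterov momentum).
            gx, gy = rosenbrock_grad(x + momentum * vx, y + momentum * vy)
        else:
            # Gradient at the current point (classical momentum or no momentum).
            gx, gy = rosenbrock_grad(x, y)
        vx = momentum * vx - learning_rate * gx
        vy = momentum * vy - learning_rate * gy
        x, y = x + vx, y + vy
        losses.append(rosenbrock(x, y))
    return losses


def compare(start: Tuple[float, float] = (-1.5, 1.5),
            learning_rate: float = 1e-4,
            n_iterations: int = 1000) -> Dict[str, List[float]]:
    """Run the three configurations from the same starting point."""
    return {
        "sgd": loss_curve(start, learning_rate, 0.0, False, n_iterations),
        "momentum": loss_curve(start, learning_rate, 0.9, False, n_iterations),
        "nesterov": loss_curve(start, learning_rate, 0.9, True, n_iterations),
    }


if __name__ == "__main__":
    for name, losses in compare().items():
        print(name, "final loss:", losses[-1])
```

      Each returned list can be plotted as loss vs. iteration number, and calling compare with different start values gives the different 2D starting points mentioned above.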

      References

      [1] http://cs231n.github.io/neural-networks-3/#sgd
      [2] http://www.cs.utoronto.ca/~ilya/pubs/ilya_sutskever_phd_thesis.pdf, a link from the previous source.
      [3] http://ruder.io/optimizing-gradient-descent/index.html#gradientdescentoptimizationalgorithms
      [4] http://scikit-learn.org/stable/modules/generated/sklearn.neural_network.MLPClassifier.html
      [5] https://www.cs.toronto.edu/~tijmen/csc321/slides/lecture_slides_lec6.pdf

      Attachments

        1. MLP Momentum Default Value Experiments.pdf
          1.39 MB
          Jingyi Mei
        2. Momentum methods comparison.xlsx
          189 kB
          Frank McQuillan
        3. momentum with MNIST dataset.pdf
          186 kB
          Frank McQuillan

            People

              Assignee: Unassigned
              Reporter: Frank McQuillan (fmcquillan)