Handling feature scaling properly for GLMs

      GeneralizedLinearAlgorithm can scale features. This has two effects:

      • improves optimization behavior (essentially always, in practice)
      • changes the optimal solution (often for the better in terms of standardizing feature importance)

      Current problems:

      • Inefficient implementation: We make a rescaled copy of the data.
      • Surprising API: For algorithms which use feature scaling, users may get different solutions than with R or other libraries. (Note: Feature scaling could be handled without changing the solution.)
      • Inconsistent API: Not all algorithms have the same default for feature scaling, and not all expose the option.

      This is a proposal, discussed with Xiangrui Meng, for an "ideal" solution. It will require some breaking API changes, but I'd argue they are necessary in the long term, since this is the best API we have thought of.


      • Implementation: Change to avoid making a rescaled copy of the data (described below). No API issues here.
      • API:
        • Hide featureScaling from API. (breaking change)
        • Internally, handle feature scaling to improve optimization, but modify it so it does not change the optimal solution. (breaking change, in terms of algorithm behavior)
        • Externally, users who want to rescale features (to change the solution) should do that scaling as a preprocessing step.
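
      To illustrate why internal scaling changes the solution unless it is compensated for, here is a minimal sketch using one-dimensional ridge regression in plain Python. It is not MLlib code: the function name `ridge_1d`, the objective's exact form, and the scale factor `scale` are all assumptions made for illustration. Training on the rescaled feature with the same penalty gives a different weight once mapped back to the original units, while dividing the penalty by `scale**2` recovers the unscaled solution.

```python
def ridge_1d(xs, ys, lam):
    """Closed-form minimizer of (1/(2n)) * sum((w*x - y)^2) + (lam/2) * w^2.

    Hypothetical helper for illustration only.
    """
    n = len(xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + n * lam)


def compare_scaling(xs, ys, lam, scale):
    """Return (plain, naive, adjusted) weights, all in original feature units."""
    xs_scaled = [x / scale for x in xs]

    # Solution on the original data.
    plain = ridge_1d(xs, ys, lam)

    # Train on scaled data with the same penalty, then map the weight back.
    # This is the "surprising" case: the optimum has changed.
    naive = ridge_1d(xs_scaled, ys, lam) / scale

    # Train on scaled data with the penalty adjusted by 1 / scale^2,
    # so the penalty on the original-space weight is unchanged.
    adjusted = ridge_1d(xs_scaled, ys, lam / (scale * scale)) / scale

    return plain, naive, adjusted


plain, naive, adjusted = compare_scaling(
    xs=[1.0, 2.0, 3.0, 4.0], ys=[2.0, 4.0, 6.0, 8.0], lam=0.5, scale=10.0
)
```

      Here `adjusted` agrees with `plain` up to floating-point error, while `naive` does not. Users who actually want the `naive` behavior would get it by rescaling the data themselves before training.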

      Details on implementation:

      • GradientDescent could instead scale the step size separately for each feature (and adjust regularization as needed; see the PR linked above). This would require storing a vector of length numFeatures, rather than making a full copy of the data.
      • I haven't thought this through for LBFGS, but I hope DB Tsai can weigh in here.
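
      The GradientDescent idea above can be sketched as follows. This is a hedged illustration in plain Python, not the MLlib implementation: it assumes unregularized least squares with no intercept, scaling by the standard deviation only (no centering), and hypothetical function names. Gradient descent on a rescaled copy of the data, with the weights mapped back afterwards, takes the same steps as gradient descent on the original data with the step size for feature j divided by std_j squared, so only a vector of length numFeatures is stored.

```python
import math


def column_stds(X):
    """Sample standard deviation of each column of X (list of rows)."""
    n, d = len(X), len(X[0])
    stds = []
    for j in range(d):
        col = [row[j] for row in X]
        mean = sum(col) / n
        stds.append(math.sqrt(sum((v - mean) ** 2 for v in col) / (n - 1)))
    return stds


def gradient(X, y, w):
    """Gradient of (1/(2n)) * sum((w . x - y)^2) with respect to w."""
    n, d = len(X), len(w)
    g = [0.0] * d
    for row, target in zip(X, y):
        err = sum(wj * xj for wj, xj in zip(w, row)) - target
        for j in range(d):
            g[j] += err * row[j] / n
    return g


def gd_on_scaled_copy(X, y, stds, step, iters):
    """Current approach: materialize a rescaled copy of the data."""
    Xs = [[xj / s for xj, s in zip(row, stds)] for row in X]
    w = [0.0] * len(stds)
    for _ in range(iters):
        g = gradient(Xs, y, w)
        w = [wj - step * gj for wj, gj in zip(w, g)]
    # Map weights back to the original feature units.
    return [wj / s for wj, s in zip(w, stds)]


def gd_with_per_feature_steps(X, y, stds, step, iters):
    """Proposed approach: no data copy, one step size per feature."""
    steps = [step / (s * s) for s in stds]  # vector of length numFeatures
    w = [0.0] * len(stds)
    for _ in range(iters):
        g = gradient(X, y, w)
        w = [wj - sj * gj for wj, sj, gj in zip(w, steps, g)]
    return w


X = [[1.0, 100.0], [2.0, 300.0], [3.0, 200.0], [4.0, 500.0]]
y = [1.0, 2.5, 2.0, 4.5]
stds = column_stds(X)
w_copy = gd_on_scaled_copy(X, y, stds, step=0.1, iters=200)
w_steps = gd_with_per_feature_steps(X, y, stds, step=0.1, iters=200)
```

      The two weight vectors agree up to floating-point error. With regularization, the penalty term would also need a per-feature adjustment, which this sketch leaves out.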


            dbtsai DB Tsai
            josephkb Joseph K. Bradley
