Spark / SPARK-1859

Linear, Ridge and Lasso Regressions with SGD yield unexpected results

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Not A Problem
    • Affects Version/s: 0.9.1
    • Fix Version/s: None
    • Component/s: MLlib
    • Environment:

      OS: Ubuntu Server 12.04 x64
      PySpark

      Description

      Issue:
      Linear Regression with SGD does not work as expected on any data except lpsa.dat (the dataset from the example).
      Ridge Regression with SGD sometimes works ok.
      Lasso Regression with SGD sometimes works ok.

      Code example (PySpark) based on http://spark.apache.org/docs/0.9.0/mllib-guide.html#linear-regression-2 :

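      regression_example.py
      from numpy import array
      from pyspark.mllib.regression import LinearRegressionWithSGD

      # `sc` is the SparkContext provided by the PySpark shell.
      # Each record is [label, feature].
      parsedData = sc.parallelize([
          array([2400., 1500.]),
          array([240., 150.]),
          array([24., 15.]),
          array([2.4, 1.5]),
          array([0.24, 0.15])
      ])

      # Build the model
      model = LinearRegressionWithSGD.train(parsedData)
      print model._coeffs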
      

      All of these points lie on the line f(X) = 1.6 * X. Interestingly, data lying on f(X) = X trains fine!
      For the data above, however, the resulting model has NaN coefficients: array([ nan]).
      Furthermore, if you comment out the records one by one, you get:

      • [-1.55897475e+296] coeff (with the first record commented out),
      • [-8.62115396e+104] coeff (with the first two records commented out),
      • etc

      It looks like the implemented regression algorithms diverge somehow.

      I get almost the same results with Ridge and Lasso.

      I've also tested these inputs in scikit-learn, and they work as expected there.

      However, I'm still not sure whether it's a bug or a 'feature' of SGD. Should I preprocess my datasets somehow?
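
Divergence like this is typical of gradient descent when the step size is too large for the scale of the features. The sketch below is a plain-Python illustration under stated assumptions, not MLlib's implementation: the function `gradient_descent`, the no-intercept model, the default step size of 1.0, and the `step_size / sqrt(t)` schedule are all choices made for this example. It fits the five records from the report once as given and once after dividing labels and features by the largest feature value.

```python
import math


def gradient_descent(points, step_size=1.0, iterations=100):
    """Full-batch gradient descent for y ~ w * x (no intercept), squared loss.

    The decaying step step_size / sqrt(t) is an assumption made for this
    illustration; it is not a statement about MLlib's internals.
    """
    w = 0.0
    n = len(points)
    for t in range(1, iterations + 1):
        # Gradient of the mean of 0.5 * (w * x - y)^2 with respect to w.
        grad = sum((w * x - y) * x for y, x in points) / n
        w -= step_size / math.sqrt(t) * grad
    return w


# (label, feature) pairs from the report: label = 1.6 * feature.
raw = [(2400., 1500.), (240., 150.), (24., 15.), (2.4, 1.5), (0.24, 0.15)]

# Divide labels and features by the same constant; the slope is unchanged.
scale = max(abs(x) for _, x in raw)
scaled = [(y / scale, x / scale) for y, x in raw]

w_raw = gradient_descent(raw)        # overshoots on every step and blows up
w_scaled = gradient_descent(scaled)  # moves steadily toward 1.6

print("raw data:    w = %r" % w_raw)
print("scaled data: w = %r" % w_scaled)
```

On the raw data the mean squared feature value is in the hundreds of thousands, so each update overshoots by several orders of magnitude and the weight ends up non-finite. After rescaling, the same step size gives a weight close to 1.6. Lowering the step size instead of rescaling is the other obvious option.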

              People

              • Assignee: Unassigned
              • Reporter: Vlad Frolov (frol)