Spark / SPARK-28295

Is there a way of getting feature names from pyspark.ml.regression GeneralizedLinearRegression?


Details

    • Type: Request
    • Status: Resolved
    • Priority: Minor
    • Resolution: Invalid
    • Affects Version/s: 2.3.1
    • Fix Version/s: 2.3.1
    • Component/s: Build

    Description

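      Using pyspark.ml.regression, I fit a GeneralizedLinearRegression like this:

      from pyspark.ml.regression import GeneralizedLinearRegression

      # someData is a DataFrame with "features" and "label" columns (the defaults)
      glr = GeneralizedLinearRegression(family="gaussian", link="identity",
                                        regParam=0.3, maxIter=10)
      model = glr.fit(someData)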

      There seems to be no way to match the features to their coefficients or standard errors. For now I am using an ugly workaround:

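      # Reach into the JVM summary object and read its private
      # coefficientsWithStatistics field via reflection.
      field = model.summary._call_java('getClass').getDeclaredField("coefficientsWithStatistics")
      object2 = model._call_java('summary')
      field.setAccessible(True)
      value = field.get(object2)

      # Map each feature name to its coefficient.
      coef_value = {}

      for i in range(len(value)):
          # Each row prints as a parenthesized, comma-separated tuple
          # whose first two items are the name and the coefficient.
          row = value[i].toString()
          values = row.split(',')
          coef_value[values[0].replace('(', '').replace(')', '')] = float(values[1])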
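
The string-splitting step in this workaround would break if a feature name itself contained a comma. A slightly more defensive parse, sketched here in plain Python, splits from the right instead. It assumes each row prints as a name followed by four numbers, e.g. `(name,coefficient,stdError,tValue,pValue)`; that layout is an assumption about a private field, not a documented contract, and `parse_stat_row` is a hypothetical helper name.

```python
def parse_stat_row(row, n_stats=4):
    """Split '(name,v1,...,vN)' into (name, [v1, ..., vN]).

    Assumes the last n_stats comma-separated items are numbers and
    everything before them is the feature name.
    """
    inner = row.strip()
    # Drop only the outer parentheses so names like 'income(usd)' survive.
    if inner.startswith("(") and inner.endswith(")"):
        inner = inner[1:-1]
    # Split from the right: commas inside the name stay untouched.
    parts = inner.rsplit(",", n_stats)
    if len(parts) != n_stats + 1:
        raise ValueError("unexpected row format: %r" % row)
    return parts[0], [float(p) for p in parts[1:]]


# Made-up rows standing in for value[i].toString() from the workaround.
rows = ["(age,0.5,0.1,5.0,0.001)", "(income(usd),-1.25,0.4,-3.125,0.01)"]
coef_value = {name: stats[0] for name, stats in map(parse_stat_row, rows)}
```

Unlike the original loop, this keeps parentheses that are part of a feature name and fails loudly when a row does not have the expected shape.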

      Am I missing something?
      If not, I'd like to request a method, alongside model.coefficients, that returns the feature names in the matching order, for example model.features.
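
One possible way to recover the names without reflection, sketched under assumptions: when the features column is built by VectorAssembler, Spark usually attaches ML attribute metadata to that column, with each attribute carrying an `idx` and a `name`. The helper below (a hypothetical name, `feature_names_from_metadata`) takes that metadata as a plain dict and orders the names by vector index. The example metadata and coefficient values are made up for illustration, and attributes without a `name` entry are not handled.

```python
def feature_names_from_metadata(metadata):
    """Return feature names ordered by vector index from ML attribute metadata."""
    attrs = metadata.get("ml_attr", {}).get("attrs", {})
    # Attributes are grouped by kind (e.g. "numeric", "binary", "nominal");
    # flatten the groups and sort by each attribute's position in the vector.
    indexed = [(a["idx"], a["name"]) for group in attrs.values() for a in group]
    return [name for _, name in sorted(indexed)]


# Hypothetical metadata, shaped like what VectorAssembler attaches to its output column.
example_metadata = {
    "ml_attr": {
        "attrs": {
            "numeric": [{"idx": 0, "name": "age"}, {"idx": 2, "name": "income"}],
            "binary": [{"idx": 1, "name": "is_member"}],
        },
        "num_attrs": 3,
    }
}

names = feature_names_from_metadata(example_metadata)
coefficients = [0.5, -0.2, 1.5]  # stand-in for model.coefficients.toArray()
named_coefficients = dict(zip(names, coefficients))
```

With a real DataFrame, the dict passed in would presumably be `someData.schema["features"].metadata`, assuming the column is named "features" and the metadata survived any intermediate transformations.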

          People

            Assignee: Unassigned
            Reporter: Nils Skotara (nskotara)