  Spark / SPARK-21919

Inconsistent behavior of the AFTSurvivalRegression algorithm

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Duplicate
    • Affects Version/s: 2.2.0
    • Fix Version/s: None
    • Component/s: ML, PySpark
    • Labels: None
    • Environment: Spark Version: 2.2.0
      Cluster setup: Standalone single node
      Python version: 3.5.2

    Description

      I took the example directly from the Spark ML documentation.

          from pyspark.ml.linalg import Vectors
          from pyspark.ml.regression import AFTSurvivalRegression

          # `spark` is an existing SparkSession (for example, the one the pyspark shell provides)
          training = spark.createDataFrame([
              (1.218, 1.0, Vectors.dense(1.560, -0.605)),
              (2.949, 0.0, Vectors.dense(0.346, 2.158)),
              (3.627, 0.0, Vectors.dense(1.380, 0.231)),
              (0.273, 1.0, Vectors.dense(0.520, 1.151)),
              (4.199, 0.0, Vectors.dense(0.795, -0.226))],
              ["label", "censor", "features"])
          quantileProbabilities = [0.3, 0.6]
          aft = AFTSurvivalRegression(quantileProbabilities=quantileProbabilities,
                                      quantilesCol="quantiles")
          #aft = AFTSurvivalRegression()
          model = aft.fit(training)
          
          # Print the coefficients, intercept and scale parameter for AFT survival regression
          print("Coefficients: " + str(model.coefficients))
          print("Intercept: " + str(model.intercept))
          print("Scale: " + str(model.scale))
          # Show label, censor, features, prediction and quantiles for every training row
          model.transform(training).show(truncate=False)
      

      The result is:

      Coefficients: [-0.496304411053,0.198452172529]
      Intercept: 2.6380898963056327
      Scale: 1.5472363533632303

      label  censor  features        prediction          quantiles
      1.218  1.0     [1.56,-0.605]   5.718985621018951   [1.160322990805951,4.99546058340675]
      2.949  0.0     [0.346,2.158]   18.07678210850554   [3.66759199449632,15.789837303662042]
      3.627  0.0     [1.38,0.231]    7.381908879359964   [1.4977129086101573,6.4480027195054905]
      0.273  1.0     [0.52,1.151]    13.577717814884505  [2.754778414791513,11.859962351993202]
      4.199  0.0     [0.795,-0.226]  9.013087597344805   [1.828662187733188,7.8728164067854856]

      But if we add 20 to every label:

          training = spark.createDataFrame([
              (21.218, 1.0, Vectors.dense(1.560, -0.605)),
              (22.949, 0.0, Vectors.dense(0.346, 2.158)),
              (23.627, 0.0, Vectors.dense(1.380, 0.231)),
              (20.273, 1.0, Vectors.dense(0.520, 1.151)),
              (24.199, 0.0, Vectors.dense(0.795, -0.226))],
              ["label", "censor", "features"])
          quantileProbabilities = [0.3, 0.6]
          aft = AFTSurvivalRegression(quantileProbabilities=quantileProbabilities,
                                      quantilesCol="quantiles")
          #aft = AFTSurvivalRegression()
          model = aft.fit(training)
          
          # Print the coefficients, intercept and scale parameter for AFT survival regression
          print("Coefficients: " + str(model.coefficients))
          print("Intercept: " + str(model.intercept))
          print("Scale: " + str(model.scale))
          model.transform(training).show(truncate=False)
      

      the result changes to:

      Coefficients: [23.9932020748,3.18105314757]
      Intercept: 7.35052273751137
      Scale: 7698609960.724161

      label   censor  features        prediction             quantiles
      21.218  1.0     [1.56,-0.605]   4.0912442688237169E18  [0.0,0.0]
      22.949  0.0     [0.346,2.158]   6.011158613411288E9    [0.0,0.0]
      23.627  0.0     [1.38,0.231]    7.7835948690311181E17  [0.0,0.0]
      20.273  1.0     [0.52,1.151]    1.5880852723124176E10  [0.0,0.0]
      24.199  0.0     [0.795,-0.226]  1.4590190884193677E11  [0.0,0.0]

      Can someone please explain this exponential blow-up in the predictions? As I understand it, the prediction in AFT is the predicted time at which the failure event will occur, so I do not understand why it changes exponentially with the value of the label.
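
For reference, the printed predictions and quantiles can be reproduced from the printed coefficients, intercept and scale. The sketch below assumes the Weibull AFT form, in which the prediction is exp(intercept + coefficients · features) and each quantile is the prediction multiplied by exp(scale * log(-log(1 - p))). The helper names `aft_predict` and `aft_quantiles` are hypothetical and are not part of the Spark API. Under that assumption, a large fitted coefficient or scale is exponentiated, which would account for predictions around 1E18 and quantiles that underflow to 0.0.

```python
import math

def aft_predict(coefficients, intercept, features):
    """Assumed AFT prediction: exp of the linear predictor."""
    linear = intercept + sum(c * x for c, x in zip(coefficients, features))
    return math.exp(linear)

def aft_quantiles(prediction, scale, probabilities):
    """Assumed Weibull quantiles: prediction * exp(scale * log(-log(1 - p)))."""
    return [prediction * math.exp(scale * math.log(-math.log(1.0 - p)))
            for p in probabilities]

features = [1.560, -0.605]  # feature vector of the first training row
probs = [0.3, 0.6]          # quantileProbabilities used in the report

# Parameters printed for the original labels
pred_a = aft_predict([-0.496304411053, 0.198452172529], 2.6380898963056327, features)
quant_a = aft_quantiles(pred_a, 1.5472363533632303, probs)

# Parameters printed after adding 20 to every label
pred_b = aft_predict([23.9932020748, 3.18105314757], 7.35052273751137, features)
quant_b = aft_quantiles(pred_b, 7698609960.724161, probs)

print(pred_a, quant_a)
print(pred_b, quant_b)
```

The sketch only shows how the reported outputs follow from the fitted parameters. It does not explain why the optimizer arrived at a scale of about 7.7E9 for the shifted labels.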

            People

              Assignee: Unassigned
              Reporter: Ashish Chopra (ashishchopra0308)