Description
I took the example directly from the Spark ML documentation:
```python
from pyspark.ml.linalg import Vectors
from pyspark.ml.regression import AFTSurvivalRegression

# `spark` is the SparkSession provided by the pyspark shell
training = spark.createDataFrame([
    (1.218, 1.0, Vectors.dense(1.560, -0.605)),
    (2.949, 0.0, Vectors.dense(0.346, 2.158)),
    (3.627, 0.0, Vectors.dense(1.380, 0.231)),
    (0.273, 1.0, Vectors.dense(0.520, 1.151)),
    (4.199, 0.0, Vectors.dense(0.795, -0.226))], ["label", "censor", "features"])
quantileProbabilities = [0.3, 0.6]
aft = AFTSurvivalRegression(quantileProbabilities=quantileProbabilities,
                            quantilesCol="quantiles")
#aft = AFTSurvivalRegression()
model = aft.fit(training)

# Print the coefficients, intercept and scale parameter for AFT survival regression
print("Coefficients: " + str(model.coefficients))
print("Intercept: " + str(model.intercept))
print("Scale: " + str(model.scale))
model.transform(training).show(truncate=False)
```
The result is:

```
Coefficients: [-0.496304411053,0.198452172529]
Intercept: 2.6380898963056327
Scale: 1.5472363533632303
```

label | censor | features | prediction | quantiles |
---|---|---|---|---|
1.218 | 1.0 | [1.56,-0.605] | 5.718985621018951 | [1.160322990805951,4.99546058340675] |
2.949 | 0.0 | [0.346,2.158] | 18.07678210850554 | [3.66759199449632,15.789837303662042] |
3.627 | 0.0 | [1.38,0.231] | 7.381908879359964 | [1.4977129086101573,6.4480027195054905] |
0.273 | 1.0 | [0.52,1.151] | 13.577717814884505 | [2.754778414791513,11.859962351993202] |
4.199 | 0.0 | [0.795,-0.226] | 9.013087597344805 | [1.828662187733188,7.8728164067854856] |
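
For reference, the prediction and quantiles columns can be recomputed by hand from the three printed parameters. The plain-Python sketch below assumes that Spark's AFT model predicts `exp(intercept + coefficients · features)` and reports Weibull quantiles as `prediction * (-log(1 - p)) ** scale`; with the parameters above this reproduces the first table up to the digits shown. The helper names `predict` and `predict_quantiles` are illustrative only, not PySpark API.

```python
import math

# Parameters printed by the first fit (copied from the output above).
coefficients = [-0.496304411053, 0.198452172529]
intercept = 2.6380898963056327
scale = 1.5472363533632303
quantile_probabilities = [0.3, 0.6]

def predict(features):
    """Assumed AFT point prediction: exp(intercept + coefficients . features)."""
    eta = intercept + sum(c * x for c, x in zip(coefficients, features))
    return math.exp(eta)

def predict_quantiles(features):
    """Assumed Weibull quantiles: prediction * (-log(1 - p)) ** scale."""
    lam = predict(features)
    return [lam * (-math.log1p(-p)) ** scale for p in quantile_probabilities]

rows = [[1.560, -0.605], [0.346, 2.158], [1.380, 0.231], [0.520, 1.151], [0.795, -0.226]]
for x in rows:
    print(x, predict(x), predict_quantiles(x))
```

Because the prediction is the exponential of a linear predictor, it is exponential in the fitted coefficients by construction; that part of the behaviour is expected.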
But if I shift every label by 20 (label + 20), as in:
```python
training = spark.createDataFrame([
    (21.218, 1.0, Vectors.dense(1.560, -0.605)),
    (22.949, 0.0, Vectors.dense(0.346, 2.158)),
    (23.627, 0.0, Vectors.dense(1.380, 0.231)),
    (20.273, 1.0, Vectors.dense(0.520, 1.151)),
    (24.199, 0.0, Vectors.dense(0.795, -0.226))], ["label", "censor", "features"])
quantileProbabilities = [0.3, 0.6]
aft = AFTSurvivalRegression(quantileProbabilities=quantileProbabilities,
                            quantilesCol="quantiles")
#aft = AFTSurvivalRegression()
model = aft.fit(training)

# Print the coefficients, intercept and scale parameter for AFT survival regression
print("Coefficients: " + str(model.coefficients))
print("Intercept: " + str(model.intercept))
print("Scale: " + str(model.scale))
model.transform(training).show(truncate=False)
```
the result changes to:

```
Coefficients: [23.9932020748,3.18105314757]
Intercept: 7.35052273751137
Scale: 7698609960.724161
```

label | censor | features | prediction | quantiles |
---|---|---|---|---|
21.218 | 1.0 | [1.56,-0.605] | 4.0912442688237169E18 | [0.0,0.0] |
22.949 | 0.0 | [0.346,2.158] | 6.011158613411288E9 | [0.0,0.0] |
23.627 | 0.0 | [1.38,0.231] | 7.7835948690311181E17 | [0.0,0.0] |
20.273 | 1.0 | [0.52,1.151] | 1.5880852723124176E10 | [0.0,0.0] |
24.199 | 0.0 | [0.795,-0.226] | 1.4590190884193677E11 | [0.0,0.0] |
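
Plugging the second set of printed parameters into the same assumed formulas (again a plain-Python sketch, not PySpark code) suggests that the huge predictions follow directly from the fitted values: for the first row the linear predictor is about 42.9, and its exponential is about 4.1e18. The zero quantiles follow from the scale: `-log(1 - p)` is below 1 for p = 0.3 and p = 0.6, so raising it to a power of about 7.7e9 underflows to 0.0 in double precision. `predict` and `predict_quantiles` are hypothetical helper names.

```python
import math

# Parameters printed by the second fit (labels shifted by +20).
coefficients = [23.9932020748, 3.18105314757]
intercept = 7.35052273751137
scale = 7698609960.724161
quantile_probabilities = [0.3, 0.6]

def predict(features):
    # Assumed AFT prediction: exp of the linear predictor.
    # For the first row the linear predictor is ~42.9, so exp() gives ~4e18.
    eta = intercept + sum(c * x for c, x in zip(coefficients, features))
    return math.exp(eta)

def predict_quantiles(features):
    lam = predict(features)
    # -log(1 - p) < 1 for both probabilities, so the power with a
    # ~7.7e9 exponent underflows to 0.0 and every quantile becomes 0.0.
    return [lam * (-math.log1p(-p)) ** scale for p in quantile_probabilities]

for x in [[1.560, -0.605], [0.346, 2.158], [1.380, 0.231], [0.520, 1.151], [0.795, -0.226]]:
    print(x, predict(x), predict_quantiles(x))
```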
Can someone please explain this exponential blow-up in the predictions? As I understand it, an AFT prediction is the predicted time at which the failure event occurs, so I do not understand why it should change exponentially when the labels change.
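
A hedged sanity check on the label shift itself: the AFT model relates log(label) linearly to the features, and adding 20 to every label compresses the log-labels into a much narrower band (a spread of about 0.18 instead of about 2.73). Roughly speaking, that should shrink the fitted scale rather than push it to about 7.7e9, which suggests the second fit diverged instead of converging to a sensible estimate. That reading is consistent with this issue being linked as a duplicate of SPARK-21523, but the snippet below only measures the log-label spread; it does not reproduce the optimizer behaviour. `log_range` is a hypothetical helper.

```python
import math

labels = [1.218, 2.949, 3.627, 0.273, 4.199]
shifted = [y + 20 for y in labels]

def log_range(values):
    """Spread of log(values); assumption: the AFT model works on log(label)."""
    logs = [math.log(v) for v in values]
    return max(logs) - min(logs)

print(log_range(labels))   # about 2.73
print(log_range(shifted))  # about 0.18
```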
Issue Links
- duplicates: SPARK-21523 Fix bug of strong wolfe linesearch `init` parameter lose effectiveness (Resolved)