Description
I am trying to use the GLM regression with a Tweedie distribution so I can model insurance use cases. I have set up a very simple example adapted from the docs:
    import logging

    from py4j.protocol import Py4JJavaError
    from pyspark.ml.linalg import Vectors
    from pyspark.ml.regression import GeneralizedLinearRegression

    # Method of a helper class that keeps its SparkSession in self._spark.
    def create_fake_losses_data(self):
        df = self._spark.createDataFrame([
            ("a", 100.0, 12, 1, Vectors.dense(0.0, 0.0)),
            ("b", 0.0, 12, 1, Vectors.dense(1.0, 2.0)),
            ("c", 0.0, 12, 1, Vectors.dense(0.0, 0.0)),
            ("d", 2000.0, 12, 1, Vectors.dense(1.0, 1.0)),
        ], ["user", "label", "offset", "weight", "features"])
        logging.info(df.collect())
        setattr(self, 'fake_data', df)
        try:
            glr = GeneralizedLinearRegression(
                family="tweedie", variancePower=1.5, linkPower=-1, offsetCol='offset')
            glr.setRegParam(0.3)
            model = glr.fit(df)
            logging.info(model)
        except Py4JJavaError as e:
            print(e)
        return self
This causes the following error:
*py4j.protocol.Py4JJavaError: An error occurred while calling o99.toString.
: java.util.NoSuchElementException: Failed to find a default value for link*
at org.apache.spark.ml.param.Params.$anonfun$getOrDefault$2(params.scala:756)
at scala.Option.getOrElse(Option.scala:189)
at org.apache.spark.ml.param.Params.getOrDefault(params.scala:756)
at org.apache.spark.ml.param.Params.getOrDefault$(params.scala:753)
at org.apache.spark.ml.PipelineStage.getOrDefault(Pipeline.scala:41)
at org.apache.spark.ml.param.Params.$(params.scala:762)
at org.apache.spark.ml.param.Params.$$(params.scala:762)
at org.apache.spark.ml.PipelineStage.$(Pipeline.scala:41)
at org.apache.spark.ml.regression.GeneralizedLinearRegressionModel.toString(GeneralizedLinearRegression.scala:1117)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
at py4j.Gateway.invoke(Gateway.java:282)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:238)
at java.lang.Thread.run(Thread.java:748)
I assumed that link defaults to None when it is not set explicitly.
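Note that the trace is raised from GeneralizedLinearRegressionModel.toString (the call is o99.toString), so the fit itself appears to have completed and the failure seems to come from logging.info(model) formatting the model. As a possible stopgap, here is a minimal sketch of a helper that builds a log message without calling str(model). The name safe_model_summary is hypothetical, and the attribute names it reads (uid, numFeatures, coefficients, intercept) are assumed to be available on the fitted PySpark model; any that fail are reported instead of raising.

```python
def safe_model_summary(model, attrs=("uid", "numFeatures", "coefficients", "intercept")):
    """Describe a fitted model for logging without calling str(model).

    Hypothetical helper: the attribute names are assumptions about the
    model object, so each lookup is guarded individually.
    """
    parts = [type(model).__name__]
    for name in attrs:
        try:
            parts.append(f"{name}={getattr(model, name)}")
        except Exception as exc:
            # Report the attribute as unavailable rather than failing the log call.
            parts.append(f"{name}=<unavailable: {type(exc).__name__}>")
    return ", ".join(parts)
```

In the snippet above this would replace the failing call, i.e. logging.info(safe_model_summary(model)) instead of logging.info(model). It only sidesteps the symptom; the missing default for link in toString would still need a fix on the Spark side.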