When a CrossValidatorModel with more than 3 subModels is saved and then loaded again, a different number of subModels is returned. It seems that exactly 3 subModels are returned every time.
With fewer than three subModels (e.g. 2 folds), writing fails outright.
The issue seems to be the following (though I am not very familiar with the Scala/Java side):
- the Python object is converted to its Scala/Java counterpart
- in Scala, subModels are saved for each fold index up to numFolds
- numFolds is not available on the CrossValidatorModel in PySpark
- the default numFolds is 3, so it tries to save exactly 3 subModels regardless of how many exist.
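
The suspected mechanism above can be illustrated with a small pure-Python sketch. This is a toy model of the hypothesis, not Spark's actual save code: `save_sub_models` and `DEFAULT_NUM_FOLDS` are hypothetical names introduced only for illustration, and the assumption is that the save loop iterates over numFolds rather than over the number of subModels actually present.

```python
from typing import List, Optional

DEFAULT_NUM_FOLDS = 3  # default numFolds, as described in the report


def save_sub_models(sub_models: List[str], num_folds: Optional[int] = None) -> List[str]:
    """Toy stand-in for the suspected JVM-side save loop (hypothetical, not Spark code)."""
    # Assumption: numFolds is not carried over from the Python model,
    # so the default value is used on the JVM side.
    folds = DEFAULT_NUM_FOLDS if num_folds is None else num_folds
    # Suspected bug: the loop is bounded by numFolds, not by len(sub_models).
    return [sub_models[i] for i in range(folds)]


# Five subModels, but only the first three get written.
five = [f"model_{i}" for i in range(5)]
saved_from_five = save_sub_models(five)

# Two subModels: the loop reads index 2, which does not exist, so writing fails.
two = [f"model_{i}" for i in range(2)]
try:
    save_sub_models(two)
    write_failed = False
except IndexError:
    write_failed = True
```

If this hypothesis holds, propagating numFolds from the Python model (modelled here by passing `num_folds=len(sub_models)`) would make both cases round-trip correctly.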
The first issue can be reproduced with the following failing tests, where spark is a SparkSession and tmp_path is a (temporary) directory.
The second can be reproduced as follows (writing will fail):