Description
Steps to reproduce:
from pyspark.mllib.clustering import GaussianMixture
from numpy import array

data = sc.textFile("data/mllib/gmm_data.txt")
parsedData = data.map(lambda line: array([float(x) for x in line.strip().split(' ')]))

gmm = GaussianMixture.train(parsedData, 2)
GaussianMixture.train(parsedData, 2, initialModel=gmm)
The source of the problem appears to be the initialModelWeights NumPy array. In 1.5 / 1.6 the second call fails with net.razorvine.pickle.PickleException. In 1.4 it fails with:
Method trainGaussianMixture([..., class org.apache.spark.mllib.linalg.DenseVector, class java.util.ArrayList, class java.util.ArrayList]) does not exist
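If the NumPy array really is what the Python-to-JVM call rejects, one possible direction is to convert the weights to a plain list of built-in floats before they are handed over. The sketch below only illustrates that conversion. The helper name to_plain_floats is hypothetical and not part of PySpark, and the tuple is a stand-in for gmm.weights so the snippet runs without a Spark context.

```python
def to_plain_floats(weights):
    """Return the weights as a list of built-in Python floats.

    Hypothetical helper, not part of PySpark.
    """
    # float() accepts NumPy scalars (e.g. numpy.float64) as well as
    # ordinary ints and floats, so this works for a NumPy array too.
    return [float(w) for w in weights]


# Stand-in for gmm.weights; in the report this would be a NumPy array.
example_weights = to_plain_floats((0.25, 0.75))
```

Whether this is enough to make the second train call succeed is an assumption that would need to be checked against the affected versions.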