Description
When running FPGrowth.fit() from the ml package, one can see the following warning:
WARN FPGrowth: Input data is not cached.
This warning occurs even when the dataset of transactions is cached.
The warning actually comes from the FPGrowth implementation in the old mllib package. The new ml FPGrowth performs some transformations on the input dataset of transactions and then passes the result to the old FPGrowth without caching it, hence the warning.
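For reference, a minimal reproduction sketch in Scala, assuming Spark 2.2 or later running in local mode. The object name, the sample transactions, and the parameter values are illustrative and not taken from this report. The dataset is cached and materialized before fit() is called, which is the situation in which the warning is reported to appear.

```scala
import org.apache.spark.ml.fpm.FPGrowth
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.storage.StorageLevel

// Hypothetical reproduction harness; names and data are illustrative.
object FPGrowthCacheWarningRepro {

  // Builds a small transactions DataFrame and caches it.
  def cachedTransactions(spark: SparkSession): DataFrame = {
    import spark.implicits._
    val df = Seq("1 2 5", "1 2 3 5", "1 2")
      .map(_.split(" "))
      .toDF("items")
      .cache()
    df.count() // force materialization of the cache
    df
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[2]")
      .appName("fpgrowth-cache-warning")
      .getOrCreate()

    val transactions = cachedTransactions(spark)
    // The input dataset really is cached at this point.
    assert(transactions.storageLevel != StorageLevel.NONE)

    // On affected versions, the warning is expected in the driver log here.
    val model = new FPGrowth()
      .setItemsCol("items")
      .setMinSupport(0.5)
      .fit(transactions)

    model.freqItemsets.show()
    spark.stop()
  }
}
```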
The problem looks similar to SPARK-18356.
If you don't mind, I can push a similar fix:
// ml.FPGrowth
val handlePersistence = dataset.storageLevel == StorageLevel.NONE
if (handlePersistence) {
  // cache the data
}
// then call mllib.FPGrowth
// finally unpersist the data
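The outline above could be fleshed out roughly as follows. This is a standalone sketch written against the public API, not the actual patch: the object and method names are hypothetical, the items column is assumed to hold non-null arrays of strings, and MEMORY_AND_DISK is an assumed choice of storage level.

```scala
import org.apache.spark.mllib.fpm.{FPGrowth => MLlibFPGrowth}
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.DataFrame
import org.apache.spark.storage.StorageLevel

// Hypothetical helper illustrating the proposed handlePersistence pattern.
object HandlePersistenceSketch {

  def frequentItemsets(
      dataset: DataFrame,
      itemsCol: String,
      minSupport: Double): Array[MLlibFPGrowth.FreqItemset[String]] = {
    // Transformation step: extract the transactions as an RDD of item arrays.
    // Assumes every row holds a non-null array of strings.
    val items: RDD[Array[String]] =
      dataset.select(itemsCol).rdd.map(_.getSeq[String](0).toArray)

    // Persist the derived RDD only if the caller has not cached the dataset.
    val handlePersistence = dataset.storageLevel == StorageLevel.NONE
    if (handlePersistence) {
      items.persist(StorageLevel.MEMORY_AND_DISK)
    }

    try {
      val model = new MLlibFPGrowth().setMinSupport(minSupport).run(items)
      // freqItemsets is evaluated lazily, so materialize it before unpersisting.
      model.freqItemsets.collect()
    } finally {
      if (handlePersistence) {
        items.unpersist()
      }
    }
  }
}
```

Under this condition, an already-cached dataset leaves the derived RDD unpersisted, so whether the warning also disappears in that case depends on how the mllib check is written.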