Details
- Type: Bug
- Status: Resolved
- Priority: Major
- Resolution: Fixed
- Affects Version/s: 2.2.0, 2.3.0
Description
Writing a dataset to LibSVM format raises an exception:
java.util.NoSuchElementException: key not found: numFeatures
This happens only when the dataset was not previously read from LibSVM format (otherwise numFeatures is already present in its metadata). Steps to reproduce:
import org.apache.spark.ml.linalg.Vectors

val rawData = Seq(
  (1.0, Vectors.sparse(3, Seq((0, 2.0), (1, 3.0)))),
  (4.0, Vectors.sparse(3, Seq((0, 5.0), (2, 6.0)))))
val dfTemp = spark.sparkContext.parallelize(rawData).toDF("label", "features")
dfTemp.coalesce(1).write.format("libsvm").save("...filename...")
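Until the fix is available, one possible workaround suggested by the exception itself is to attach the numFeatures metadata to the features column before writing. A minimal sketch (the names numFeaturesMeta and dfWithMeta are illustrative, and it assumes the writer only needs that metadata entry):

import org.apache.spark.sql.functions.col
import org.apache.spark.sql.types.MetadataBuilder

// Attach numFeatures (3 = the vector size used above) to the features column metadata.
val numFeaturesMeta = new MetadataBuilder().putLong("numFeatures", 3L).build()
val dfWithMeta = dfTemp.withColumn("features", col("features").as("features", numFeaturesMeta))

// With the metadata in place the LibSVM writer should no longer fail on the missing key.
dfWithMeta.coalesce(1).write.format("libsvm").save("...filename...")

For the two rows above, the saved part file should contain standard LibSVM lines such as 1.0 1:2.0 2:3.0 (indices are 1-based).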
A PR with a fix and a unit test is ready; see https://github.com/apache/spark/pull/18872.