[SPARK-8525] Bug in Streaming k-means documentation - ASF JIRA


Details

Type: Bug
Status: Resolved
Priority: Minor
Resolution: Fixed
Affects Version/s: 1.1.1, 1.2.2, 1.3.1, 1.4.1
Fix Version/s: 1.1.2, 1.2.3, 1.3.2, 1.4.1, 1.5.0
Component/s: Documentation, MLlib
Labels: None

Description

The expected input format is wrong in the Streaming k-means documentation.
https://spark.apache.org/docs/latest/mllib-clustering.html#streaming-k-means

It might instead be a bug in the implementation; I am not sure.

There should not be any spaces in the test data points, i.e. instead of
(y, [x1, x2, x3]) the format should be
(y,[x1,x2,x3])
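A minimal sketch of producing test lines in the space-free format, assuming each labeled point is written as one text line of the streaming test input. The helper name formatTestPoint is hypothetical and not part of Spark. It takes the label as a Double, so an integer label such as 1 is rendered as 1.0.

```scala
// Hypothetical helper, not part of Spark: renders one labeled point in the
// space-free text form "(y,[x1,x2,x3])".
def formatTestPoint(label: Double, features: Seq[Double]): String = {
  // mkString(",") joins the features with a bare comma, so no spaces are emitted.
  val joined = features.mkString(",")
  s"($label,[$joined])"
}

// Example: a label of 1.0 with three Double features.
println(formatTestPoint(1.0, Seq(0.5, 2.0, -3.25)))
```

Lines in this form are what LabeledPoint.parse (the method at the bottom of the stack trace) is meant to read.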

The exception thrown:
org.apache.spark.SparkException: Cannot parse a double from:
at org.apache.spark.mllib.util.NumericParser$.parseDouble(NumericParser.scala:118)
at org.apache.spark.mllib.util.NumericParser$.parseTuple(NumericParser.scala:103)
at org.apache.spark.mllib.util.NumericParser$.parse(NumericParser.scala:41)
at org.apache.spark.mllib.regression.LabeledPoint$.parse(LabeledPoint.scala:49)
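A simplified imitation of how the space could trigger this exception. It is NOT Spark's NumericParser: it assumes the parser splits the input on the delimiters ( ) [ ] , (keeping them as tokens) and passes every other token to java.lang.Double.parseDouble, which is consistent with the parseTuple -> parseDouble frames above but may differ from the real code. The names nonDelimiterTokens and tryParse are hypothetical.

```scala
import java.util.StringTokenizer
import scala.collection.mutable.ListBuffer

// Assumed delimiter set for this illustration.
val Delims = "()[],"

// Returns every token that is not one of the delimiter characters.
def nonDelimiterTokens(s: String): List[String] = {
  val tok = new StringTokenizer(s, Delims, true) // true: delimiters come back as tokens
  val out = ListBuffer.empty[String]
  while (tok.hasMoreTokens) {
    val t = tok.nextToken()
    if (!Delims.contains(t)) out += t
  }
  out.toList
}

// Double.parseDouble trims whitespace, so a token that is only a space
// becomes an empty string and fails.
def tryParse(t: String): Either[String, Double] =
  try Right(java.lang.Double.parseDouble(t))
  catch { case _: NumberFormatException => Left(s"Cannot parse a double from: $t") }

println(nonDelimiterTokens("(1.0, [2.0, 3.0])").map(tryParse))
println(nonDelimiterTokens("(1.0,[2.0,3.0])").map(tryParse))
```

Under these assumptions, the lone space between the comma and "[" becomes its own token and cannot be parsed, which would match the blank value after "Cannot parse a double from:" in the trace.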

I would also improve the documentation to state explicitly that the expected data type of both 'x' and 'y' is Double. At the moment this is not obvious, especially for 'y'.

Issue Links

links to

[Github] Pull Request #6954 (fe2s)

People

Assignee: Oleksiy Dyagilev

Reporter: Oleksiy Dyagilev

Votes: 0

Watchers: 3

Dates

Created: 22/Jun/15 11:55

Updated: 04/Jul/15 07:29

Resolved: 23/Jun/15 20:12