Spark / SPARK-2138

The KMeans algorithm in MLlib can cause the serialized task size to grow larger and larger


Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Not A Problem
    • Affects Version/s: 0.9.0, 0.9.1
    • Fix Version/s: None
    • Component/s: MLlib

    Description

      At a certain stage of the run, the reduceByKey() step starts to cause Executor Lost and Task Lost errors; after several occurrences, the application exits.

      When this error occurs, the serialized task is larger than 10 MB, and it grows with each iteration.

      Data generation script: https://gist.github.com/djvulee/7e3b2c9eb33ff0037622

      Running code: https://gist.github.com/djvulee/6bf00e60885215e3bfd5
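
      The per-iteration growth described above is the typical signature of an iterative job whose task closure or lineage keeps a reference to everything computed before it. The following is a minimal Python analogy sketch, not Spark or MLlib code (all names are hypothetical): it uses pickle to show how a "plan" that wraps the whole previous plan each iteration serializes to an ever-larger blob, while a plan rebased on a materialized result stays flat.

      ```python
      import pickle

      # Hypothetical sketch: each iteration's task plan references the
      # entire previous plan, so the pickled "task" grows every round.
      def growing_plan_sizes(n_iters):
          plan = ("seed",)
          sizes = []
          for _ in range(n_iters):
              plan = ("map", plan)          # new step wraps the whole lineage
              sizes.append(len(pickle.dumps(plan)))
          return sizes

      # Rebasing each iteration on a materialized result keeps the
      # serialized task a constant size.
      def truncated_plan_sizes(n_iters):
          sizes = []
          for _ in range(n_iters):
              plan = ("map", "materialized")  # previous lineage not carried along
              sizes.append(len(pickle.dumps(plan)))
          return sizes
      ```

      In Spark terms, the usual remedies for this pattern are caching or checkpointing intermediate RDDs to truncate lineage, and broadcasting large read-only state (such as cluster centers) instead of capturing it in the task closure.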

    People

      Assignee: mengxr Xiangrui Meng
      Reporter: DjvuLee DjvuLee
      Votes: 1
      Watchers: 6

    Dates

      Created:
      Updated:
      Resolved: