[SPARK-2916] [MLlib] While running regression tests with dense vectors of length greater than 1000, the treeAggregate blows up after several iterations - ASF JIRA

Details

Type: Bug
Status: Resolved
Priority: Blocker
Resolution: Fixed
Affects Version/s: None
Fix Version/s: None
Component/s: MLlib, Spark Core
Labels: None
Target Version/s: 1.1.0

Description

While running any of the regression algorithms with gradient descent, the treeAggregate blows up after several iterations.

Observed on an EC2 cluster with 16 nodes and a matrix of dimensions 1,000,000 x 5,000.

To reproduce the problem, call aggregate many times in a row, possibly more than 50-60 times.

Testing led to a possible workaround: setting

`spark.cleaner.referenceTracking false`

seems to help, so the problem is most probably related to the cleanup.
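
The reproduction steps and the workaround above could be combined in a small driver along the following lines. This is only a sketch, not the code used in the original report: the object name, the scaled-down row count, the partition count, and the constant-filled rows are all assumptions for illustration, and it assumes a Spark version where `treeAggregate` is available directly on `RDD`.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical repro driver; name, sizes, and partition count are illustrative.
object TreeAggregateRepro {
  def main(args: Array[String]): Unit = {
    val numRows = 10000       // scaled down; the report used 1,000,000 rows
    val numCols = 5000        // vector length from the report
    val numIterations = 60    // the report suggests 50-60 calls or more

    val conf = new SparkConf()
      .setAppName("SPARK-2916 repro sketch")
      // Workaround from the report. Remove this line to test with cleanup enabled.
      .set("spark.cleaner.referenceTracking", "false")
    val sc = new SparkContext(conf)

    // Dense rows filled with a constant; only the size matters for this sketch.
    val rows = sc.parallelize(0 until numRows, 64)
      .map(_ => Array.fill(numCols)(1.0))
      .cache()

    // Element-wise in-place addition, used as both seqOp and combOp.
    val addInPlace: (Array[Double], Array[Double]) => Array[Double] =
      (acc, other) => {
        var j = 0
        while (j < acc.length) { acc(j) += other(j); j += 1 }
        acc
      }

    // Run the aggregation repeatedly, as the description suggests.
    for (i <- 1 to numIterations) {
      val sums = rows.treeAggregate(new Array[Double](numCols))(addInPlace, addInPlace, 2)
      println(s"iteration $i: first column sum = ${sums(0)}")
    }

    sc.stop()
  }
}
```

The master URL is left out on purpose so that it can be supplied by `spark-submit`. The same property can also be placed in `spark-defaults.conf` instead of being set in code.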

Issue Links

Is contained by

SPARK-3015 Removing broadcast in quick successions causes Akka timeout

Resolved

People

Assignee: Unassigned

Reporter: Burak Yavuz

Votes: 0

Watchers: 4

Dates

Created: 08/Aug/14 01:54

Updated: 16/Aug/14 05:56

Resolved: 16/Aug/14 05:56
