Details
-
Bug
-
Status: Resolved
-
Blocker
-
Resolution: Fixed
-
None
-
None
-
None
Description
While running any of the regression algorithms with gradient descent, the treeAggregate blows up after several iterations.
Observed on EC2 cluster with 16 nodes, matrix dimensions of 1,000,000 x 5,000
In order to replicate the problem, use aggregate multiple times, maybe over 50-60 times.
Testing lead to the possible workaround:
setting
`spark.cleaner.referenceTracking false`
seems to help. So the problem is most probably related to the cleanup.
Attachments
Issue Links
- Is contained by
-
SPARK-3015 Removing broadcast in quick successions causes Akka timeout
- Resolved