Details
- Bug
- Status: Resolved
- Major
- Resolution: Duplicate
- 1.4.1
- None
- None
Description
First of all, I think my problem is quite different from https://issues.apache.org/jira/browse/SPARK-10433, which points out that the input size increases at each iteration.
My problem is that the mapPartitions input size increases within a single iteration. My training samples have 2958359 features in total. Within one iteration, collectAsMap was called 3 times. Here is a summary of each call.
Stage Id | Description | Duration | Input | Shuffle Read | Shuffle Write |
:----------: | :---------------------------------------------------: | :-----------: | :-----------: | :----------------: | :----------------: |
4 | mapPartitions at DecisionTree.scala:613 | 1.6 h | 710.2 MB | 2.8 GB | |
5 | collectAsMap at DecisionTree.scala:642 | 1.8 min | 2.8 GB | | |
6 | mapPartitions at DecisionTree.scala:613 | 1.2 h | 27.0 GB | 5.6 GB | |
7 | collectAsMap at DecisionTree.scala:642 | 2.0 min | 5.6 GB | | |
8 | mapPartitions at DecisionTree.scala:613 | 1.2 h | 26.5 GB | 11.1 GB | |
9 | collectAsMap at DecisionTree.scala:642 | 2.0 min | 8.3 GB | | |
The mapPartitions operation takes far too long (more than an hour per stage), which seems very strange. Could there be a bug?
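To put the reported numbers in perspective, here is a minimal sketch that computes how much larger the mapPartitions input is in stages 6 and 8 than in stage 4. The sizes are copied from the Input column of the summary above; the helper name `growth_vs_first` and the conversion of 1 GB to 1024 MB are assumptions made only for illustration.

```python
# Input sizes reported for the three mapPartitions stages (ids 4, 6, 8), in MB.
# Assumption: 1 GB is treated as 1024 MB; the Spark UI rounding is not modeled.
MB_PER_GB = 1024
inputs_mb = {4: 710.2, 6: 27.0 * MB_PER_GB, 8: 26.5 * MB_PER_GB}


def growth_vs_first(sizes):
    """Return each stage's input size as a multiple of the earliest stage's."""
    first = sizes[min(sizes)]  # the lowest stage id is the baseline
    return {stage: size / first for stage, size in sizes.items()}


ratios = growth_vs_first(inputs_mb)
for stage, ratio in sorted(ratios.items()):
    print(f"stage {stage}: {ratio:.1f}x the input of stage 4")
```

Under these assumptions, the later mapPartitions stages read roughly 38 to 39 times as much input as the first one, even though all three belong to the same iteration.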
Issue Links
- duplicates: SPARK-10433 Gradient boosted trees: increasing input size in 1.4 (Resolved)