[SPARK-10629] Gradient boosted trees: mapPartitions input size increasing - ASF JIRA

Details

Type: Bug
Status: Resolved
Priority: Major
Resolution: Duplicate
Affects Version/s: 1.4.1
Fix Version/s: None
Component/s: MLlib
Labels: None

Description

First of all, I think my problem is quite different from https://issues.apache.org/jira/browse/SPARK-10433, which reports that the input size increases with each iteration.

My problem is that the mapPartitions input size increases within a single iteration. My training samples have 2958359 features in total. Within one iteration, collectAsMap was called 3 times. Here is a summary of each stage.

| Stage Id | Description | Duration | Input | Shuffle Read | Shuffle Write |
|:---:|:---:|:---:|:---:|:---:|:---:|
| 4 | mapPartitions at DecisionTree.scala:613 | 1.6 h | 710.2 MB | | 2.8 GB |
| 5 | collectAsMap at DecisionTree.scala:642 | 1.8 min | | 2.8 GB | |
| 6 | mapPartitions at DecisionTree.scala:613 | 1.2 h | 27.0 GB | | 5.6 GB |
| 7 | collectAsMap at DecisionTree.scala:642 | 2.0 min | | 5.6 GB | |
| 8 | mapPartitions at DecisionTree.scala:613 | 1.2 h | 26.5 GB | | 11.1 GB |
| 9 | collectAsMap at DecisionTree.scala:642 | 2.0 min | | 8.3 GB | |

The mapPartitions operation takes far too long, which seems very strange. I wonder whether there is a bug.
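The growth across the mapPartitions stages can be checked with a little arithmetic. The following is a minimal sketch in Python that uses only the shuffle write figures reported in the summary; the function and variable names are illustrative and do not come from Spark.

```python
# Shuffle write sizes in GB for the three mapPartitions stages (4, 6, 8),
# copied from the stage summary in this report.
shuffle_write_gb = {4: 2.8, 6: 5.6, 8: 11.1}


def growth_ratios(sizes):
    """Return the ratio of each stage's size to the previous stage's size."""
    stages = sorted(sizes)
    return {
        (prev, cur): sizes[cur] / sizes[prev]
        for prev, cur in zip(stages, stages[1:])
    }


ratios = growth_ratios(shuffle_write_gb)
for (prev, cur), ratio in ratios.items():
    print(f"stage {prev} -> stage {cur}: x{ratio:.2f}")
```

Both ratios come out close to 2, so the shuffle write roughly doubles from one mapPartitions stage to the next. That would be consistent with the number of tree nodes doubling at each level, but this interpretation is an assumption and is not confirmed anywhere in this report.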

Issue Links

duplicates

SPARK-10433 Gradient boosted trees: increasing input size in 1.4

Resolved

People

Assignee: Unassigned

Reporter: Wenmin Wu

Votes: 0

Watchers: 2

Dates

Created: 16/Sep/15 02:16

Updated: 19/Sep/15 07:19

Resolved: 19/Sep/15 07:19