Details
Type: Bug
Status: Resolved
Priority: Minor
Resolution: Not A Problem
Affects Version/s: 1.6.1
None
None
Environment: Ubuntu 14.04
Description
Random Forest Regression
Data: https://www.kaggle.com/c/grupo-bimbo-inventory-demand/download/train.csv.zip
Parameters:
NumTrees: 500
Maximum Bins: 7477383
MaxDepth: 27
MinInstancesPerNode: 8648
SamplingRate: 1.0
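For a rough sense of scale, the Maximum Bins value above can be turned into a back-of-envelope estimate of how much per-bin split statistics a single tree node might need. This is only a sketch under stated assumptions, not a description of what actually failed: it assumes every feature uses the full bin budget and that regression keeps three double-precision values per bin. The feature count is a hypothetical placeholder, and the helper names are made up for illustration.

```python
# Back-of-envelope upper bound on per-node split statistics.
# Assumptions (not stated in the report): every feature uses the full
# bin budget, and regression keeps 3 doubles per bin.
BYTES_PER_DOUBLE = 8
STATS_PER_BIN = 3  # assumed: count, sum, sum of squares


def node_stats_bytes(num_features: int, max_bins: int) -> int:
    """Upper bound on the bytes of per-bin statistics for one tree node."""
    return num_features * max_bins * STATS_PER_BIN * BYTES_PER_DOUBLE


def to_gib(num_bytes: int) -> float:
    """Convert a byte count to GiB."""
    return num_bytes / 2**30


MAX_BINS = 7477383   # "Maximum Bins" from the report
NUM_FEATURES = 10    # hypothetical; substitute the real feature count

per_feature = node_stats_bytes(1, MAX_BINS)
per_node = node_stats_bytes(NUM_FEATURES, MAX_BINS)
print(f"per feature: {to_gib(per_feature):.2f} GiB")
print(f"per node ({NUM_FEATURES} features): {to_gib(per_node):.2f} GiB")
```

If the assumptions hold, the estimate grows linearly with both the bin count and the number of nodes being split at once, so it may be worth comparing against the executor memory listed in the Java options.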
Java Options:
"-Xms16384M"
"-Xmx16384M"
"-Dspark.locality.wait=0s"
"-Dspark.driver.extraJavaOptions=-Xss10240k -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintTenuringDistribution -XX:+UseConcMarkSweepGC -XX:+UseParNewGC -XX:ParallelGCThreads=2 -XX:-UseAdaptiveSizePolicy -XX:ConcGCThreads=2 -XX:-UseGCOverheadLimit -XX:CMSInitiatingOccupancyFraction=75 -XX:NewSize=8g -XX:MaxNewSize=8g -XX:SurvivorRatio=3 -DnumPartitions=36"
"-Dspark.submit.deployMode=cluster"
"-Dspark.speculation=true"
"-Dspark.speculation.multiplier=2"
"-Dspark.driver.memory=16g"
"-Dspark.speculation.interval=300ms"
"-Dspark.speculation.quantile=0.5"
"-Dspark.akka.frameSize=768"
"-Dspark.driver.supervise=false"
"-Dspark.executor.cores=6"
"-Dspark.executor.extraJavaOptions=-Xss10240k -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintTenuringDistribution -XX:-UseAdaptiveSizePolicy -XX:+UseParallelGC -XX:+UseParallelOldGC -XX:ParallelGCThreads=6 -XX:NewSize=22g -XX:MaxNewSize=22g -XX:SurvivorRatio=2 -XX:+PrintAdaptiveSizePolicy -XX:+PrintGCDateStamps"
"-Dspark.rpc.askTimeout=10"
"-Dspark.executor.memory=40g"
"-Dspark.driver.maxResultSize=3g"
"-Xss10240k"
"-XX:+PrintGCDetails"
"-XX:+PrintGCTimeStamps"
"-XX:+PrintTenuringDistribution"
"-XX:+UseConcMarkSweepGC"
"-XX:+UseParNewGC"
"-XX:ParallelGCThreads=2"
"-XX:-UseAdaptiveSizePolicy"
"-XX:ConcGCThreads=2"
"-XX:-UseGCOverheadLimit"
"-XX:CMSInitiatingOccupancyFraction=75"
"-XX:NewSize=8g"
"-XX:MaxNewSize=8g"
"-XX:SurvivorRatio=3"
"-DnumPartitions=36"
Partial driver stack trace:
org.apache.spark.rdd.PairRDDFunctions.collectAsMap(PairRDDFunctions.scala:740)
org.apache.spark.ml.tree.impl.RandomForest$.findBestSplits(RandomForest.scala:525)
org.apache.spark.ml.tree.impl.RandomForest$.run(RandomForest.scala:160)
org.apache.spark.ml.regression.CustomRandomForestRegressor.train(CustomRandomForestRegressor.scala:209)
org.apache.spark.ml.regression.CustomRandomForestRegressor.train(CustomRandomForestRegressor.scala:197)
org.apache.spark.ml.Predictor.fit(Predictor.scala:90)
org.apache.spark.ml.Predictor.fit(Predictor.scala:71)
org.apache.spark.ml.Estimator.fit(Estimator.scala:59)
org.apache.spark.ml.Estimator$$anonfun$fit$1.apply(Estimator.scala:78)
org.apache.spark.ml.Estimator$$anonfun$fit$1.apply(Estimator.scala:78)
Complete executor and driver error logs:
https://gist.github.com/anonymous/603ac7f8f17e43c51ba93b2934cd4cb6