Details
-
Sub-task
-
Status: Closed
-
Major
-
Resolution: Done
-
None
-
None
Description
This task aims to address suboptimal parfor optimizer choices for partitionable scenarios with large driver memory. Currently, we only apply partitioning, if the right indexing operation does not fit in memory of the driver or remote tasks. The execution type selection is then unaware of potential partitioning, and does not revert this decision - this is problematic, because the large input likely exceeds the memory budget of remote tasks, ultimately causing the optimizer to fall back to a local parfor with very small degree of parallelism k.
On our perftest 8GB Univariate stats scenario (with 20GB driver, i.e., 14GB memory budget), this lead to a local parfor with k=1 and thus, unnecessarily high execution time.
Total elapsed time: 781.233 sec. Total compilation time: 2.059 sec. Total execution time: 779.175 sec. Number of compiled Spark inst: 0. Number of executed Spark inst: 0. Cache hits (Mem, WB, FS, HDFS): 27904/0/0/2. Cache writes (WB, FS, HDFS): 3134/0/1. Cache times (ACQr/m, RLS, EXP): 9.200/0.022/0.301/0.300 sec. HOP DAGs recompiled (PRED, SB): 0/100. HOP DAGs recompile time: 0.238 sec. Spark ctx create time (lazy): 0.000 sec. Spark trans counts (par,bc,col):0/0/0. Spark trans times (par,bc,col): 0.000/0.000/0.000 secs. ParFor loops optimized: 1. ParFor optimize time: 1.985 sec. ParFor initialize time: 0.007 sec. ParFor result merge time: 0.003 sec. ParFor total update in-place: 0/0/13900 Total JIT compile time: 13.542 sec. Total JVM GC count: 29. Total JVM GC time: 3.49 sec. Heavy hitter instructions (name, time, count): -- 1) cm 479.000 sec 2700 -- 2) qsort 228.928 sec 900 -- 3) qpick 20.598 sec 1800 -- 4) rangeReIndex 16.051 sec 2999 -- 5) uamean 12.867 sec 900 -- 6) uacmax 9.870 sec 1 -- 7) ctable 3.158 sec 100 -- 8) uamin 2.589 sec 1000 -- 9) uamax 2.560 sec 1101 -- 10) write 0.300 sec 1