Details
Type: Bug
Status: Closed
Priority: Major
Resolution: Fixed
Description
The following explain snippet shows that we currently inject unnecessary checkpoint (caching) directives after persistent reads of frames, even though these frames are fed directly into a transformencode. The cached frame blocks, which hold many small string objects, then cause excessive garbage collection.
----GENERIC (lines 1-22) [recompile=true]
------CP createvar pREADForig /data/criteo.csv/day_21 false FRAME csv -1 -1 -1 -1 copy false true 0.0 *
------CP createvar _fVar339 scratch_space//_p65649_129.27.206.4//_t0/temp122 true FRAME binary -1 -1 1000 -1 copy
------SPARK csvrblk pREADForig.FRAME.FP64.false _fVar339.FRAME.FP64 1000 false true 0.0
------CP createvar _fVar340 scratch_space//_p65649_129.27.206.4//_t0/temp123 true FRAME binary -1 -1 1000 -1 copy
------SPARK chkpoint _fVar339.FRAME.FP64.false _fVar340.FRAME.FP64 MEMORY_AND_DISK
------CP rmvar _fVar339
------CP mvvar _fVar340 Forig
----GENERIC (lines 1-22) [recompile=true]
------CP createvar X scratch_space//_p65649_129.27.206.4//_t0/X true MATRIX binary -1 -1 1000 -1 copy
------CP createvar M scratch_space//_p65649_129.27.206.4//_t0/M true FRAME binary -1 -1 1000 -1 copy
------SPARK transformencode Forig.FRAME.FP64.false { ids:true, recode:[15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40], bin:[{id:2, method:equi-width, numbins:10},{id:3, method:equi-width, numbins:10},{id:4, method:equi-width, numbins:10},{id:5, method:equi-width, numbins:10},{id:6, method:equi-width, numbins:10},{id:7, method:equi-width, numbins:10},{id:8, method:equi-width, numbins:10},{id:9, method:equi-width, numbins:10},{id:10, method:equi-width, numbins:10},{id:11, method:equi-width, numbins:10},{id:12, method:equi-width, numbins:10},{id:13, method:equi-width, numbins:10},{id:14, method:equi-width, numbins:10}]}.SCALAR.STRING.true X M
------CP rmvar Forig
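To make the pattern in the plan above easier to spot in other explain outputs, here is a minimal Python sketch that flags a chkpoint whose output is consumed only by a single transformencode. It is not the compiler's rewrite logic, only an offline check over the text. The names `parse_instructions` and `single_use_checkpoints` are hypothetical, the operand order for chkpoint (input first, output second) is assumed from the snippet above, and the transformencode spec is abbreviated to `{spec}`.

```python
# Abbreviated copy of the instructions above; "{spec}" stands in for the
# full transformencode specification (an abbreviation made for this sketch).
PLAN = """\
------SPARK csvrblk pREADForig.FRAME.FP64.false _fVar339.FRAME.FP64 1000 false true 0.0
------SPARK chkpoint _fVar339.FRAME.FP64.false _fVar340.FRAME.FP64 MEMORY_AND_DISK
------CP rmvar _fVar339
------CP mvvar _fVar340 Forig
------SPARK transformencode Forig.FRAME.FP64.false {spec}.SCALAR.STRING.true X M
------CP rmvar Forig
"""


def parse_instructions(plan):
    """Return (opcode, operand names) for every CP/SPARK line of a plan."""
    instructions = []
    for line in plan.splitlines():
        parts = line.lstrip("-").split()
        if len(parts) < 2 or parts[0] not in ("CP", "SPARK"):
            continue  # skip block headers such as "GENERIC (lines 1-22)"
        # Operands look like name.DATATYPE.VALUETYPE[.literal]; keep the name.
        operands = [p.split(".")[0] for p in parts[2:]]
        instructions.append((parts[1], operands))
    return instructions


def single_use_checkpoints(plan):
    """List checkpoint outputs whose only consumer is one transformencode."""
    instructions = parse_instructions(plan)
    flagged = []
    for i, (opcode, operands) in enumerate(instructions):
        if opcode != "chkpoint":
            continue
        output = operands[1]  # assumed layout: input, output, storage level
        name = output
        consumers = []
        for later_op, later_args in instructions[i + 1:]:
            if later_op == "mvvar" and later_args[0] == name:
                name = later_args[1]  # follow the rename, e.g. to Forig
            elif later_op in ("rmvar", "createvar"):
                continue  # bookkeeping only, not a data consumer
            elif name in later_args:
                consumers.append(later_op)
        if consumers == ["transformencode"]:
            flagged.append(output)
    return flagged


if __name__ == "__main__":
    print(single_use_checkpoints(PLAN))
```

For the plan above, the sketch follows `_fVar340` through the `mvvar` to `Forig` and finds transformencode as its only consumer, which is the case where caching the frame cannot pay off.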