Details
- Bug
- Status: Closed
- Blocker
- Resolution: Fixed
Description
Intermittently during model training, the following error occurs. The characteristics of the intermediate matrices appear to change over the course of training, and if one of them becomes sparse enough to fall into the "Ultra Sparse" category, an internal runtime error is raised while the block is serialized, because the actual and expected numbers of non-zeros diverge.
Caused by: org.apache.sysml.api.mlcontext.MLContextException: Exception occurred while executing runtime program
    at org.apache.sysml.api.mlcontext.ScriptExecutor.executeRuntimeProgram(ScriptExecutor.java:377)
    at org.apache.sysml.api.mlcontext.ScriptExecutor.execute(ScriptExecutor.java:320)
    at org.apache.sysml.api.mlcontext.MLContext.execute(MLContext.java:287)
    ... 11 more
Caused by: org.apache.sysml.runtime.DMLRuntimeException: org.apache.sysml.runtime.DMLRuntimeException: ERROR: Runtime error in while program block generated from while statement block between lines 17 and 45 -- Error evaluating while program block
    at org.apache.sysml.runtime.controlprogram.Program.execute(Program.java:152)
    at org.apache.sysml.api.mlcontext.ScriptExecutor.executeRuntimeProgram(ScriptExecutor.java:375)
    ... 13 more
Caused by: org.apache.sysml.runtime.DMLRuntimeException: ERROR: Runtime error in while program block generated from while statement block between lines 17 and 45 -- Error evaluating while program block
    at org.apache.sysml.runtime.controlprogram.WhileProgramBlock.execute(WhileProgramBlock.java:181)
    at org.apache.sysml.runtime.controlprogram.Program.execute(Program.java:145)
    ... 14 more
Caused by: org.apache.sysml.runtime.DMLRuntimeException: ERROR: Runtime error in program block generated from statement block between lines 32 and 32 -- Error evaluating instruction: CP°extfunct°./mnist_lenet.dml°train°14°10°X·MATRIX·DOUBLE°Y·MATRIX·DOUBLE°X_val·MATRIX·DOUBLE°Y_val·MATRIX·DOUBLE°C·SCALAR·DOUBLE·false°Hin·SCALAR·DOUBLE·false°Win·SCALAR·DOUBLE·false°lr·SCALAR·DOUBLE·false°mu·SCALAR·DOUBLE·false°decay·SCALAR·DOUBLE·false°lambda·SCALAR·DOUBLE·false°50·SCALAR·INT·true°1·SCALAR·INT·true°iters·SCALAR·DOUBLE·false°Wc1°bc1°Wc2°bc2°Wc3°bc3°Wa1°ba1°Wa2°ba2
    at org.apache.sysml.runtime.controlprogram.ProgramBlock.executeSingleInstruction(ProgramBlock.java:335)
    at org.apache.sysml.runtime.controlprogram.ProgramBlock.executeInstructions(ProgramBlock.java:224)
    at org.apache.sysml.runtime.controlprogram.ProgramBlock.execute(ProgramBlock.java:168)
    at org.apache.sysml.runtime.controlprogram.WhileProgramBlock.execute(WhileProgramBlock.java:169)
    ... 15 more
Caused by: org.apache.sysml.runtime.DMLRuntimeException: error executing function ./mnist_lenet.dml::train
    at org.apache.sysml.runtime.instructions.cp.FunctionCallCPInstruction.processInstruction(FunctionCallCPInstruction.java:184)
    at org.apache.sysml.runtime.controlprogram.ProgramBlock.executeSingleInstruction(ProgramBlock.java:305)
    ... 18 more
Caused by: org.apache.sysml.runtime.DMLRuntimeException: ERROR: Runtime error in function program block generated from function statement block between lines 38 and 270 -- Error evaluating function program block
    at org.apache.sysml.runtime.controlprogram.FunctionProgramBlock.execute(FunctionProgramBlock.java:121)
    at org.apache.sysml.runtime.instructions.cp.FunctionCallCPInstruction.processInstruction(FunctionCallCPInstruction.java:177)
    ... 19 more
Caused by: org.apache.sysml.runtime.DMLRuntimeException: ERROR: Runtime error in for program block generated from for statement block between lines 131 and 269 -- Error evaluating for program block
    at org.apache.sysml.runtime.controlprogram.ForProgramBlock.execute(ForProgramBlock.java:162)
    at org.apache.sysml.runtime.controlprogram.FunctionProgramBlock.execute(FunctionProgramBlock.java:114)
    ... 20 more
Caused by: org.apache.sysml.runtime.DMLRuntimeException: ERROR: Runtime error in for program block generated from for statement block between lines 132 and 244 -- Error evaluating for program block
    at org.apache.sysml.runtime.controlprogram.ForProgramBlock.execute(ForProgramBlock.java:162)
    at org.apache.sysml.runtime.controlprogram.ForProgramBlock.execute(ForProgramBlock.java:150)
    ... 21 more
Caused by: org.apache.sysml.runtime.DMLRuntimeException: ERROR: Runtime error in program block generated from statement block between lines 157 and 217 -- Error evaluating instruction: CP°r'°outc3p·MATRIX·DOUBLE°_mVar1077501·MATRIX·DOUBLE°48
    at org.apache.sysml.runtime.controlprogram.ProgramBlock.executeSingleInstruction(ProgramBlock.java:335)
    at org.apache.sysml.runtime.controlprogram.ProgramBlock.executeInstructions(ProgramBlock.java:224)
    at org.apache.sysml.runtime.controlprogram.ProgramBlock.execute(ProgramBlock.java:168)
    at org.apache.sysml.runtime.controlprogram.ForProgramBlock.execute(ForProgramBlock.java:150)
    ... 22 more
Caused by: org.apache.sysml.runtime.controlprogram.caching.CacheException: Eviction to local path /tmp/systemml/_p6456_10.168.31.80//cache/cache000546482.dat (_mVar1077501) failed.
    at org.apache.sysml.runtime.controlprogram.caching.CacheableData.release(CacheableData.java:651)
    at org.apache.sysml.runtime.controlprogram.context.ExecutionContext.setMatrixOutput(ExecutionContext.java:426)
    at org.apache.sysml.runtime.instructions.cp.ReorgCPInstruction.processInstruction(ReorgCPInstruction.java:135)
    at org.apache.sysml.runtime.controlprogram.ProgramBlock.executeSingleInstruction(ProgramBlock.java:305)
    ... 25 more
Caused by: java.io.IOException: Failed to serialize cache block.
    at org.apache.sysml.runtime.controlprogram.caching.ByteBuffer.serializeBlock(ByteBuffer.java:82)
    at org.apache.sysml.runtime.controlprogram.caching.LazyWriteBuffer.writeBlock(LazyWriteBuffer.java:113)
    at org.apache.sysml.runtime.controlprogram.caching.CacheableData.release(CacheableData.java:647)
    ... 28 more
Caused by: java.io.IOException: Invalid number of serialized non-zeros: 842 (expected: 2044)
    at org.apache.sysml.runtime.matrix.data.MatrixBlock.writeSparseToUltraSparse(MatrixBlock.java:2208)
    at org.apache.sysml.runtime.matrix.data.MatrixBlock.write(MatrixBlock.java:2073)
    at org.apache.sysml.runtime.controlprogram.caching.ByteBuffer.serializeBlock(ByteBuffer.java:73)
    ... 30 more
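The innermost exception suggests a consistency check at write time: the number of non-zero cells actually serialized (842) did not match the non-zero count recorded for the block (2044). The following is a minimal, self-contained sketch of that kind of check, not SystemML's actual MatrixBlock code. The class UltraSparseNnzSketch, its methods, and the (row, col, value) triple layout are all hypothetical and exist only to illustrate how stale non-zero metadata could surface as this error.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;

// Hypothetical illustration only; this is NOT SystemML's MatrixBlock.
final class UltraSparseNnzSketch {

    /** Counts the cells that are actually non-zero. */
    static long countNonZeros(double[][] cells) {
        long n = 0;
        for (double[] row : cells)
            for (double v : row)
                if (v != 0) n++;
        return n;
    }

    /**
     * Writes a header with the recorded non-zero count, then one
     * (row, col, value) triple per non-zero cell. Fails if the number of
     * triples written differs from the recorded count (assumed layout).
     */
    static byte[] writeUltraSparse(double[][] cells, long recordedNnz) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        out.writeLong(recordedNnz);
        long written = 0;
        for (int r = 0; r < cells.length; r++) {
            for (int c = 0; c < cells[r].length; c++) {
                if (cells[r][c] != 0) {
                    out.writeInt(r);
                    out.writeInt(c);
                    out.writeDouble(cells[r][c]);
                    written++;
                }
            }
        }
        // Stale metadata is detected here, after the real cells were visited.
        if (written != recordedNnz)
            throw new IOException("Invalid number of serialized non-zeros: "
                + written + " (expected: " + recordedNnz + ")");
        out.flush();
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        double[][] cells = new double[4][1000];
        cells[0][3] = 1.5;
        cells[2][7] = -2.0;

        // Metadata that no longer matches the data triggers the failure.
        try {
            writeUltraSparse(cells, 5);
        } catch (IOException e) {
            System.out.println(e.getMessage());
        }

        // Recomputing the count before writing avoids the mismatch.
        byte[] ok = writeUltraSparse(cells, countNonZeros(cells));
        System.out.println(ok.length + " bytes written");
    }
}
```

In this sketch, recomputing the count from the cells before writing makes the check pass. Whether the actual fix recomputes the count or corrects the operation that left it stale is not stated in this issue.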
Issue Links
- is depended upon by
  - SYSTEMDS-1185 SystemML Breast Cancer Project (Resolved)
- is duplicated by
  - SYSTEMDS-1137 Mismatch between expected and actual nonzeros when writing sparse to ultrasparse matrixblock (Closed)
- is related to
  - SYSTEMDS-1520 Corrupted sparse matrix representations (Closed)
  - SYSTEMDS-1734 Spark reshape instruction creates incorrect outputs for sparse inputs (Closed)