SystemDS / SYSTEMDS-1078

Ultra Sparse Invalid number of serialized non-zeros


Details

    • Bug
    • Status: Closed
    • Blocker
    • Resolution: Fixed
    • None
    • SystemML 1.1
    • None
    • None

    Description

      The following error occurs intermittently during model training. It appears that the characteristics of the intermediate matrices can change over the course of training, and if one of them becomes sparse enough to fall into the "Ultra Sparse" category, serialization of the block during cache eviction fails with an internal error because the number of non-zeros actually written differs from the number expected.

      Caused by: org.apache.sysml.api.mlcontext.MLContextException: Exception occurred while executing runtime program
      	at org.apache.sysml.api.mlcontext.ScriptExecutor.executeRuntimeProgram(ScriptExecutor.java:377)
      	at org.apache.sysml.api.mlcontext.ScriptExecutor.execute(ScriptExecutor.java:320)
      	at org.apache.sysml.api.mlcontext.MLContext.execute(MLContext.java:287)
      	... 11 more
      Caused by: org.apache.sysml.runtime.DMLRuntimeException: org.apache.sysml.runtime.DMLRuntimeException: ERROR: Runtime error in while program block generated from while statement block between lines 17 and 45 -- Error evaluating while program block
      	at org.apache.sysml.runtime.controlprogram.Program.execute(Program.java:152)
      	at org.apache.sysml.api.mlcontext.ScriptExecutor.executeRuntimeProgram(ScriptExecutor.java:375)
      	... 13 more
      Caused by: org.apache.sysml.runtime.DMLRuntimeException: ERROR: Runtime error in while program block generated from while statement block between lines 17 and 45 -- Error evaluating while program block
      	at org.apache.sysml.runtime.controlprogram.WhileProgramBlock.execute(WhileProgramBlock.java:181)
      	at org.apache.sysml.runtime.controlprogram.Program.execute(Program.java:145)
      	... 14 more
      Caused by: org.apache.sysml.runtime.DMLRuntimeException: ERROR: Runtime error in program block generated from statement block between lines 32 and 32 -- Error evaluating instruction: CP°extfunct°./mnist_lenet.dml°train°14°10°X·MATRIX·DOUBLE°Y·MATRIX·DOUBLE°X_val·MATRIX·DOUBLE°Y_val·MATRIX·DOUBLE°C·SCALAR·DOUBLE·false°Hin·SCALAR·DOUBLE·false°Win·SCALAR·DOUBLE·false°lr·SCALAR·DOUBLE·false°mu·SCALAR·DOUBLE·false°decay·SCALAR·DOUBLE·false°lambda·SCALAR·DOUBLE·false°50·SCALAR·INT·true°1·SCALAR·INT·true°iters·SCALAR·DOUBLE·false°Wc1°bc1°Wc2°bc2°Wc3°bc3°Wa1°ba1°Wa2°ba2
      	at org.apache.sysml.runtime.controlprogram.ProgramBlock.executeSingleInstruction(ProgramBlock.java:335)
      	at org.apache.sysml.runtime.controlprogram.ProgramBlock.executeInstructions(ProgramBlock.java:224)
      	at org.apache.sysml.runtime.controlprogram.ProgramBlock.execute(ProgramBlock.java:168)
      	at org.apache.sysml.runtime.controlprogram.WhileProgramBlock.execute(WhileProgramBlock.java:169)
      	... 15 more
      Caused by: org.apache.sysml.runtime.DMLRuntimeException: error executing function ./mnist_lenet.dml::train
      	at org.apache.sysml.runtime.instructions.cp.FunctionCallCPInstruction.processInstruction(FunctionCallCPInstruction.java:184)
      	at org.apache.sysml.runtime.controlprogram.ProgramBlock.executeSingleInstruction(ProgramBlock.java:305)
      	... 18 more
      Caused by: org.apache.sysml.runtime.DMLRuntimeException: ERROR: Runtime error in function program block generated from function statement block between lines 38 and 270 -- Error evaluating function program block
      	at org.apache.sysml.runtime.controlprogram.FunctionProgramBlock.execute(FunctionProgramBlock.java:121)
      	at org.apache.sysml.runtime.instructions.cp.FunctionCallCPInstruction.processInstruction(FunctionCallCPInstruction.java:177)
      	... 19 more
      Caused by: org.apache.sysml.runtime.DMLRuntimeException: ERROR: Runtime error in for program block generated from for statement block between lines 131 and 269 -- Error evaluating for program block
      	at org.apache.sysml.runtime.controlprogram.ForProgramBlock.execute(ForProgramBlock.java:162)
      	at org.apache.sysml.runtime.controlprogram.FunctionProgramBlock.execute(FunctionProgramBlock.java:114)
      	... 20 more
      Caused by: org.apache.sysml.runtime.DMLRuntimeException: ERROR: Runtime error in for program block generated from for statement block between lines 132 and 244 -- Error evaluating for program block
      	at org.apache.sysml.runtime.controlprogram.ForProgramBlock.execute(ForProgramBlock.java:162)
      	at org.apache.sysml.runtime.controlprogram.ForProgramBlock.execute(ForProgramBlock.java:150)
      	... 21 more
      Caused by: org.apache.sysml.runtime.DMLRuntimeException: ERROR: Runtime error in program block generated from statement block between lines 157 and 217 -- Error evaluating instruction: CP°r'°outc3p·MATRIX·DOUBLE°_mVar1077501·MATRIX·DOUBLE°48
      	at org.apache.sysml.runtime.controlprogram.ProgramBlock.executeSingleInstruction(ProgramBlock.java:335)
      	at org.apache.sysml.runtime.controlprogram.ProgramBlock.executeInstructions(ProgramBlock.java:224)
      	at org.apache.sysml.runtime.controlprogram.ProgramBlock.execute(ProgramBlock.java:168)
      	at org.apache.sysml.runtime.controlprogram.ForProgramBlock.execute(ForProgramBlock.java:150)
      	... 22 more
      Caused by: org.apache.sysml.runtime.controlprogram.caching.CacheException: Eviction to local path /tmp/systemml/_p6456_10.168.31.80//cache/cache000546482.dat (_mVar1077501) failed.
      	at org.apache.sysml.runtime.controlprogram.caching.CacheableData.release(CacheableData.java:651)
      	at org.apache.sysml.runtime.controlprogram.context.ExecutionContext.setMatrixOutput(ExecutionContext.java:426)
      	at org.apache.sysml.runtime.instructions.cp.ReorgCPInstruction.processInstruction(ReorgCPInstruction.java:135)
      	at org.apache.sysml.runtime.controlprogram.ProgramBlock.executeSingleInstruction(ProgramBlock.java:305)
      	... 25 more
      Caused by: java.io.IOException: Failed to serialize cache block.
      	at org.apache.sysml.runtime.controlprogram.caching.ByteBuffer.serializeBlock(ByteBuffer.java:82)
      	at org.apache.sysml.runtime.controlprogram.caching.LazyWriteBuffer.writeBlock(LazyWriteBuffer.java:113)
      	at org.apache.sysml.runtime.controlprogram.caching.CacheableData.release(CacheableData.java:647)
      	... 28 more
      Caused by: java.io.IOException: Invalid number of serialized non-zeros: 842 (expected: 2044)
      	at org.apache.sysml.runtime.matrix.data.MatrixBlock.writeSparseToUltraSparse(MatrixBlock.java:2208)
      	at org.apache.sysml.runtime.matrix.data.MatrixBlock.write(MatrixBlock.java:2073)
      	at org.apache.sysml.runtime.controlprogram.caching.ByteBuffer.serializeBlock(ByteBuffer.java:73)
      	... 30 more
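
      The innermost exception suggests a consistency check between the non-zeros counted while writing and a cached non-zero count. The following is a minimal sketch of that kind of check, not SystemML's actual MatrixBlock code: the class UltraSparseSketch, its fields, and its byte layout are hypothetical and exist only for illustration. The trace does not show how the cached count became stale, so the sketch simply assumes it can.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;

// Hypothetical, simplified stand-in for a matrix block (assumption, not SystemML code).
class UltraSparseSketch {
    final double[][] rows; // dense storage for simplicity; zeros are skipped on write
    long nnz;              // cached non-zero count (metadata) that may become stale

    UltraSparseSketch(double[][] rows, long nnz) {
        this.rows = rows;
        this.nnz = nnz;
    }

    // Recount the non-zeros from the actual cell values.
    long countNonZeros() {
        long count = 0;
        for (double[] row : rows) {
            for (double v : row) {
                if (v != 0) {
                    count++;
                }
            }
        }
        return count;
    }

    // Write the cached count followed by (row, column, value) triples,
    // then verify that the number of triples matches the cached count.
    byte[] writeUltraSparse() throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bos);
        out.writeLong(nnz);
        long written = 0;
        for (int r = 0; r < rows.length; r++) {
            for (int c = 0; c < rows[r].length; c++) {
                double v = rows[r][c];
                if (v != 0) {
                    out.writeInt(r);
                    out.writeInt(c);
                    out.writeDouble(v);
                    written++;
                }
            }
        }
        if (written != nnz) {
            throw new IOException("Invalid number of serialized non-zeros: "
                + written + " (expected: " + nnz + ")");
        }
        out.flush();
        return bos.toByteArray();
    }
}
```

      In this sketch, a block whose cached nnz no longer matches its data fails on write with the same message shape as above, and refreshing the cached value from countNonZeros() before writing lets the write succeed. Whether the actual fix recomputes the count or corrects the operation that left it stale is not stated in this report.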
      

            People

              mboehm7 Matthias Boehm
              dusenberrymw Mike Dusenberry