Details
- Bug
- Status: Closed
- Major
- Resolution: Fixed
- SystemML 1.3, SystemDS 2.1, SystemDS 2.2
Description
The memory estimate of the SpoofFusedOp HOP does not account for intermediate memory. This affects two cases: the rowwise template (thread-local temporary memory, plus the memory needed to transpose some side inputs and/or the output) and the GPU version of the cellwise full aggregate (CUDA needs an intermediate buffer for a full reduction whenever the operation requires more than one thread block).
Estimating this memory correctly requires information that is stored in the compiled operator. Factoring this information out of the compiled operator will be handled and documented in a separate Jira issue.
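
For illustration only, the two missing terms described above could be approximated as in the following sketch. This is not SystemDS code: the function names, parameters, and the default of 1024 threads per block are hypothetical, and the sketch assumes dense double-precision data and one cell per GPU thread. In the actual system the number and length of temporary vectors would have to come from the compiled operator.

```python
import math

DOUBLE_BYTES = 8  # assumption: dense double-precision values


def rowwise_intermediate_bytes(num_threads, num_tmp_vectors, tmp_vector_len,
                               transposed_shapes=()):
    """Hypothetical estimate for the rowwise template.

    Sums the thread-local temporary vectors and one dense buffer per
    transposed side input or output, given as (rows, cols) pairs.
    """
    thread_local = num_threads * num_tmp_vectors * tmp_vector_len * DOUBLE_BYTES
    transposes = sum(rows * cols * DOUBLE_BYTES for rows, cols in transposed_shapes)
    return thread_local + transposes


def gpu_full_agg_intermediate_bytes(num_cells, threads_per_block=1024):
    """Hypothetical estimate for the GPU cellwise full aggregate.

    Assumes one partial result per thread block; no intermediate buffer
    is needed when a single block covers all cells.
    """
    num_blocks = math.ceil(num_cells / threads_per_block)
    return num_blocks * DOUBLE_BYTES if num_blocks > 1 else 0
```

With these assumptions, 4 threads with 2 temporary vectors of length 1000 each need 64,000 bytes of thread-local memory, and a full aggregate over 1025 cells needs a 16-byte buffer for two partial results.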