Details
- Type: Bug
- Status: Closed
- Priority: Blocker
- Resolution: Fixed
- Affects Version/s: SystemML 1.0.0, SystemML 1.1
- Component/s: None
- Labels: None
- Severity: Important
Description
We have identified two performance bugs that frequently occur in deep learning scripts.
First, we repeatedly perform unnecessary conversions to sparse format, even though operations such as matrix multiplication (including BLAS and CuBLAS) are optimized for dense inputs.
Second, even with a large memory budget, we sometimes spend 20-30% of the total time in caching.
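The first bug hinges on when sparse storage actually pays off. A minimal, self-contained sketch of a sparsity-threshold check (hypothetical class, method names, and threshold value; not SystemML's actual internals):

```java
// Hypothetical sketch: deciding between dense and sparse representation
// based on a sparsity threshold. An unnecessary dense->sparse conversion
// hurts twice: the conversion itself costs time, and dense BLAS/CuBLAS
// matrix multiplies are much faster than sparse kernels.
public class SparsityCheck {
    // Assumed turn point: below ~40% non-zeros, sparse storage tends to win.
    static final double SPARSITY_TURN_POINT = 0.4;

    static boolean useSparseFormat(long rows, long cols, long nnz) {
        double sparsity = (double) nnz / (rows * cols);
        // Keep dense representation for anything not sparse enough,
        // and for column vectors where sparse row objects add overhead.
        return sparsity < SPARSITY_TURN_POINT && cols > 1;
    }

    public static void main(String[] args) {
        System.out.println(useSparseFormat(1000, 1000, 900_000)); // 90% nnz -> false (stay dense)
        System.out.println(useSparseFormat(1000, 1000, 10_000));  // 1% nnz  -> true
    }
}
```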
mboehm7 reinwald mwdusenb@us.ibm.com I am labeling this bug as a blocker for SystemML 1.0. Please feel free to assign this issue to yourself.
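Part of the caching overhead in the second bug is serialization cost on bufferpool writes, which item 4 below addresses by sizing buffers from the block's in-memory size. A minimal sketch of that idea for an MCSR layout (one sparse row object per row); all constants and names here are hypothetical, not SystemML's actual code:

```java
// Hypothetical sketch: estimating the in-memory size of an MCSR (modified
// CSR) sparse matrix so a bufferpool write can allocate its buffer once,
// instead of growing a serialization buffer incrementally.
public class McsrSizeEstimate {
    // Assumed fixed per-row object/array overhead (hypothetical 32 bytes).
    static final long ROW_OVERHEAD = 32;

    static long estimateSizeInBytes(int rows, long nnz) {
        // Per non-zero: 8-byte double value + 4-byte int column index.
        return rows * ROW_OVERHEAD + nnz * (8 + 4);
    }

    public static void main(String[] args) {
        // 1000 x 1000 matrix at 1% sparsity: 10,000 non-zeros
        System.out.println(estimateSizeInBytes(1000, 10_000)); // 152000
    }
}
```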
Improvements so far:
1. Disabled sparse conversions and caching, by commit
2. Binary sparse-dense mult/div, preallocation, by commit
3. For `conv_2d_bias_add`, the `elementWiseInPlaceTransposedAddition` method first aggregates partial blocks without transpose, then performs a cache-conscious transpose to the output, by commit
4. Reduced serialization overhead of sparse matrices (in MCSR format) on bufferpool write, by using the inMemorySize of the cache block, by commit
5. Improved removeEmpty(rows) and order performance via shallow copy of sparse rows, exploiting the fact that removeEmpty(rows) and order do not modify the actual sparse rows, by commit
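The cache-conscious transpose in item 3 can be sketched as a standard blocked (tiled) transpose; the tile size and code below are illustrative, not the actual SystemML implementation:

```java
// Hypothetical sketch of a cache-conscious (blocked) transpose: copying in
// BLOCK x BLOCK tiles keeps both the source rows and the destination
// column segments resident in cache, unlike a naive row-by-row transpose
// that strides through the output.
public class BlockedTranspose {
    static final int BLOCK = 32; // tile size; tuned to L1 cache in practice

    static double[] transpose(double[] in, int rows, int cols) {
        double[] out = new double[rows * cols]; // cols x rows, row-major
        for (int bi = 0; bi < rows; bi += BLOCK)
            for (int bj = 0; bj < cols; bj += BLOCK)
                // copy one tile; the Math.min bounds handle the ragged edge
                for (int i = bi; i < Math.min(bi + BLOCK, rows); i++)
                    for (int j = bj; j < Math.min(bj + BLOCK, cols); j++)
                        out[j * rows + i] = in[i * cols + j];
        return out;
    }

    public static void main(String[] args) {
        double[] a = {1, 2, 3, 4, 5, 6};      // 2x3 row-major
        double[] t = transpose(a, 2, 3);      // 3x2: {1,4,2,5,3,6}
        System.out.println(java.util.Arrays.toString(t));
    }
}
```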
Attachments
Issue Links
- is depended upon by
  - SYSTEMDS-1185 SystemML Breast Cancer Project (Resolved)
- is related to
  - SYSTEMDS-1275 Remove workaround flags disable_sparse disable_caching (Resolved)
  - SYSTEMDS-1273 Performance spark right indexing w/o aggregation (Resolved)