Details
Type: Improvement
Status: Resolved
Priority: Major
Resolution: Fixed
Version: SystemDS 2.1
Description
The Spoof CUDA operators issue several small cudaMemcpy() calls per operator execution. Transferring all of the data in a single copy reduces this per-call overhead. In addition, using asynchronous copies can improve performance further and is a first step towards making the GPU operations more asynchronous.
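The batching idea can be illustrated on the host side. The following is a minimal sketch, not the SystemDS implementation: it packs several small inputs into one contiguous staging buffer and records their offsets, so that one transfer could replace one copy per input. The names HostChunk, PackedBuffer, align_up and pack_chunks are hypothetical, as is the 8-byte alignment. The sketch is plain C++ and does not call the CUDA runtime; a comment marks where a single cudaMemcpyAsync() would go.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical description of one small host-side input of an operator.
struct HostChunk {
    const void* data;   // start of the bytes to transfer
    std::size_t bytes;  // number of bytes to transfer
};

// Result of packing: one contiguous staging buffer plus the byte offset
// of every chunk inside it (the device side would use the same offsets).
struct PackedBuffer {
    std::vector<unsigned char> staging;
    std::vector<std::size_t> offsets;
};

// Round n up to the next multiple of align (align must be a power of two).
inline std::size_t align_up(std::size_t n, std::size_t align) {
    assert(align != 0 && (align & (align - 1)) == 0);
    return (n + align - 1) & ~(align - 1);
}

// Pack all chunks back to back, each starting at an aligned offset.
inline PackedBuffer pack_chunks(const std::vector<HostChunk>& chunks,
                                std::size_t align = 8) {
    PackedBuffer out;
    std::size_t total = 0;
    for (const HostChunk& c : chunks) {
        out.offsets.push_back(total);
        total = align_up(total + c.bytes, align);
    }
    out.staging.assign(total, 0);  // padding bytes stay zero
    for (std::size_t i = 0; i < chunks.size(); ++i) {
        if (chunks[i].bytes > 0) {
            std::memcpy(out.staging.data() + out.offsets[i],
                        chunks[i].data, chunks[i].bytes);
        }
    }
    // In a CUDA build, ONE transfer of the staging buffer would go here
    // instead of chunks.size() separate cudaMemcpy() calls, e.g.
    //   cudaMemcpyAsync(d_staging, out.staging.data(), out.staging.size(),
    //                   cudaMemcpyHostToDevice, stream);
    // where d_staging and stream are assumed to exist already.
    return out;
}
```

For the asynchronous copy to overlap with host work, the staging buffer would typically have to live in page-locked memory (for example allocated with cudaMallocHost()) rather than in a std::vector, and the stream must be synchronized before the staging buffer is reused.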