  1. SystemDS
  2. SYSTEMDS-3024

Improve performance by batching data descriptor transfers

Details

    Description

      The SPOOF CUDA operators issue several small cudaMemcpy() calls per operator execution to transfer data descriptors. Transferring all descriptors in a single copy reduces this per-call overhead. Using asynchronous copies can reduce it further and is a first step towards more asynchronous execution of the GPU operations.
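
      The idea can be sketched as follows. This is a minimal illustration, not the actual SystemDS code: the MatrixDesc and DescBatch structs, the NUM_INPUTS constant, and the sumRowCounts kernel are hypothetical names invented for this example. It packs all descriptors for one operator call into a single pinned host buffer and ships them with one cudaMemcpyAsync() on a stream, instead of one cudaMemcpy() per descriptor.

```cuda
#include <cstdio>
#include <cstdint>
#include <cuda_runtime.h>

// Hypothetical descriptor layout; the real SPOOF structures are not shown in this issue.
struct MatrixDesc {
    const double* data;  // device pointer to the matrix values
    uint32_t rows;
    uint32_t cols;
};

constexpr int NUM_INPUTS = 4;  // illustrative value

// All descriptors of one operator call, packed contiguously for a single transfer.
struct DescBatch {
    MatrixDesc inputs[NUM_INPUTS];
    MatrixDesc output;
};

// Toy kernel that only reads the descriptors, to show they arrived intact.
__global__ void sumRowCounts(const DescBatch* batch, uint32_t* result) {
    uint32_t total = 0;
    for (int i = 0; i < NUM_INPUTS; ++i) total += batch->inputs[i].rows;
    *result = total;
}

#define CHECK(call)                                                   \
    do {                                                              \
        cudaError_t err_ = (call);                                    \
        if (err_ != cudaSuccess) {                                    \
            fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err_)); \
            return 1;                                                 \
        }                                                             \
    } while (0)

int main() {
    cudaStream_t stream;
    CHECK(cudaStreamCreate(&stream));

    // Pinned (page-locked) host staging buffer, so the copy can run asynchronously.
    DescBatch* h_batch = nullptr;
    CHECK(cudaMallocHost((void**)&h_batch, sizeof(DescBatch)));
    DescBatch* d_batch = nullptr;
    CHECK(cudaMalloc((void**)&d_batch, sizeof(DescBatch)));
    uint32_t* d_result = nullptr;
    CHECK(cudaMalloc((void**)&d_result, sizeof(uint32_t)));

    // Fill every descriptor on the host first (data pointers left null in this toy).
    for (int i = 0; i < NUM_INPUTS; ++i)
        h_batch->inputs[i] = MatrixDesc{nullptr, 10u * (uint32_t)(i + 1), 5u};
    h_batch->output = MatrixDesc{nullptr, 1u, 1u};

    // One batched, asynchronous transfer instead of NUM_INPUTS + 1 small copies.
    CHECK(cudaMemcpyAsync(d_batch, h_batch, sizeof(DescBatch),
                          cudaMemcpyHostToDevice, stream));

    // The kernel is queued on the same stream, so it runs after the copy completes.
    sumRowCounts<<<1, 1, 0, stream>>>(d_batch, d_result);
    CHECK(cudaGetLastError());

    uint32_t h_result = 0;
    CHECK(cudaMemcpyAsync(&h_result, d_result, sizeof(uint32_t),
                          cudaMemcpyDeviceToHost, stream));
    CHECK(cudaStreamSynchronize(stream));

    printf("sum of input row counts: %u\n", h_result);

    CHECK(cudaFree(d_result));
    CHECK(cudaFree(d_batch));
    CHECK(cudaFreeHost(h_batch));
    CHECK(cudaStreamDestroy(stream));
    return h_result == 100u ? 0 : 1;
}
```

      Because the copy and the kernel are enqueued on the same stream, ordering is preserved without a host-side synchronization between them. The pinned staging buffer must not be overwritten until the copy has completed, so a real implementation would likely need one buffer per in-flight operator or an event to guard reuse.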

            People

              markd Mark Dokter