  1. Apache MXNet (Retired)
  2. MXNET-97

Implement DepthwiseConv2dBackwardFilterKernel from the TensorFlow codebase


Details

    Description

      The current MXNet implementation calls __syncthreads() too often, which makes it extremely slow.
      The new code comes from TensorFlow, with variable names adjusted for consistency with the MXNet code.

      My model uses depthwise convolution heavily, and with this change training is over 5x faster per iteration on a single P40 GPU (old: 92s, new: 18s).
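      To illustrate the idea described above, here is a minimal sketch of a depthwise filter-gradient kernel that contains no __syncthreads() at all: each thread owns one output-gradient element and accumulates its contributions directly with atomicAdd. This is not the code from the patch or from TensorFlow. The kernel name, the helper name, the NCHW layout, and the fixed stride of 1 are all assumptions made for illustration.

```cuda
#include <cuda_runtime.h>
#include <cmath>
#include <vector>

// Hypothetical kernel (not the MXNet/TensorFlow source). Assumes NCHW layout,
// stride 1, no dilation, square K x K filter, channel multiplier 1.
// filter_grad must be zeroed before launch.
__global__ void depthwise_bwd_filter_atomic(
    const float* out_grad,  // [N, C, OH, OW]
    const float* input,     // [N, C, H, W]
    float* filter_grad,     // [C, K, K]
    int N, int C, int H, int W, int K, int pad, int OH, int OW) {
  int total = N * C * OH * OW;
  // Grid-stride loop: one out_grad element per iteration, no barriers needed.
  for (int idx = blockIdx.x * blockDim.x + threadIdx.x; idx < total;
       idx += gridDim.x * blockDim.x) {
    int ow = idx % OW;
    int oh = (idx / OW) % OH;
    int c = (idx / (OW * OH)) % C;
    int n = idx / (OW * OH * C);
    float g = out_grad[idx];
    for (int kh = 0; kh < K; ++kh) {
      int ih = oh - pad + kh;
      if (ih < 0 || ih >= H) continue;  // tap falls into zero padding
      for (int kw = 0; kw < K; ++kw) {
        int iw = ow - pad + kw;
        if (iw < 0 || iw >= W) continue;
        float x = input[((n * C + c) * H + ih) * W + iw];
        // Accumulate into the shared filter tap without a block-wide barrier.
        atomicAdd(&filter_grad[(c * K + kh) * K + kw], g * x);
      }
    }
  }
}

// Hypothetical host helper: runs the kernel on small synthetic data and
// returns the largest absolute difference from a plain CPU reference.
float max_error_vs_cpu_reference() {
  const int N = 2, C = 3, H = 8, W = 8, K = 3, pad = 1;
  const int OH = H + 2 * pad - K + 1, OW = W + 2 * pad - K + 1;
  std::vector<float> x(N * C * H * W), g(N * C * OH * OW);
  std::vector<float> ref(C * K * K, 0.0f), got(C * K * K, 0.0f);
  for (size_t i = 0; i < x.size(); ++i) x[i] = 0.01f * (i % 17);
  for (size_t i = 0; i < g.size(); ++i) g[i] = 0.02f * (i % 13) - 0.1f;

  // CPU reference: same sum, computed serially.
  for (int n = 0; n < N; ++n)
    for (int c = 0; c < C; ++c)
      for (int oh = 0; oh < OH; ++oh)
        for (int ow = 0; ow < OW; ++ow)
          for (int kh = 0; kh < K; ++kh)
            for (int kw = 0; kw < K; ++kw) {
              int ih = oh - pad + kh, iw = ow - pad + kw;
              if (ih < 0 || ih >= H || iw < 0 || iw >= W) continue;
              ref[(c * K + kh) * K + kw] +=
                  g[((n * C + c) * OH + oh) * OW + ow] *
                  x[((n * C + c) * H + ih) * W + iw];
            }

  float *dx = nullptr, *dg = nullptr, *dw = nullptr;
  cudaMalloc(&dx, x.size() * sizeof(float));
  cudaMalloc(&dg, g.size() * sizeof(float));
  cudaMalloc(&dw, got.size() * sizeof(float));
  cudaMemcpy(dx, x.data(), x.size() * sizeof(float), cudaMemcpyHostToDevice);
  cudaMemcpy(dg, g.data(), g.size() * sizeof(float), cudaMemcpyHostToDevice);
  cudaMemset(dw, 0, got.size() * sizeof(float));

  int total = static_cast<int>(g.size());
  depthwise_bwd_filter_atomic<<<(total + 255) / 256, 256>>>(
      dg, dx, dw, N, C, H, W, K, pad, OH, OW);
  // cudaMemcpy on the default stream waits for the kernel to finish.
  cudaMemcpy(got.data(), dw, got.size() * sizeof(float),
             cudaMemcpyDeviceToHost);
  cudaFree(dx); cudaFree(dg); cudaFree(dw);

  float err = 0.0f;
  for (size_t i = 0; i < got.size(); ++i)
    err = fmaxf(err, fabsf(got[i] - ref[i]));
  return err;
}
```

      Calling max_error_vs_cpu_reference() from a small main() and checking that the result is close to zero is enough to try the sketch. One trade-off of this approach is that floating-point atomicAdd makes the summation order, and therefore the last bits of the result, nondeterministic between runs. Contention on a small filter can also be high, so a real kernel would likely reduce partial sums within a warp before issuing the atomic.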


            People

              Assignee: Unassigned
              Reporter: Ni Hui (nihui)


                Time Tracking

                  Estimated: Not Specified
                  Remaining: 0h
                  Logged: 40m