Description
We previously discussed implementing hybrid training with researchers from Stanford:
http://mail-archives.apache.org/mod_mbox/singa-dev/201507.mbox/%3CCAJz0iLsd5iSCqqVU4QHLKzMO2o%2BFt-40kN8RgWkYhDn%3D6Qqqbw%40mail.gmail.com%3E.
Now that GPU training is supported, we can move on to this feature.
The distributed training framework is a natural fit for hybrid training with CPU and GPU. The first n workers would be assigned GPU cards (where n is the number of cards configured by the user), and the remaining workers would run on CPU.
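A minimal sketch of how that assignment might look is below. All names here (DeviceType, WorkerAssignment, AssignDevices) are illustrative, not taken from the SINGA codebase:

```cpp
// Hypothetical sketch: first n workers get GPU cards, the rest run on CPU.
#include <vector>

enum class DeviceType { kGPU, kCPU };

struct WorkerAssignment {
  int worker_id;
  DeviceType device;
  int gpu_card;  // valid only when device == kGPU
};

// n = number of GPU cards configured by the user;
// total = total number of workers in the group.
std::vector<WorkerAssignment> AssignDevices(int n, int total) {
  std::vector<WorkerAssignment> assignments;
  for (int id = 0; id < total; ++id) {
    if (id < n)
      assignments.push_back({id, DeviceType::kGPU, id});  // worker i -> card i
    else
      assignments.push_back({id, DeviceType::kCPU, -1});  // remaining on CPU
  }
  return assignments;
}
```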
Some code may need updates and optimization to handle the memory transfers between GPU workers and CPU workers. Most of it is in worker.cc, param.cc and stub.cc.
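To illustrate the kind of host/device copies involved (this is not actual SINGA code; function names and the aggregation flow are assumptions, and error handling is omitted), a GPU worker would need to pull its parameter data to host memory before a CPU-side stub can exchange it with CPU workers, then push updates back:

```cpp
#include <cuda_runtime.h>
#include <vector>

// Copy a GPU worker's gradient to host memory so it can be aggregated
// together with gradients from CPU workers.
void PullGradientToHost(const float* d_grad, std::vector<float>* h_grad,
                        size_t count) {
  h_grad->resize(count);
  cudaMemcpy(h_grad->data(), d_grad, count * sizeof(float),
             cudaMemcpyDeviceToHost);
}

// Push the updated parameter values back to the GPU worker's device memory.
void PushParamToDevice(const std::vector<float>& h_param, float* d_param) {
  cudaMemcpy(d_param, h_param.data(), h_param.size() * sizeof(float),
             cudaMemcpyHostToDevice);
}
```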
Automatic tuning of the workload between GPU and CPU workers could be designed and implemented in this ticket or in a new ticket.
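One possible tuning policy (purely a sketch, not a committed design) is to split each mini-batch between GPU and CPU workers in proportion to their recently measured throughput:

```cpp
#include <algorithm>

// Returns how many examples of a mini-batch the GPU workers should take,
// given recent throughput measurements in examples/second. Requires C++17
// for std::clamp.
int GpuShare(int batch_size, double gpu_throughput, double cpu_throughput) {
  double total = gpu_throughput + cpu_throughput;
  if (total <= 0) return batch_size;  // no measurements yet; default to GPU
  int share = static_cast<int>(batch_size * (gpu_throughput / total));
  return std::clamp(share, 0, batch_size);
}
```

The throughputs could be re-estimated every few iterations so the split adapts as the relative speeds of the GPU and CPU workers change.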