Details
Type: Improvement
Status: To Do
Priority: Major
Resolution: Unresolved
Description
This improvement is based on the ideas proposed in this SysML 2019 paper: https://anandj.in/wp-content/uploads/sysml.pdf
The key idea is, in a data-parallel training scenario, to synchronize each layer's parameters as finer-grained packets scheduled by priority, where priority is defined by the layer index (layers closer to the input get higher priority, since they are needed first in the next forward pass). Scheduling parameter synchronization this way utilizes the network better and thus improves training throughput. A small scheduling sketch follows below.
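To make the scheduling idea concrete, here is a minimal framework-agnostic Python sketch of slicing each layer's gradient into fixed-size pieces and draining them in priority order. The names (slice_gradient, priority_synchronize, slice_size) are illustrative only and not part of any existing API; a real implementation would overlap this scheduling with backpropagation and the actual network transport (parameter server push or allreduce) instead of collecting everything first.
{code:python}
import heapq

import numpy as np


def slice_gradient(layer_index, grad, slice_size):
    """Split one layer's flattened gradient into fixed-size slices.

    Each slice carries the layer's index as its priority. A lower
    layer index means higher priority, because early layers are
    needed first in the next forward pass.
    """
    flat = grad.ravel()
    return [
        (layer_index, offset, flat[offset:offset + slice_size])
        for offset in range(0, flat.size, slice_size)
    ]


def priority_synchronize(layer_grads, slice_size=1024):
    """Yield gradient slices in priority order (a scheduling sketch).

    layer_grads: list of (layer_index, gradient ndarray) pairs in the
    order backpropagation produces them (last layer first). Slices from
    different layers are interleaved so low-index layers go out first.
    """
    heap = []
    seq = 0  # unique tie-breaker so heapq never compares ndarrays
    for layer_index, grad in layer_grads:
        for item in slice_gradient(layer_index, grad, slice_size):
            heapq.heappush(heap, (item[0], seq, item))
            seq += 1
    while heap:
        _, _, (layer_index, offset, data) = heapq.heappop(heap)
        yield layer_index, offset, data


if __name__ == "__main__":
    # Gradients arrive in reverse layer order during backprop.
    grads = [(2, np.ones(3000)), (1, np.ones(2000)), (0, np.ones(2500))]
    for layer_index, offset, data in priority_synchronize(grads):
        print("send layer=%d offset=%d len=%d" % (layer_index, offset, data.size))
{code}
The point of the small slice size is that a high-priority slice never has to wait behind a large, low-priority tensor that is already being transmitted, which is what gives the better network utilization described above.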