[SYSTEMDS-1160] Enable Prefetching of Mini-Batches - ASF JIRA

Details

Type: New Feature
Status: Open
Priority: Major
Resolution: Unresolved
Affects Version/s: SystemML 1.0.0
Fix Version/s: None
Component/s: None
Labels: None

Epic Link: Deep Learning

Description

For efficient training of large deep learning models, a mini-batch training approach is preferred. On SystemML with the Spark backend, this currently means grabbing a mini-batch from an RDD (via a PartitionPruningRDD; see SYSTEMML-951) and then executing entirely single-node instructions on each mini-batch. While fetching partitions has been made efficient, the training loop still has to pause after each training step to grab the next partition. For large models, training time is already an issue even on GPUs with saturated input pipelines, so stalling on input only adds to it. Thus, we need to enable prefetching of mini-batches that runs in parallel with the training loop. One possibility would be to create an input queue that is fed by a prefetch thread and that in turn feeds the training loop.
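
The queue-plus-prefetch-thread idea could look roughly like the sketch below. It is written in plain Python purely to illustrate the pattern and is not SystemML code or a proposed API: `prefetch`, `capacity`, and `slow_batches` are hypothetical names, and the `time.sleep` call merely stands in for the cost of pulling a partition from an RDD.

```python
import queue
import threading
import time


def prefetch(batches, capacity=2):
    """Yield items from `batches` while a background thread fetches ahead.

    `capacity` (assumed >= 1) bounds how many fetched batches may wait in
    the queue, so the prefetch thread cannot run arbitrarily far ahead.
    """
    buf = queue.Queue(maxsize=capacity)

    def producer():
        try:
            for batch in batches:
                buf.put(("batch", batch))  # blocks while the queue is full
            buf.put(("done", None))
        except Exception as exc:
            # Hand fetch errors to the consumer instead of losing them.
            buf.put(("error", exc))

    # Daemon thread: if the consumer stops early, the blocked producer
    # does not keep the process alive.
    threading.Thread(target=producer, daemon=True).start()

    while True:
        kind, payload = buf.get()  # blocks only if no batch is ready yet
        if kind == "done":
            return
        if kind == "error":
            raise payload
        yield payload


if __name__ == "__main__":
    def slow_batches(n):
        # Hypothetical stand-in for fetching one partition per mini-batch.
        for i in range(n):
            time.sleep(0.01)
            yield [i, i + 1]

    for batch in prefetch(slow_batches(5), capacity=2):
        time.sleep(0.01)  # stand-in for one training step
```

With a bounded queue, the fetch of the next mini-batch overlaps with the current training step, and memory use stays limited to `capacity` buffered batches. The training loop only waits when fetching is slower than training.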

Issue Links

depends upon

SYSTEMDS-951 Efficient spark right indexing via lookup (In Progress)

is depended upon by

SYSTEMDS-1185 SystemML Breast Cancer Project (Resolved)

is related to

SYSTEMDS-1324 Program rewrite for mini batching (Open)

People

Assignee: Unassigned

Reporter: Mike Dusenberry

Votes: 0

Watchers: 1

Dates

Created: 20/Dec/16 21:25

Updated: 12/Nov/20 10:56