Details
- Type: Improvement
- Status: Open
- Priority: Major
- Resolution: Unresolved
- Fix Version: SystemML 1.1
Description
Training a parameterized machine learning model (such as a large neural net in deep learning) requires learning a set of model parameters from the data, as well as determining appropriate hyperparameters (or "settings") for the training process itself. Hyperparameters (e.g., learning rate, regularization strength, dropout percentage, model architecture) cannot be learned from the data, and instead are determined via a search over a space of candidate values for each hyperparameter. For large numbers of hyperparameters (such as in deep learning models), the current literature points to performing staged, randomized searches over the space to produce distributions of performance, narrowing the space after each stage [1]. Thus, for efficient hyperparameter optimization, it is desirable to train several models in parallel, with each model trained over the full dataset. For deep learning models, a mini-batch training approach is currently state-of-the-art, and thus separate models with different hyperparameters could, conceivably, be trained independently on each of the nodes in a cluster.
In order to allow for the training of deep learning models, SystemML needs a solution that enables this scenario with the Spark backend. Specifically, if the user has a train function that takes a set of hyperparameters and trains a model with a mini-batch approach (and thus uses only single-node instructions within the function), the user should be able to wrap this function in, for example, a remote parfor construct that samples hyperparameters and calls the train function on each machine in parallel, as in the sketch below.
To be clear, each model would need access to the entire dataset, and each model would be trained independently.
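
A minimal DML sketch of the desired usage, assuming a parfor REMOTE_SPARK execution mode (the subject of SYSTEMDS-1129); the train function, dataset, and hyperparameter ranges are hypothetical placeholders, not an actual SystemML API:

# Stand-in for a user-provided mini-batch training function; a real one
# would run mini-batch SGD using only single-node (CP) instructions.
train = function(matrix[double] X, matrix[double] y, double lr, double reg)
    return (double acc) {
  acc = lr + reg  # dummy validation accuracy
}

N = 16                              # number of hyperparameter samples
X = rand(rows=1000, cols=10)        # stand-in dataset
y = rand(rows=1000, cols=1)
results = matrix(0, rows=N, cols=3)

# Each iteration samples one hyperparameter configuration and trains one
# model; every worker needs access to the entire dataset X, y.
parfor (i in 1:N, mode=REMOTE_SPARK, opt=CONSTRAINED) {
  lr  = 10 ^ as.scalar(-4 + 3 * rand(rows=1, cols=1))  # log-uniform in [1e-4, 1e-1]
  reg = 10 ^ as.scalar(-6 + 4 * rand(rows=1, cols=1))  # log-uniform in [1e-6, 1e-2]
  acc = train(X, y, lr, reg)
  results[i, 1] = lr
  results[i, 2] = reg
  results[i, 3] = acc
}
print(toString(results))

Since each iteration writes to a disjoint row of results, parfor's standard result merging can combine the per-configuration scores after all workers finish.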
[1]: Bergstra, J. and Bengio, Y. "Random Search for Hyper-Parameter Optimization." JMLR 13 (2012). http://www.jmlr.org/papers/volume13/bergstra12a/bergstra12a.pdf
Issue Links
- is depended upon by
  - SYSTEMDS-1185 SystemML Breast Cancer Project (Resolved)
- is related to
  - SYSTEMDS-1310 Parfor block partitioning (mini batches) (Closed)
  - SYSTEMDS-979 Add support for bayesian optimization (Closed)
  - SYSTEMDS-739 Explore model-parallel constructs in DML (Closed)
- relates to
  - SYSTEMDS-1129 Enable parfor to run on remote Spark workers (Open)