Description
Takes a gradient function as input. At each iteration, we run stochastic gradient descent locally on each worker with a fraction (alpha) of the data points selected randomly and disjointly (i.e., we ensure that we touch all datapoints after at most 1/alpha iterations).