Description
This ticket tracks progress on developing streaming analyses in MLlib.
Many streaming applications benefit from or require fitting models online, where the parameters of a model (e.g. regression, clustering) are updated continually as new data arrive. This can be accomplished by incorporating MLlib algorithms into model-updating operations over DStreams. In some cases existing updaters (e.g. those based on SGD) will suffice; in others, custom update rules will be needed (e.g. for KMeans). The goal is to have streaming versions of many common algorithms, in particular regression, classification, clustering, and possibly dimensionality reduction.
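To make the SGD-based case concrete, here is a minimal sketch of an online update for least-squares linear regression. It is plain Python with no Spark dependency: each batch is an ordinary list standing in for the data that arrives in one streaming interval, and the names sgd_update and fit_online are hypothetical, not MLlib APIs. In Spark the same per-batch step would presumably be applied inside an operation over a DStream (for example foreachRDD), but that wiring is an assumption and is not shown here.

```python
from typing import Iterable, Iterator, List, Sequence, Tuple

Vector = List[float]
Batch = Sequence[Tuple[Vector, float]]  # (features, label) pairs


def sgd_update(weights: Vector, batch: Batch, step_size: float) -> Vector:
    """Take one gradient step on a single batch (squared-error loss)."""
    if not batch:
        # An empty interval leaves the model unchanged.
        return list(weights)
    grad = [0.0] * len(weights)
    for features, label in batch:
        # Prediction error for this example under the current weights.
        error = sum(w * x for w, x in zip(weights, features)) - label
        for j, x in enumerate(features):
            grad[j] += error * x
    n = len(batch)
    # Average the gradient over the batch before stepping.
    return [w - step_size * g / n for w, g in zip(weights, grad)]


def fit_online(
    batches: Iterable[Batch],
    initial_weights: Vector,
    step_size: float = 0.1,
) -> Iterator[Vector]:
    """Fold sgd_update over the batches, yielding the model after each one."""
    weights = list(initial_weights)
    for batch in batches:
        weights = sgd_update(weights, batch, step_size)
        yield weights
```

The model after each batch depends only on the previous model and the new data, which is the property that lets the fit run continually. Algorithms such as KMeans would need a different per-batch rule in place of sgd_update; this sketch does not attempt one.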