SPARK-2372: Grouped Optimization/Learning


Details

    • Type: New Feature
    • Status: Resolved
    • Priority: Major
    • Resolution: Won't Fix
    • Affects Version/s: 1.0.1, 1.0.2, 1.1.0
    • Fix Version/s: None
    • Component/s: MLlib
    • Labels: None

    Description

      The purpose of this patch is to enable MLlib to better handle scenarios where the user wants to learn on multiple feature/label sets at the same time. Rather than scheduling each learning task separately, this patch lets the user create a single RDD with an Int key that identifies the 'group' each entry belongs to.
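The idea of keying every entry by group can be illustrated without Spark. The sketch below is not the patch's implementation: it uses an in-memory Seq as a stand-in for the grouped RDD, a least-squares gradient chosen only for illustration, and hypothetical names (GroupedStepSketch, step). It shows how a single pass over keyed data can advance several independent models by one gradient-descent step each.

```scala
object GroupedStepSketch {
  // (groupId, (label, features)): in-memory stand-in for RDD[(Int, (Double, Vector))].
  type Entry = (Int, (Double, Array[Double]))

  private def dot(a: Array[Double], b: Array[Double]): Double =
    a.zip(b).map { case (x, y) => x * y }.sum

  /** One gradient-descent step for every group, using a single traversal of the data. */
  def step(
      data: Seq[Entry],
      weights: Map[Int, Array[Double]],
      stepSize: Double): Map[Int, Array[Double]] = {
    // Accumulate (gradient sum, entry count) per group in one pass.
    val sums = data.foldLeft(Map.empty[Int, (Array[Double], Long)]) {
      case (acc, (group, (label, features))) =>
        val w = weights(group)
        val err = dot(w, features) - label
        val grad = features.map(_ * err) // least-squares gradient for this entry
        val (gSum, n) = acc.getOrElse(group, (Array.fill(w.length)(0.0), 0L))
        acc.updated(group, (gSum.zip(grad).map { case (a, b) => a + b }, n + 1))
    }
    // Apply the averaged gradient to each group's own weight vector.
    weights.map { case (group, w) =>
      sums.get(group) match {
        case Some((gSum, n)) =>
          group -> w.zip(gSum).map { case (wi, gi) => wi - stepSize * gi / n }
        case None => group -> w // no entries for this group: weights unchanged
      }
    }
  }
}
```

Each group keeps its own weight vector and gradient accumulator, so the groups never mix even though they share one data set.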

      This patch establishes the GroupedOptimizer trait, for which GroupedGradientDescent has been implemented. It differs from the original Optimizer trait in its input type: the original optimize method accepts RDD[(Double, Vector)], while the new GroupedOptimizer accepts RDD[(Int, (Double, Vector))].
      The GroupedOptimizer uses the Int 'group' ID key to multiplex multiple optimization operations in the same RDD.
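To make the contrast in input types concrete, here is a minimal sketch. Only the grouped data type RDD[(Int, (Double, Vector))] comes from the description above. The trait name GroupedOptimizerSketch, the Map[Int, Vector] weight arguments, and the PerGroupBaseline class are assumptions made for illustration and may not match the pull request. The baseline shows the "one job per group" approach that a grouped optimizer is meant to avoid.

```scala
import org.apache.spark.mllib.linalg.Vector
import org.apache.spark.mllib.optimization.Optimizer
import org.apache.spark.rdd.RDD

// Hypothetical shape of a grouped optimizer: entries are keyed by group ID,
// and each group has its own initial and final weight vector (assumed layout).
trait GroupedOptimizerSketch extends Serializable {
  def optimize(
      data: RDD[(Int, (Double, Vector))],
      initialWeights: Map[Int, Vector]): Map[Int, Vector]
}

// Baseline for comparison: filter out each group and run MLlib's existing
// Optimizer on it, which schedules a separate optimization per group.
class PerGroupBaseline(inner: Optimizer) extends GroupedOptimizerSketch {
  override def optimize(
      data: RDD[(Int, (Double, Vector))],
      initialWeights: Map[Int, Vector]): Map[Int, Vector] =
    initialWeights.map { case (group, w0) =>
      val groupData: RDD[(Double, Vector)] =
        data.filter { case (g, _) => g == group }.map { case (_, entry) => entry }
      group -> inner.optimize(groupData, w0)
    }
}
```

A true grouped implementation would instead compute the per-group updates inside shared passes over the keyed RDD.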

      This patch also establishes the GroupedGeneralizedLinearAlgorithm trait, whose 'run' method takes RDD[(Int, LabeledPoint)] in place of the RDD[LabeledPoint] input.

      This patch also provides a unit test and a utility that takes the results of MLUtils.kFold and turns them into a single grouped RDD, ready for simultaneous learning.
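A utility of this kind could look roughly like the sketch below. The object and method names (GroupedFolds, kFoldGrouped) are hypothetical and are not taken from the patch, and using the fold index as the group ID is an assumption. MLUtils.kFold returns an array of (training, validation) pairs; this sketch tags each training split with its fold index and unions the results.

```scala
import org.apache.spark.mllib.regression.LabeledPoint
import org.apache.spark.mllib.util.MLUtils
import org.apache.spark.rdd.RDD

object GroupedFolds {
  /** Tags each k-fold training split with its fold index and merges them into one RDD. */
  def kFoldGrouped(
      data: RDD[LabeledPoint],
      numFolds: Int,
      seed: Int): RDD[(Int, LabeledPoint)] = {
    val folds = MLUtils.kFold(data, numFolds, seed)
    val tagged = folds.zipWithIndex.map { case ((training, _), foldId) =>
      training.map(point => (foldId, point)) // group ID = fold index (assumption)
    }
    data.sparkContext.union(tagged.toSeq)
  }
}
```

The validation splits are dropped here for brevity; keeping them under the same fold index would allow each group's model to be scored against its own held-out data.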

      https://github.com/apache/spark/pull/1292

          People

            Assignee: Unassigned
            Reporter: Kyle Ellrott (kellrott)
            Votes: 0
            Watchers: 2