Spark / SPARK-2776

Add normalizeByCol method to mllib.util.MLUtils


Details

    • Type: New Feature
    • Status: Closed
    • Priority: Minor
    • Resolution: Duplicate
    • None
    • None
    • None
    • None

    Description

      Add the ability to compute the mean and standard deviation of each vector component (that is, each column of the LabeledPoint feature vectors) and to normalize every vector in the RDD, using only RDD transformations. The result is an RDD of Vectors in which each column has a mean of zero and a standard deviation of one.
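      A minimal sketch of the two-pass approach described above, written in Scala against plain collections so it runs without a Spark cluster. A local `Seq[Array[Double]]` stands in for an `RDD[Vector]`, and `ColumnNormalizeSketch`, `ColStats`, and `normalizeByCol` are hypothetical names, not the code proposed in the pull request. It assumes every row has the same length, uses the population standard deviation, and maps constant columns (standard deviation zero) to 0.

```scala
object ColumnNormalizeSketch {

  // Per-column running totals: row count, column sums, and column sums of squares.
  final case class ColStats(n: Long, sum: Array[Double], sumSq: Array[Double]) {
    // Fold one row into the totals (the role of seqOp in RDD.aggregate).
    def add(row: Array[Double]): ColStats =
      ColStats(
        n + 1,
        sum.zip(row).map { case (s, x) => s + x },
        sumSq.zip(row).map { case (s, x) => s + x * x })

    // Combine two partial totals (the role of combOp in RDD.aggregate).
    def merge(other: ColStats): ColStats =
      ColStats(
        n + other.n,
        sum.zip(other.sum).map { case (a, b) => a + b },
        sumSq.zip(other.sumSq).map { case (a, b) => a + b })
  }

  /** Standardizes each column to mean 0 and population standard deviation 1. */
  def normalizeByCol(rows: Seq[Array[Double]]): Seq[Array[Double]] = {
    require(rows.nonEmpty, "need at least one row")
    val dim = rows.head.length
    require(rows.forall(_.length == dim), "all rows must have the same length")
    val zero = ColStats(0L, Array.fill(dim)(0.0), Array.fill(dim)(0.0))

    // Pass 1: simulate partitions by folding each chunk locally, then merging chunks.
    val chunkSize = math.max(1, rows.size / 2)
    val stats = rows
      .grouped(chunkSize)
      .map(chunk => chunk.foldLeft(zero)((acc, row) => acc.add(row)))
      .foldLeft(zero)((a, b) => a.merge(b))

    val mean = stats.sum.map(_ / stats.n)
    // Population variance E[x^2] - mean^2, clamped at 0 to absorb rounding error.
    val std = stats.sumSq.zip(mean).map { case (sq, m) =>
      math.sqrt(math.max(sq / stats.n - m * m, 0.0))
    }

    // Pass 2: standardize every row independently (the role of RDD.map).
    // Constant columns (std == 0) are mapped to 0 instead of dividing by zero.
    rows.map { row =>
      Array.tabulate(dim) { j =>
        if (std(j) == 0.0) 0.0 else (row(j) - mean(j)) / std(j)
      }
    }
  }
}
```

      On an actual `RDD[Vector]`, pass 1 would correspond to `rdd.aggregate(zero)(seqOp, combOp)` and pass 2 to `rdd.map`, with `Vector.toArray` and `Vectors.dense` converting between representations. The sum-of-squares formula keeps the aggregate small, but it can lose precision when a column's mean is large relative to its spread; a Welford-style running update is the more robust choice.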

      See https://github.com/apache/spark/pull/1698

            People

              Assignee: Unassigned
              Reporter: Andres Perez (andy327)
              Votes: 0
              Watchers: 1

              Dates

                Created:
                Updated:
                Resolved: