SPARK-1969: Publicly available online summarizer for mean, variance, min, and max

Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Done
    • Affects Version/s: None
    • Fix Version/s: 1.1.0
    • Component/s: MLlib
    • Labels: None

    Description

      This change moves the private ColumnStatisticsAggregator class out of RowMatrix and exposes it as a publicly available DeveloperApi.

      Changes:
      1) Moved the trait from org.apache.spark.mllib.stat.MultivariateStatisticalSummary to org.apache.spark.mllib.stats.Summarizer
      2) Moved the private implementation from org.apache.spark.mllib.linalg.ColumnStatisticsAggregator to org.apache.spark.mllib.stats.OnlineSummarizer
      3) The OnlineSummarizer constructor no longer takes the number of columns; it is determined when the first sample is added.
      4) Added the API documentation for OnlineSummarizer
      5) Added unit tests for OnlineSummarizer
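      The idea in item 3, an online summarizer whose column count is fixed by the first added sample rather than by a constructor argument, can be sketched in plain Scala as below. This is a hypothetical illustration, not the MLlib code: the class name `OnlineSummarizerSketch` and its methods are invented for this example, it takes `Array[Double]` instead of MLlib vectors, and it uses Welford's single-pass update for mean and variance with an unbiased (n - 1) variance, which is an assumption rather than a description of the patch.

```scala
// Hypothetical sketch, NOT the MLlib API: a column-wise online summarizer
// whose dimension is fixed by the first sample instead of the constructor.
final class OnlineSummarizerSketch {
  private var n: Long = 0L
  private var currMean: Array[Double] = Array.empty
  private var currM2: Array[Double] = Array.empty // running sum of squared deviations (Welford)
  private var currMin: Array[Double] = Array.empty
  private var currMax: Array[Double] = Array.empty

  /** Adds one sample; the first call determines the number of columns. */
  def add(sample: Array[Double]): this.type = {
    if (n == 0L) {
      val k = sample.length
      require(k > 0, "sample must have at least one column")
      currMean = new Array[Double](k)
      currM2 = new Array[Double](k)
      currMin = Array.fill(k)(Double.PositiveInfinity)
      currMax = Array.fill(k)(Double.NegativeInfinity)
    }
    // Every later sample must match the dimension set by the first one.
    require(sample.length == currMean.length,
      s"dimension mismatch: expected ${currMean.length}, got ${sample.length}")
    n += 1
    var i = 0
    while (i < sample.length) {
      val x = sample(i)
      val delta = x - currMean(i)
      currMean(i) += delta / n               // incremental mean update
      currM2(i) += delta * (x - currMean(i)) // uses the old and the new mean
      if (x < currMin(i)) currMin(i) = x
      if (x > currMax(i)) currMax(i) = x
      i += 1
    }
    this
  }

  def count: Long = n
  def mean: Array[Double] = currMean.clone()
  /** Unbiased sample variance (divides by n - 1); zeros when fewer than two samples. */
  def variance: Array[Double] =
    if (n < 2) Array.fill(currM2.length)(0.0) else currM2.map(_ / (n - 1))
  def min: Array[Double] = currMin.clone()
  def max: Array[Double] = currMax.clone()
}
```

      For example, adding `Array(1.0, 10.0)`, `Array(3.0, 20.0)` and `Array(5.0, 30.0)` yields a per-column mean of `(3.0, 20.0)`, variance of `(4.0, 100.0)`, min of `(1.0, 10.0)` and max of `(5.0, 30.0)`; a later sample of a different length is rejected by `require` with an `IllegalArgumentException`. A single pass with O(columns) state is what makes this kind of summarizer usable over streamed or partitioned data.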

          People

            Assignee: DB Tsai (dbtsai)
            Reporter: DB Tsai (dbtsai)
            Votes: 0
            Watchers: 2