SPARK-11219

Make Parameter Description Format Consistent in PySpark.MLlib

Details

    • Type: Documentation
    • Status: Resolved
    • Priority: Trivial
    • Resolution: Done
    • Affects Version/s: None
    • Fix Version/s: 2.0.0
    • Component/s: Documentation, MLlib, PySpark
    • Labels: None

    Description

      PySpark.MLlib currently describes params in several different formats (for example, vertical alignment vs. single line), so it is unclear which format is preferred.

      This issue is to agree on one format and apply it consistently across PySpark.MLlib.

      Following the discussion in SPARK-10560, using two lines with indentation is both readable and avoids changing many lines when parameters are added or removed. If the parameter has a default value, put it in parentheses on a new line under the description.

      Example:

      :param stepSize:
        Step size for each iteration of gradient descent.
        (default: 0.1)
      :param numIterations:
        Number of iterations run for each batch of data.
        (default: 50)
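
To show how the format reads inside a complete docstring, here is a minimal sketch. The function `train_sketch` and its `data` argument are hypothetical and exist only for illustration; they are not part of PySpark.MLlib. A parameter without a default simply omits the `(default: ...)` line.

```python
def train_sketch(data, stepSize=0.1, numIterations=50):
    """
    Hypothetical training function, used only to show the docstring layout.

    :param data:
      The training data. No default value, so no default line.
    :param stepSize:
      Step size for each iteration of gradient descent.
      (default: 0.1)
    :param numIterations:
      Number of iterations run for each batch of data.
      (default: 50)
    """
    # Return the settings unchanged; a real implementation would train a model.
    return {"stepSize": stepSize, "numIterations": numIterations}
```

Because each parameter owns its own block of lines, adding or removing one touches only that block and leaves the alignment of the others alone.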
      

      Current State of Parameter Description Formatting

      Classification

      • LogisticRegressionModel - single line descriptions, fix indentations
      • LogisticRegressionWithSGD - vertical alignment, sporadic default values
      • LogisticRegressionWithLBFGS - vertical alignment, sporadic default values
      • SVMModel - single line
      • SVMWithSGD - vertical alignment, sporadic default values
      • NaiveBayesModel - single line
      • NaiveBayes - single line

      Clustering

      • KMeansModel - missing param description
      • KMeans - missing param description and defaults
      • GaussianMixture - vertical align, incorrect default formatting
      • PowerIterationClustering - single line with wrapped indentation, missing defaults
      • StreamingKMeansModel - single line wrapped
      • StreamingKMeans - single line wrapped, missing defaults
      • LDAModel - single line
      • LDA - vertical align, missing some defaults

      FPM

      • FPGrowth - single line
      • PrefixSpan - single line, default values in backticks

      Recommendation

      • ALS - does not have param descriptions

      Regression

      • LabeledPoint - single line
      • LinearModel - single line
      • LinearRegressionWithSGD - vertical alignment
      • RidgeRegressionWithSGD - vertical align
      • IsotonicRegressionModel - single line
      • IsotonicRegression - single line, missing default

      Tree

      • DecisionTree - single line with vertical indentation, missing defaults
      • RandomForest - single line with wrapped indent, missing some defaults
      • GradientBoostedTrees - single line with wrapped indent

      NOTE
      This issue focuses only on model/algorithm descriptions, which are the largest source of inconsistent formatting.
      evaluation.py, feature.py, random.py, utils.py: these supporting modules use single-line param descriptions, but they are consistent and so don't need to be changed.
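
One rough way to spot single-line descriptions when auditing a class is to scan its docstring for `:param` tags that carry text on the same line. The helper below is a hypothetical sketch, not a tool from the Spark repository. It assumes untyped tags of the form `:param name:` and does not check default-value lines.

```python
import re

# Matches a ":param name:" tag and captures any text that follows on the same line.
PARAM_TAG = re.compile(r"^\s*:param\s+(\w+):[ \t]*(.*)$")


def single_line_params(docstring):
    """Return the names of params whose description shares a line with the tag."""
    names = []
    for line in (docstring or "").splitlines():
        match = PARAM_TAG.match(line)
        # A non-empty remainder means the description starts on the tag line.
        if match and match.group(2).strip():
            names.append(match.group(1))
    return names
```

Calling `single_line_params(SomeClass.method.__doc__)` on a method returns an empty list when every description already starts on its own indented line.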

            People

              Assignee: Bryan Cutler (bryanc)
              Reporter: Bryan Cutler (bryanc)
              Votes: 0
              Watchers: 6

                Time Tracking

                  Original Estimate: 5h
                  Remaining Estimate: 5h
                  Time Spent: Not Specified