[SPARK-11219] Make Parameter Description Format Consistent in PySpark.MLlib - ASF JIRA

Details

Type: Documentation
Status: Resolved
Priority: Trivial
Resolution: Done
Affects Version/s: None
Fix Version/s: 2.0.0
Component/s: Documentation, MLlib, PySpark
Labels: None

Description

There are several different formats for describing params in PySpark.MLlib (for example, vertical alignment versus single-line descriptions), which makes it unclear which format is preferred.

This is to agree on a format and make it consistent across PySpark.MLlib.

Following the discussion in SPARK-10560, each description should use two lines with an indentation: this layout is readable and does not require changing many lines when parameters are added or removed. If the parameter has a default value, put it in parentheses on a new line under the description.

Example:

:param stepSize:
  Step size for each iteration of gradient descent.
  (default: 0.1)
:param numIterations:
  Number of iterations run for each batch of data.
  (default: 50)
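As a minimal sketch, the agreed layout can be generated mechanically. The helper `format_param` below is hypothetical (it is not part of PySpark); it emits the `:param name:` line, the description indented by two spaces, and, only when a default is supplied, a `(default: ...)` line.

```python
# Sentinel so that callers can still document an explicit "(default: None)".
_NO_DEFAULT = object()


def format_param(name, description, default=_NO_DEFAULT, indent="  "):
    """Render one parameter in the two-line layout proposed in SPARK-11219.

    Hypothetical helper for illustration only; not a PySpark API.
    """
    lines = [f":param {name}:", f"{indent}{description}"]
    if default is not _NO_DEFAULT:
        # The default goes on its own line, wrapped in parentheses.
        lines.append(f"{indent}(default: {default})")
    return "\n".join(lines)


print(format_param("stepSize",
                   "Step size for each iteration of gradient descent.", 0.1))
print(format_param("numIterations",
                   "Number of iterations run for each batch of data.", 50))
```

Because every parameter owns its own lines, adding or removing one touches only that parameter's block. With vertical alignment, by contrast, introducing a longer parameter name can force every other description to be re-padded.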

Current State of Parameter Description Formatting

Classification

LogisticRegressionModel - single line descriptions, indentation needs fixing
LogisticRegressionWithSGD - vertical alignment, sporadic default values
LogisticRegressionWithLBFGS - vertical alignment, sporadic default values
SVMModel - single line
SVMWithSGD - vertical alignment, sporadic default values
NaiveBayesModel - single line
NaiveBayes - single line

Clustering

KMeansModel - missing param description
KMeans - missing param description and defaults
GaussianMixture - vertical alignment, incorrect default formatting
PowerIterationClustering - single line with wrapped indentation, missing defaults
StreamingKMeansModel - single line wrapped
StreamingKMeans - single line wrapped, missing defaults
LDAModel - single line
LDA - vertical alignment, missing some defaults

FPM

FPGrowth - single line
PrefixSpan - single line, default values in backticks

Recommendation

ALS - does not have param descriptions

Regression

LabeledPoint - single line
LinearModel - single line
LinearRegressionWithSGD - vertical alignment
RidgeRegressionWithSGD - vertical alignment
IsotonicRegressionModel - single line
IsotonicRegression - single line, missing default

Tree

DecisionTree - single line with vertical indentation, missing defaults
RandomForest - single line with wrapped indent, missing some defaults
GradientBoostedTrees - single line with wrapped indent
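A rough way to audit a module against the agreed layout is to scan docstrings for `:param` lines that carry text after the colon. Both the single-line and the vertical-alignment styles listed above do this, while the proposed layout leaves that line empty. The sketch below is a hypothetical heuristic, not a Spark tool; it only checks where each description starts and does not detect other issues from the survey, such as missing defaults.

```python
import inspect
import re

# Matches ":param name:" and captures any inline text after the colon.
_PARAM_RE = re.compile(r"^\s*:param\s+(\w+):(.*)$")


def single_line_params(obj):
    """Return names of params whose description starts on the :param line.

    Heuristic sketch: the SPARK-11219 layout puts nothing after the colon
    and starts the description on the next, indented line.
    """
    doc = inspect.getdoc(obj) or ""
    offenders = []
    for line in doc.splitlines():
        match = _PARAM_RE.match(line)
        if match and match.group(2).strip():
            offenders.append(match.group(1))
    return offenders


def old_style(data, iterations=100):
    """Train a model.

    :param data: The training data.
    :param iterations: The number of iterations (default: 100).
    """


def new_style(data, iterations=100):
    """Train a model.

    :param data:
      The training data.
    :param iterations:
      The number of iterations.
      (default: 100)
    """


print(single_line_params(old_style))  # ['data', 'iterations']
print(single_line_params(new_style))  # []
```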

NOTE
This issue focuses only on model/algorithm descriptions, which are the largest source of inconsistent formatting.
evaluation.py, feature.py, random.py, utils.py - these supporting modules use single-line param descriptions, but they are internally consistent, so they do not need to be changed.