Details
Type: Documentation
Status: Resolved
Priority: Trivial
Resolution: Done
None
None
Description
There are several different formats for describing params in PySpark.MLlib, which makes it unclear which style is preferred, e.g. vertical alignment vs. a single line.
This issue is to agree on one format and apply it consistently across PySpark.MLlib.
Following the discussion in SPARK-10560, the agreed format uses 2 lines with an indentation: the :param name: line, followed by the description indented on the next line. This is readable, and it avoids changing many lines when parameters are added or removed (with vertical alignment, a new longest name forces every other line to be re-indented). If the parameter has a default value, put it in parentheses on a new line under the description.
Example:
:param stepSize:
  Step size for each iteration of gradient descent.
  (default: 0.1)
:param numIterations:
  Number of iterations run for each batch of data.
  (default: 50)
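A minimal sketch of how this layout looks inside an actual Python docstring. ExampleStreamingModel is a hypothetical class (not part of PySpark) that reuses the parameter text from the example above; inspect.getdoc strips the common leading indentation, so the two-line structure with the default on its own line is easy to see and check.

```python
import inspect


class ExampleStreamingModel:
    """Hypothetical class (not part of PySpark) illustrating the agreed layout."""

    def __init__(self, stepSize=0.1, numIterations=50):
        """
        :param stepSize:
          Step size for each iteration of gradient descent.
          (default: 0.1)
        :param numIterations:
          Number of iterations run for each batch of data.
          (default: 50)
        """
        self.stepSize = stepSize
        self.numIterations = numIterations


# inspect.getdoc removes the common leading indentation from the docstring.
print(inspect.getdoc(ExampleStreamingModel.__init__))
```

Because each default sits on its own line, changing a default touches only that line, and the "(default: ...)" text can be compared against the value in the method signature.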
Current State of Parameter Description Formatting
Classification
- LogisticRegressionModel - single line descriptions, indentation needs fixing
- LogisticRegressionWithSGD - vertical alignment, sporadic default values
- LogisticRegressionWithLBFGS - vertical alignment, sporadic default values
- SVMModel - single line
- SVMWithSGD - vertical alignment, sporadic default values
- NaiveBayesModel - single line
- NaiveBayes - single line
Clustering
- KMeansModel - missing param description
- KMeans - missing param description and defaults
- GaussianMixture - vertical alignment, incorrect default formatting
- PowerIterationClustering - single line with wrapped indentation, missing defaults
- StreamingKMeansModel - single line wrapped
- StreamingKMeans - single line wrapped, missing defaults
- LDAModel - single line
- LDA - vertical alignment, missing some defaults
FPM
- FPGrowth - single line
- PrefixSpan - single line, default values in backticks
Recommendation
- ALS - does not have param descriptions
Regression
- LabeledPoint - single line
- LinearModel - single line
- LinearRegressionWithSGD - vertical alignment
- RidgeRegressionWithSGD - vertical alignment
- IsotonicRegressionModel - single line
- IsotonicRegression - single line, missing default
Tree
- DecisionTree - single line with vertical indentation, missing defaults
- RandomForest - single line with wrapped indent, missing some defaults
- GradientBoostedTrees - single line with wrapped indent
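To audit the classes listed above, a rough checker can flag :param entries that don't follow the agreed layout. check_param_format is a hypothetical helper, not part of PySpark, and it assumes the rules are exactly those stated in the description: nothing after ":param name:" on its line, an indented description on the following lines, and any default on its own "(default: ...)" line. It is a heuristic: any other mention of "default" in a description is flagged too, so expect some false positives.

```python
import re

# A ":param name:" line; group 2 is whatever follows the colon on that line.
PARAM_RE = re.compile(r"^\s*:param\s+(\w+):(.*)$")
# A line consisting only of "(default: ...)".
DEFAULT_RE = re.compile(r"^\s*\(default: .+\)\s*$")


def check_param_format(docstring):
    """Return a list of problems in the :param entries of a docstring.

    Heuristic, assuming the proposed rules: nothing follows ":param name:" on
    its line, the description is on the following, more-indented lines, and a
    default value (if any) is on its own "(default: ...)" line.
    """
    problems = []
    lines = docstring.splitlines()
    for i, line in enumerate(lines):
        match = PARAM_RE.match(line)
        if not match:
            continue
        name, rest = match.group(1), match.group(2)
        if rest.strip():
            # Single-line or vertically aligned style.
            problems.append(f"{name}: description on the same line as :param")
            continue
        param_indent = len(line) - len(line.lstrip())
        # Collect the more-indented description lines under this :param.
        body = []
        for following in lines[i + 1:]:
            indent = len(following) - len(following.lstrip())
            if not following.strip() or indent <= param_indent:
                break
            body.append(following)
        if not body:
            problems.append(f"{name}: missing description")
        for text in body:
            # Any mention of "default" outside a "(default: ...)" line is flagged.
            if "default" in text.lower() and not DEFAULT_RE.match(text):
                problems.append(f"{name}: default not on its own (default: ...) line")
    return problems


print(check_param_format(":param stepSize: Step size. (default: 0.1)"))
```

Since it only operates on strings, it could be fed docstrings obtained with inspect.getdoc for each class in the list to produce a rough checklist of entries to reformat.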
NOTE
This issue focuses only on model/algorithm descriptions, which are the largest source of inconsistent formatting.
evaluation.py, feature.py, random.py, utils.py: these supporting modules use single-line param descriptions, but they are consistent, so they don't need to be changed.
Issue Links
- is blocked by: SPARK-10560 Make StreamingLogisticRegressionWithSGD Python API equals with Scala one (Resolved)