  1. Spark
  2. SPARK-30144

MLP param map missing

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 2.4.4
    • Fix Version/s: 3.0.0
    • Component/s: ML
    • Labels:
    • Docs Text:
      From 3.0, MultilayerPerceptronClassificationModel extends MultilayerPerceptronParams to expose the training params. As a result, layers in MultilayerPerceptronClassificationModel has been changed from Array[Int] to IntArrayParam. Users should call MultilayerPerceptronClassificationModel.getLayers instead of MultilayerPerceptronClassificationModel.layers to retrieve the layer sizes.

      Description

      Param maps for fitted models are available for all classifiers except MultilayerPerceptronClassifier.

      As a result, there is no way to tell which parameters performed best during cross-validation, or which parameters were used for the submodels. The param map of a fitted MultilayerPerceptronClassifier model contains only the column names:

      {
      Param(parent='MultilayerPerceptronClassifier_eeab0cc242d1', name='featuresCol', doc='features column name'): 'features', 
      Param(parent='MultilayerPerceptronClassifier_eeab0cc242d1', name='labelCol', doc='label column name'): 'fake_banknote', 
      Param(parent='MultilayerPerceptronClassifier_eeab0cc242d1', name='predictionCol', doc='prediction column name'): 'prediction', 
      Param(parent='MultilayerPerceptronClassifier_eeab0cc242d1', name='probabilityCol', doc='Column name for predicted class conditional probabilities. Note: Not all models output well-calibrated probability estimates! These probabilities should be treated as confidences, not precise probabilities'): 'probability', 
      Param(parent='MultilayerPerceptronClassifier_eeab0cc242d1', name='rawPredictionCol', doc='raw prediction (a.k.a. confidence) column name'): 'rawPrediction'}

       
      GBTClassifier, for example, shows all parameters:
       

      {
      Param(parent='GBTClassifier_a0e77b3430aa', name='cacheNodeIds', doc='If false, the algorithm will pass trees to executors to match instances with nodes. If true, the algorithm will cache node IDs for each instance. Caching can speed up training of deeper trees.'): False, 
      Param(parent='GBTClassifier_a0e77b3430aa', name='checkpointInterval', doc='set checkpoint interval (>= 1) or disable checkpoint (-1). E.g. 10 means that the cache will get checkpointed every 10 iterations. Note: this setting will be ignored if the checkpoint directory is not set in the SparkContext'): 10, 
      Param(parent='GBTClassifier_a0e77b3430aa', name='featureSubsetStrategy', doc='The number of features to consider for splits at each tree node. Supported options: auto, all, onethird, sqrt, log2, (0.0-1.0], [1-n].'): 'all', 
      Param(parent='GBTClassifier_a0e77b3430aa', name='featuresCol', doc='features column name'): 'features', 
      Param(parent='GBTClassifier_a0e77b3430aa', name='labelCol', doc='label column name'): 'fake_banknote', 
      Param(parent='GBTClassifier_a0e77b3430aa', name='lossType', doc='Loss function which GBT tries to minimize (case-insensitive). Supported options: logistic'): 'logistic', 
      Param(parent='GBTClassifier_a0e77b3430aa', name='maxBins', doc='Max number of bins for discretizing continuous features. Must be >=2 and >= number of categories for any categorical feature.'): 8, 
      Param(parent='GBTClassifier_a0e77b3430aa', name='maxDepth', doc='Maximum depth of the tree. (>= 0) E.g., depth 0 means 1 leaf node; depth 1 means 1 internal node + 2 leaf nodes.'): 5, 
      Param(parent='GBTClassifier_a0e77b3430aa', name='maxIter', doc='maximum number of iterations (>= 0)'): 20, 
      Param(parent='GBTClassifier_a0e77b3430aa', name='maxMemoryInMB', doc='Maximum memory in MB allocated to histogram aggregation.'): 256, 
      Param(parent='GBTClassifier_a0e77b3430aa', name='minInfoGain', doc='Minimum information gain for a split to be considered at a tree node.'): 0.0, 
      Param(parent='GBTClassifier_a0e77b3430aa', name='minInstancesPerNode', doc='Minimum number of instances each child must have after split. If a split causes the left or right child to have fewer than minInstancesPerNode, the split will be discarded as invalid. Should be >= 1.'): 1, 
      Param(parent='GBTClassifier_a0e77b3430aa', name='predictionCol', doc='prediction column name'): 'prediction', 
      Param(parent='GBTClassifier_a0e77b3430aa', name='seed', doc='random seed'): 1234, 
      Param(parent='GBTClassifier_a0e77b3430aa', name='stepSize', doc='Step size (a.k.a. learning rate) in interval (0, 1] for shrinking the contribution of each estimator.'): 0.1, 
      Param(parent='GBTClassifier_a0e77b3430aa', name='subsamplingRate', doc='Fraction of the training data used for learning each decision tree, in range (0, 1].'): 1.0}
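The gap between the two param maps can also be checked programmatically by comparing param names. The sketch below is an illustration under stated assumptions: `missing_param_names` is a hypothetical helper, the `Param` namedtuple is only a stand-in for `pyspark.ml.param.Param` (which carries the same `parent`, `name`, and `doc` fields), and the `layers` and `maxIter` values are made up for the example.

```python
from collections import namedtuple

# Stand-in for pyspark.ml.param.Param so the sketch runs without Spark.
Param = namedtuple("Param", ["parent", "name", "doc"])


def missing_param_names(estimator_map, model_map):
    """Names present in the estimator's param map but absent from the model's."""
    model_names = {p.name for p in model_map}
    return sorted(p.name for p in estimator_map if p.name not in model_names)


uid = "MultilayerPerceptronClassifier_eeab0cc242d1"

# Illustrative estimator param map; the layers and maxIter values are made up.
estimator_map = {
    Param(uid, "featuresCol", ""): "features",
    Param(uid, "labelCol", ""): "fake_banknote",
    Param(uid, "layers", ""): [4, 5, 2],
    Param(uid, "maxIter", ""): 100,
}

# Model param map reduced to column params, as in the report above.
model_map = {
    Param(uid, "featuresCol", ""): "features",
    Param(uid, "labelCol", ""): "fake_banknote",
}

print(missing_param_names(estimator_map, model_map))
```

Against real objects, the two arguments would presumably come from `estimator.extractParamMap()` and `model.extractParamMap()`.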

       
      See attached ipynb or example notebook here:

      https://colab.research.google.com/drive/1lwSHioZKlLh96FhGkdYFe6FUuRfTcSxH
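Until the model exposes its training params, one possible workaround for the cross-validation case is to pair each candidate param map with its average metric. This is a sketch, not part of the reported fix: `rank_param_maps` is a hypothetical helper, and the grid and metric values are illustrative only.

```python
def rank_param_maps(param_maps, avg_metrics, larger_is_better=True):
    """Pair each candidate param map with its average metric, best first."""
    if len(param_maps) != len(avg_metrics):
        raise ValueError("expected one metric per param map")
    pairs = list(zip(param_maps, avg_metrics))
    return sorted(pairs, key=lambda pair: pair[1], reverse=larger_is_better)


# Illustrative grid and metrics; not taken from the attached notebook.
grid = [
    {"layers": [4, 5, 2], "maxIter": 50},
    {"layers": [4, 8, 2], "maxIter": 100},
]
metrics = [0.91, 0.97]

best_params, best_metric = rank_param_maps(grid, metrics)[0]
print(best_params, best_metric)
```

With PySpark, the inputs would typically be `cv.getEstimatorParamMaps()`, `cv_model.avgMetrics`, and `evaluator.isLargerBetter()`, where `cv`, `cv_model`, and `evaluator` are assumed names for the CrossValidator, its fitted model, and the evaluator.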

        Attachments

        1. data_banknote_authentication.csv
          45 kB
          Glen-Erik Cortes
        2. MLP_params_missing.ipynb
          11 kB
          Glen-Erik Cortes

              People

              • Assignee: Huaxin Gao (huaxingao)
              • Reporter: Glen-Erik Cortes (cyborgdroid)