Spark / SPARK-7146

Should ML sharedParams be a public API?

Details

    • Type: Brainstorming
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.3.0
    • Component/s: ML
    • Labels: None

    Description

      Proposal: Make most of the Param traits in sharedParams.scala public. Mark them as DeveloperApi.

      Pros:

      • Sharing the Param traits helps to encourage standardized Param names and documentation.

      Cons:

      • Users have to be careful since parameters can have different meanings for different algorithms.
      • If the shared Params are public, then implementations could test for the traits (for example, by pattern matching on them). It is unclear whether we want users to rely on these traits, which are somewhat experimental.
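
      The second con can be made concrete with a small sketch. It assumes Spark 2.3.0 or later on the classpath, where traits such as HasMaxIter and HasRegParam are accessible under org.apache.spark.ml.param.shared. The object TraitTestSketch and the helper sharedSettings are hypothetical names used only for illustration.

```scala
import org.apache.spark.ml.classification.LogisticRegression
import org.apache.spark.ml.feature.Tokenizer
import org.apache.spark.ml.param.Params
import org.apache.spark.ml.param.shared.{HasMaxIter, HasRegParam}

object TraitTestSketch {
  // Hypothetical helper: reports which shared settings a stage exposes
  // by testing whether the stage mixes in the corresponding trait.
  def sharedSettings(stage: Params): Seq[String] = {
    val maxIter = stage match {
      case s: HasMaxIter => Some(s"maxIter=${s.getMaxIter}")
      case _             => None
    }
    val regParam = stage match {
      case s: HasRegParam => Some(s"regParam=${s.getRegParam}")
      case _              => None
    }
    Seq(maxIter, regParam).flatten
  }

  def main(args: Array[String]): Unit = {
    // LogisticRegression mixes in both traits; Tokenizer mixes in neither.
    println(sharedSettings(new LogisticRegression().setMaxIter(20).setRegParam(0.1)))
    println(sharedSettings(new Tokenizer()))
  }
}
```

      Code like this only compiles when the traits are public, and it then depends on them staying stable, which is the concern raised above.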

      Currently, the shared params are private.

      UPDATED proposal

      • Some Params are clearly safe to make public. We will do so.
      • Some Params could be made public but may require caveats in the trait doc.
      • Some Params have turned out not to be shared in practice. We can move those Params to the classes which use them.

      Public shared params:

      • I/O column params
        • HasFeaturesCol
        • HasInputCol
        • HasInputCols
        • HasLabelCol
        • HasOutputCol
        • HasPredictionCol
        • HasProbabilityCol
        • HasRawPredictionCol
        • HasVarianceCol
        • HasWeightCol
      • Algorithm settings
        • HasCheckpointInterval
        • HasElasticNetParam
        • HasFitIntercept
        • HasMaxIter
        • HasRegParam
        • HasSeed
        • HasStandardization (less common)
        • HasStepSize
        • HasTol
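
      As an illustration of how a developer outside Spark might use the public I/O column traits, here is a minimal sketch of a custom Transformer that mixes in HasInputCol and HasOutputCol. It assumes Spark 2.3.0 or later; the class UpperCaser and the object UpperCaserDemo are hypothetical and not part of Spark.

```scala
import org.apache.spark.ml.Transformer
import org.apache.spark.ml.param.ParamMap
import org.apache.spark.ml.param.shared.{HasInputCol, HasOutputCol}
import org.apache.spark.ml.util.Identifiable
import org.apache.spark.sql.{DataFrame, Dataset, SparkSession}
import org.apache.spark.sql.functions.{col, upper}
import org.apache.spark.sql.types.{StringType, StructType}

// Hypothetical third-party transformer that reuses the shared column Params
// instead of declaring its own "inputCol" and "outputCol".
class UpperCaser(override val uid: String)
    extends Transformer with HasInputCol with HasOutputCol {

  def this() = this(Identifiable.randomUID("upperCaser"))

  // The shared traits provide the Params and getters; setters are added here.
  def setInputCol(value: String): this.type = set(inputCol, value)
  def setOutputCol(value: String): this.type = set(outputCol, value)

  override def transform(dataset: Dataset[_]): DataFrame =
    dataset.withColumn($(outputCol), upper(col($(inputCol))))

  override def transformSchema(schema: StructType): StructType =
    schema.add($(outputCol), StringType)

  override def copy(extra: ParamMap): UpperCaser = defaultCopy(extra)
}

object UpperCaserDemo {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[1]").appName("sketch").getOrCreate()
    import spark.implicits._

    val df = Seq("spark", "ml").toDF("text")
    new UpperCaser().setInputCol("text").setOutputCol("shout").transform(df).show()

    spark.stop()
  }
}
```

      Reusing the shared traits gives the custom stage the same Param names and documentation as built-in stages, which is the benefit listed under Pros.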

      Questionable params:

      • HasHandleInvalid (only used in StringIndexer, but might be more widely used later on)
      • HasSolver (used in LinearRegression and GeneralizedLinearRegression, but same meaning as Optimizer in LDA)

      Params to be removed from sharedParams:

      • HasThreshold (only used in LogisticRegression)
      • HasThresholds (only used in ProbabilisticClassifier)
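
      Moving a Param out of sharedParams means declaring it directly in the class (or class-specific trait) that uses it. The sketch below shows roughly what that could look like for a threshold Param, using the public Params API. ThresholdHolder is a hypothetical class, and the doc string, range and default value are assumptions made for illustration, not the actual Spark change.

```scala
import org.apache.spark.ml.param.{DoubleParam, ParamMap, ParamValidators, Params}
import org.apache.spark.ml.util.Identifiable

// Hypothetical class that owns its "threshold" Param instead of
// inheriting it from a shared trait.
class ThresholdHolder(override val uid: String) extends Params {

  def this() = this(Identifiable.randomUID("thresholdHolder"))

  // Assumed doc string and valid range, chosen for illustration only.
  final val threshold: DoubleParam = new DoubleParam(
    this,
    "threshold",
    "threshold in binary classification prediction, in range [0, 1]",
    ParamValidators.inRange(0.0, 1.0))

  // Assumed default value.
  setDefault(threshold, 0.5)

  def getThreshold: Double = $(threshold)
  def setThreshold(value: Double): this.type = set(threshold, value)

  override def copy(extra: ParamMap): ThresholdHolder = defaultCopy(extra)
}
```

      With this layout the documentation and validation live next to the only algorithm that uses the Param, so the description can be specific to that algorithm.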

          People

            Assignee: Holden Karau (holden)
            Reporter: Joseph K. Bradley (josephkb)
            Votes: 9
            Watchers: 14
