  1. Spark
  2. SPARK-30543

RandomForest add Param bootstrap to control sampling method


Details

    • Improvement
    • Status: Resolved
    • Minor
    • Resolution: Resolved
    • 3.0.0
    • None
    • ML, PySpark
    • None

    Description

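      Currently, RF with numTrees=1 builds a single tree directly on the original dataset, while with numTrees>1 it builds each tree on a bootstrap sample.

      This design exists so that a DecisionTreeModel can be trained through the RandomForest impl; however, tying the sampling method to the number of trees is somewhat strange.

      In Scikit-Learn, there is a param bootstrap to control whether bootstrap samples are used. This issue proposes adding a similar Param bootstrap to RandomForest so that the sampling method is controlled explicitly.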
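
      The difference between the two sampling modes can be sketched in plain Python. This is a minimal illustration, not Spark's code: the helper names draw_training_rows and implicit_bootstrap are hypothetical, and drawing rows uniformly with replacement is a simplifying assumption that may differ from how Spark actually resamples.

```python
import random


def draw_training_rows(rows, bootstrap, seed):
    """Return the rows a single tree would be trained on.

    Hypothetical helper for illustration only; not part of Spark.
    """
    if not bootstrap:
        # No resampling: the tree sees every original row exactly once.
        return list(rows)
    rng = random.Random(seed)
    # Bootstrap: draw len(rows) rows uniformly with replacement (assumed scheme).
    return [rng.choice(rows) for _ in rows]


def implicit_bootstrap(num_trees):
    # Behaviour described in this issue: sampling is inferred from numTrees.
    return num_trees > 1


rows = list(range(10))

# numTrees=1: the single tree is trained on the original dataset.
single_tree = draw_training_rows(rows, implicit_bootstrap(1), seed=0)

# numTrees>1: each tree is trained on a resampled dataset of the same size.
forest_tree = draw_training_rows(rows, implicit_bootstrap(5), seed=0)

# With an explicit flag, a caller could request either mode regardless of numTrees.
explicit_no_bootstrap = draw_training_rows(rows, bootstrap=False, seed=0)
```

      With an explicit bootstrap flag, the choice of sampling no longer depends on how many trees are requested.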


          People

            Assignee: podongfeng Ruifeng Zheng
            Reporter: podongfeng Ruifeng Zheng
            Votes: 0
            Watchers: 1

            Dates

              Created:
              Updated:
              Resolved:
