  1. Spark
  2. SPARK-16719

RandomForest: communicate fewer trees on each iteration


Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.1.0
    • Component/s: ML
    • Labels: None

    Description

      RandomForest currently sends the entire forest to each worker on every iteration, for two reasons: (a) the node queue is FIFO, so each iteration tends to handle splits spread across many trees, especially early in learning; and (b) the closure references the entire array of trees (topNodes), so all trees are sent explicitly.

      Proposal:
      (a) Change the RF node queue to be FILO (a stack), so that RF tends to focus on one or a few trees before moving on to others.
      (b) Change topNodes so that only the trees required on that iteration are passed.
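
      The two proposed changes can be sketched in a few lines. This is a minimal illustration only, written in Python rather than Spark's Scala, and it is not the code that resolved this issue. Apart from topNodes, every name here (PendingNode, select_node_group, trees_for_group, max_nodes) is a hypothetical stand-in, and strings take the place of real tree nodes.

```python
from collections import namedtuple

# Hypothetical stand-in for a tree node that is waiting for a split decision.
PendingNode = namedtuple("PendingNode", ["tree_id", "node_id"])


def select_node_group(node_stack, max_nodes):
    """(a) Pop up to max_nodes entries from the end of the list (FILO order)."""
    group = []
    while node_stack and len(group) < max_nodes:
        group.append(node_stack.pop())
    return group


def trees_for_group(top_nodes, group):
    """(b) Keep only the trees that own a node in this group, keyed by tree index."""
    needed = sorted({node.tree_id for node in group})
    return {tree_id: top_nodes[tree_id] for tree_id in needed}


# A forest of 6 trees; strings stand in for the root node objects (topNodes).
top_nodes = ["root-%d" % i for i in range(6)]

# Pending work: nodes of trees 4 and 5 were pushed last, so they sit on top
# of the stack, above the still-unprocessed nodes of trees 0..3.
node_stack = [PendingNode(t, 1) for t in range(4)]
node_stack += [
    PendingNode(4, 2), PendingNode(4, 3),
    PendingNode(5, 2), PendingNode(5, 3),
]

group = select_node_group(node_stack, max_nodes=4)
subset = trees_for_group(top_nodes, group)

# Only the trees in `subset` would need to be captured by this iteration's
# closure, instead of all of `top_nodes`.
print(sorted(subset))
```

      In this sketch the group drawn from the top of the stack touches two trees, so only two of the six roots would need to travel with the closure for that iteration.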

            People

              Assignee: josephkb Joseph K. Bradley
              Reporter: josephkb Joseph K. Bradley
              Votes: 0
              Watchers: 2
