Spark / SPARK-10064

Decision tree continuous feature binning is slow in large feature spaces

Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 1.4.1
    • Fix Version/s: 1.6.0
    • Component/s: MLlib
    • Labels: None

    Description

      With a large feature space and a high bin count (>500), binning continuous features can take many hours. This is particularly painful because the executors stay allocated for the whole duration, which is unfriendly on a shared cluster.

      The binning process can and should be performed on the executors instead of the driver.
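
      As a rough illustration of why this work distributes well, here is a minimal sketch in plain Python (no Spark dependency). It is not the code from the actual fix. The names `find_splits` and `splits_by_feature` are hypothetical, and the quantile rule is a simplified stand-in for whatever MLlib uses. The point is only that once values are keyed by feature index, each feature's split thresholds can be computed independently, so the per-feature step could run on executors rather than in a single loop on the driver.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def find_splits(values: Sequence[float], max_bins: int) -> List[float]:
    """Pick at most max_bins - 1 thresholds for one continuous feature.

    Simplified, hypothetical rule: midpoints when there are few distinct
    values, otherwise evenly spaced order statistics.
    """
    num_splits = max_bins - 1
    distinct = sorted(set(values))
    if len(distinct) <= 1 or num_splits < 1:
        return []  # a constant feature cannot be split
    if len(distinct) - 1 <= num_splits:
        # Few distinct values: one threshold between each adjacent pair.
        return [(a + b) / 2 for a, b in zip(distinct, distinct[1:])]
    ordered = sorted(values)
    stride = len(ordered) / (num_splits + 1)
    splits: List[float] = []
    for i in range(1, num_splits + 1):
        candidate = ordered[int(i * stride)]
        if not splits or candidate > splits[-1]:  # skip repeated thresholds
            splits.append(candidate)
    return splits


def splits_by_feature(rows: Sequence[Sequence[float]],
                      max_bins: int) -> Dict[int, List[float]]:
    # "Map" step: key every value by its feature index.
    grouped: Dict[int, List[float]] = defaultdict(list)
    for row in rows:
        for index, value in enumerate(row):
            grouped[index].append(value)
    # Per-key step: each feature is independent of the others, so in a
    # distributed setting each key could be handled by a different worker.
    return {index: find_splits(vals, max_bins)
            for index, vals in grouped.items()}


rows = [[1.0, 10.0], [2.0, 10.0], [3.0, 20.0], [4.0, 20.0]]
print(splits_by_feature(rows, max_bins=3))
```

      In Spark terms, the grouping step would presumably correspond to a shuffle keyed by feature index, with only the small map of thresholds returned to the driver. That mapping is an assumption of this sketch, not something stated in this issue.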

            People

              Nathan Howell
              Joseph K. Bradley
              Votes: 0
              Watchers: 3
