  1. Mahout
  2. MAHOUT-945

The variance calculation of Random forest regression tree

Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.6
    • Fix Version/s: 0.8
    • None

    Description

      Hi, Mukai
      Thanks for your efforts in extending the RF to regression. However, I have a doubt about your implementation in Regressionsplit.java. The variance method is:
      "
      private static double variance(double[] s, double[] ss, double[] dataSize) {
        double var = 0;
        for (int i = 0; i < s.length; i++) {
          if (dataSize[i] > 0) {
            var += ss[i] - ((s[i] * s[i]) / dataSize[i]);
          }
        }
        return var;
      }
      "

      In my understanding, however, the variance term should be something like
      var += ss[i]/dataSize[i] - ((s[i] * s[i]) / (dataSize[i]*dataSize[i]));

      Please correct me if I am wrong. Thanks.
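
To make the difference between the two expressions concrete, here is a minimal sketch that evaluates both on the same inputs. The class name VarianceCheck and both method names are hypothetical and are not part of Mahout; the parameters s, ss and dataSize are assumed to hold, per partition, the sum of the values, the sum of their squares and the number of values, as the quoted method suggests. Since ss - s*s/n equals n * (ss/n - (s/n)*(s/n)), the quoted formula is the proposed one weighted by the partition size.

```java
// Hypothetical helper class for illustration only; not part of Mahout.
final class VarianceCheck {

  // Formula from the quoted method: for each partition, ss - s^2 / n,
  // which is the sum of squared deviations from that partition's mean.
  static double sumOfSquaredDeviations(double[] s, double[] ss, double[] dataSize) {
    double total = 0;
    for (int i = 0; i < s.length; i++) {
      if (dataSize[i] > 0) {
        total += ss[i] - (s[i] * s[i]) / dataSize[i];
      }
    }
    return total;
  }

  // Formula proposed in the description: for each partition,
  // ss / n - (s / n)^2, which is that partition's population variance.
  static double sumOfVariances(double[] s, double[] ss, double[] dataSize) {
    double total = 0;
    for (int i = 0; i < s.length; i++) {
      if (dataSize[i] > 0) {
        total += ss[i] / dataSize[i] - (s[i] * s[i]) / (dataSize[i] * dataSize[i]);
      }
    }
    return total;
  }

  public static void main(String[] args) {
    // One partition holding the values {1, 2, 3, 4}: s = 10, ss = 30, n = 4.
    double[] s = {10};
    double[] ss = {30};
    double[] n = {4};
    System.out.println(sumOfSquaredDeviations(s, ss, n)); // prints 5.0
    System.out.println(sumOfVariances(s, ss, n));         // prints 1.25
  }
}
```

For a single partition the two results differ exactly by the factor n (5.0 = 4 * 1.25). With several partitions of different sizes, the first form weights each partition's variance by its size, while the second gives every partition equal weight.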

      Attachments

        1. MAHOUT-945.patch
          9 kB
          Ikumasa Mukai
        2. MAHOUT-945.patch
          12 kB
          Ikumasa Mukai
        3. MAHOUT-945.patch
          12 kB
          Ikumasa Mukai

        Issue Links

          Activity

            People

              Assignee: Sean R. Owen (srowen)
              Reporter: Wang Yue (fayue1015)
              Votes: 0
              Watchers: 3

              Dates

                Created:
                Updated:
                Resolved:

                Time Tracking

                  Estimated: 48h
                  Remaining: 48h
                  Logged: Not Specified