Description
In the current implementation, the prediction for a node is calculated together with the information gain stats for each possible split. However, the value to predict for a given node depends only on the node itself, no matter which split is chosen.
To save computation, we can calculate the prediction once per node and then calculate the information gain stats for each split.
This is also necessary if we want to support a minimum instances per node parameter (SPARK-2207): when no split satisfies the minimum instances requirement, we do not use the information gain of any split, so there must still be a way to get the prediction value for the node.
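The proposed ordering can be illustrated with a minimal sketch in plain Python. It is not the Spark implementation: the names `predict`, `gini`, `best_split`, and `min_instances_per_node` are hypothetical, a single continuous feature is assumed, and Gini impurity with a majority-label prediction is assumed for a classification node.

```python
from collections import Counter


def predict(labels):
    """Majority label and its probability; depends only on the node's labels."""
    label, count = Counter(labels).most_common(1)[0]
    return label, count / len(labels)


def gini(labels):
    """Gini impurity of a list of labels (0.0 for an empty list)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(points, min_instances_per_node=1):
    """points is a list of (feature_value, label) pairs for one node.

    Returns (prediction, best), where best is (threshold, gain), or None
    when no candidate split satisfies min_instances_per_node.
    """
    labels = [y for _, y in points]
    # Computed once, before and independently of any split evaluation.
    prediction = predict(labels)
    parent_impurity = gini(labels)

    best = None
    for threshold in sorted({x for x, _ in points}):
        left = [y for x, y in points if x <= threshold]
        right = [y for x, y in points if x > threshold]
        # Skip splits that would create a child with too few instances.
        if min(len(left), len(right)) < min_instances_per_node:
            continue
        weighted = (len(left) * gini(left) + len(right) * gini(right)) / len(points)
        gain = parent_impurity - weighted
        if best is None or gain > best[1]:
            best = (threshold, gain)
    return prediction, best
```

With `points = [(1.0, 0), (2.0, 0), (3.0, 1), (4.0, 0)]` and `min_instances_per_node=3`, every candidate split is rejected and `best` is `None`, yet the prediction for the node is still available because it was computed up front.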