There are several problems (not only bugs) with the GradientMachine (based on Ted's revised version). If there is no time to address this issue now, please defer it until next week (after 0.8 is released).
1) The GradientMachine is a special case of a MultiLayerPerceptron (MLP) that contains only one hidden layer. Is it necessary to keep it if a MultiLayerPerceptron is planned?
2) The hiddenToOutput method seems incorrect. The squashing (activation) function should also be applied to the output layer (see [1][2][3][4]). Therefore, the output of each node (neuron) in the output layer lies in (0, 1) if the sigmoid function is used, or in (-1, 1) if the tanh function is used.
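To illustrate the point, here is a minimal sketch (not Mahout's actual API; the method and variable names are illustrative) of a hidden-to-output pass that applies the sigmoid to the output layer as well, so every output node stays in (0, 1):

```java
public class SquashSketch {

  static double sigmoid(double x) {
    return 1.0 / (1.0 + Math.exp(-x));
  }

  // Weighted sum of hidden activations followed by the squashing function,
  // applied per output node (names here are illustrative, not Mahout's).
  static double[] hiddenToOutput(double[] hidden, double[][] outputWeights, double[] outputBias) {
    double[] out = new double[outputWeights.length];
    for (int j = 0; j < outputWeights.length; j++) {
      double sum = outputBias[j];
      for (int i = 0; i < hidden.length; i++) {
        sum += outputWeights[j][i] * hidden[i];
      }
      out[j] = sigmoid(sum); // squash the output node too, so the value lies in (0, 1)
    }
    return out;
  }

  public static void main(String[] args) {
    double[] hidden = {0.2, -0.5, 0.9};
    double[][] w = {{0.1, 0.4, -0.3}, {-0.2, 0.6, 0.5}};
    double[] b = {0.0, 0.1};
    double[] out = hiddenToOutput(hidden, w, b);
    System.out.println(java.util.Arrays.toString(out));
  }
}
```

With tanh in place of sigmoid, the same loop would instead bound each output in (-1, 1).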
3) There are several problems with the training method. In updateRanking, it is unclear which weight-update strategy is used: it claims to be back-propagation, but it is not implemented that way.
3.1) It seems that only part of outputWeight is updated (the weights associated with the best output node and those associated with the worst output node; again, this is OK only for a two-class problem).
For back-propagation, all the weights between the last hidden layer and the output layer should be updated. Did the original designer intentionally design it this way, and can its correctness be guaranteed?
In back-propagation, the delta of each node is calculated first, and each node's weights are then adjusted based on its corresponding delta. However, the implemented code does not compute these deltas.
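For reference, a hedged sketch of the standard delta-based update for the hidden-to-output weights, assuming a sigmoid output layer and squared-error loss as in [1] (the names are illustrative, not Mahout's actual API). Note that it updates every hidden-to-output weight, not just those of two output nodes:

```java
public class BackpropSketch {

  // Standard output-layer update for back-propagation with sigmoid units
  // and squared-error loss (see Mitchell, Machine Learning, Chapter 4).
  static void updateOutputWeights(double[] hidden, double[] output, double[] target,
                                  double[][] outputWeights, double learningRate) {
    for (int j = 0; j < output.length; j++) {
      // delta_j = o_j * (1 - o_j) * (t_j - o_j)
      double delta = output[j] * (1.0 - output[j]) * (target[j] - output[j]);
      // Adjust ALL weights feeding output node j by eta * delta_j * x_i.
      for (int i = 0; i < hidden.length; i++) {
        outputWeights[j][i] += learningRate * delta * hidden[i];
      }
    }
  }

  public static void main(String[] args) {
    double[] hidden = {0.5, 0.8};
    double[] output = {0.7};
    double[] target = {1.0};
    double[][] w = {{0.3, -0.1}};
    updateOutputWeights(hidden, output, target, w, 0.1);
    System.out.println(java.util.Arrays.deepToString(w));
  }
}
```

The deltas of the hidden layer would then be derived from these output deltas before the input-to-hidden weights are adjusted, which is the part the current code skips.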
3.2) The GradientMachine (and an MLP in general) can also be used for regression and prediction; the 'train' method of OnlineLearner restricts this capability.
4) The corresponding test case is insufficient to verify the correctness of the implementation.
5) Once the problems above have been fixed, it would be worth considering a map-reduce version of the algorithm.
Reference:
[1] Tom Mitchell. Machine Learning. Chapter 4.
[2] Jiawei Han. Data Mining: Concepts and Techniques. Chapter 6.
[3] Stanford Unsupervised Feature Learning and Deep Learning tutorial. http://ufldl.stanford.edu/wiki/index.php/Neural_Networks. Section Neural Network.
[4] Christopher Bishop. Neural Networks for Pattern Recognition. Chapter 4.
Patch for Mahout-975