Implement a multi-layer perceptron
- via Matrix Multiplication
- Learning by Backpropagation; implementing tricks from Yann LeCun et al.: "Efficient BackProp"
- arbitrary number of hidden layers (including 0 - just the linear model)
- connections between adjacent layers only
- different cost and activation functions (a different activation function per layer)
- test of backprop by gradient checking
- normalization of the inputs (storable) as part of the model
- implementation of a "stochastic gradient descent"-like gradient machine
- simple gradient descent incl. momentum
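The core items above (forward pass via matrix multiplication, backprop, and simple gradient descent with momentum) can be sketched in NumPy; the class and method names (`MLP`, `sgd_step`, ...) are illustrative only, and the sketch fixes the activation to a sigmoid and the cost to half squared error for brevity:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

class MLP:
    """Minimal multi-layer perceptron; layer sizes e.g. [2, 3, 1]."""
    def __init__(self, sizes, seed=0):
        rng = np.random.default_rng(seed)
        # Small random weights scaled by fan-in, as suggested in "Efficient BackProp"
        self.W = [rng.normal(0.0, n_in ** -0.5, (n_in, n_out))
                  for n_in, n_out in zip(sizes[:-1], sizes[1:])]
        self.b = [np.zeros(n_out) for n_out in sizes[1:]]
        # Velocity buffers for momentum
        self.vW = [np.zeros_like(w) for w in self.W]
        self.vb = [np.zeros_like(b) for b in self.b]

    def forward(self, x):
        """Forward pass via matrix multiplication; returns all layer activations."""
        acts = [x]
        for W, b in zip(self.W, self.b):
            acts.append(sigmoid(acts[-1] @ W + b))
        return acts

    def backprop(self, x, y):
        """Gradients of the half squared-error cost wrt all weights and biases."""
        acts = self.forward(x)
        delta = (acts[-1] - y) * acts[-1] * (1 - acts[-1])  # output-layer error
        gW, gb = [], []
        for layer in range(len(self.W) - 1, -1, -1):
            gW.append(acts[layer].T @ delta)
            gb.append(delta.sum(axis=0))
            if layer > 0:  # propagate the error one layer back
                delta = (delta @ self.W[layer].T) * acts[layer] * (1 - acts[layer])
        return gW[::-1], gb[::-1]

    def sgd_step(self, x, y, lr=0.5, momentum=0.9):
        """One gradient-descent update with momentum."""
        gW, gb = self.backprop(x, y)
        for i in range(len(self.W)):
            self.vW[i] = momentum * self.vW[i] - lr * gW[i]
            self.vb[i] = momentum * self.vb[i] - lr * gb[i]
            self.W[i] += self.vW[i]
            self.b[i] += self.vb[i]
```

As a usage example, repeatedly calling `sgd_step` on the XOR data with a `[2, 3, 1]` net should drive the squared error down.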
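Gradient checking (the backprop test mentioned above) compares the analytic gradient against central finite differences. A minimal sketch for a single sigmoid layer, with illustrative function names:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cost(W, x, y):
    """Half squared-error cost of a single sigmoid layer."""
    a = sigmoid(x @ W)
    return 0.5 * np.sum((a - y) ** 2)

def analytic_grad(W, x, y):
    """Backprop gradient of the cost wrt W."""
    a = sigmoid(x @ W)
    delta = (a - y) * a * (1 - a)
    return x.T @ delta

def numeric_grad(W, x, y, eps=1e-5):
    """Central differences: (C(w + eps) - C(w - eps)) / (2 * eps), per weight."""
    g = np.zeros_like(W)
    for idx in np.ndindex(*W.shape):
        Wp, Wm = W.copy(), W.copy()
        Wp[idx] += eps
        Wm[idx] -= eps
        g[idx] = (cost(Wp, x, y) - cost(Wm, x, y)) / (2 * eps)
    return g
```

If backprop is correct, the two gradients agree to roughly `eps**2`; a large discrepancy in any entry points at a bug in the backward pass.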
Later (as new JIRA issues):
- Distributed Batch learning (see below)
- "Stacked (Denoising) Autoencoder" - Feature Learning
- advanced cost minimization such as 2nd-order methods, conjugate gradient, etc.
Distribution of the learning (batch learning) can be done as follows:
1 Partitioning of the data into x chunks
2 Learning the weight changes as matrices in each chunk
3 Combining the matrices and updating the weights - back to 2
Maybe this procedure can be done with random subsets of the chunks (distributed quasi-online learning).
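The three steps above can be sketched as gradient averaging over (assumed equal-sized) chunks. In a real distributed setting each per-chunk gradient would be computed on a separate worker; here the loop stands in for that, and a plain linear least-squares model keeps the sketch short:

```python
import numpy as np

def chunk_gradient(W, X, y):
    """Step 2: the weight-change matrix (least-squares gradient) for one chunk.
    In a distributed run this would execute on a worker."""
    return X.T @ (X @ W - y) / len(X)

def distributed_batch_step(W, chunks, lr=0.5):
    """Step 3: the driver combines the per-chunk matrices by averaging
    (valid here because the chunks are equal-sized) and updates the weights."""
    grads = [chunk_gradient(W, X, y) for X, y in chunks]
    return W - lr * np.mean(grads, axis=0)
```

Iterating `distributed_batch_step` (step 3 looping back to step 2) then recovers ordinary full-batch gradient descent, since the average of the chunk gradients equals the full-data gradient.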
Batch learning with the delta-bar-delta heuristic for adapting the learning rates per weight.
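A sketch of the delta-bar-delta rule (Jacobs, 1988): each weight keeps its own learning rate, which grows additively while the gradient sign stays stable and shrinks multiplicatively when it flips. The hyperparameter values (`kappa`, `phi`, `theta`) below are illustrative defaults, not part of the plan:

```python
import numpy as np

def delta_bar_delta_update(W, grad, state, kappa=0.01, phi=0.5, theta=0.7):
    """One delta-bar-delta step. state = (lr, delta_bar): per-weight learning
    rates and the exponentially averaged past gradient."""
    lr, delta_bar = state
    agree = grad * delta_bar  # >0: gradient sign stable, <0: sign flipped
    lr = np.where(agree > 0, lr + kappa,        # stable sign: grow additively
         np.where(agree < 0, lr * phi, lr))     # sign flip: shrink multiplicatively
    delta_bar = (1 - theta) * grad + theta * delta_bar
    return W - lr * grad, (lr, delta_bar)
```

On a simple quadratic cost the rates ramp up until the iterate overshoots, then back off, giving much faster convergence than a fixed small rate.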