Details

      Description

      Implement a multilayer perceptron

      • via matrix multiplication
      • learning by backpropagation, implementing tricks from Yann LeCun et al.: "Efficient BackProp"
      • arbitrary number of hidden layers (including 0, i.e. just the linear model)
      • connections between adjacent layers only
      • different cost and activation functions (a different activation function per layer)
      • test of backprop by numerical gradient checking (see the first sketch after this list)
      • normalization of the inputs, storable as part of the model (see the second sketch after this list)
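
      A minimal sketch of the numerical gradient check used to test backprop, assuming the cost is available as a function of the flattened weight vector and that backprop has produced an analytic gradient for the same weights; class and method names are illustrative, not part of the existing code or any Mahout API:

      import java.util.function.ToDoubleFunction;

      /**
       * Illustrative sketch: compares the analytic gradient from
       * backpropagation against a central-difference estimate of the cost.
       */
      public final class GradientCheck {

        /** Returns the largest absolute difference between the two gradients. */
        public static double maxGradientError(ToDoubleFunction<double[]> cost,
                                              double[] weights,
                                              double[] analyticGradient) {
          double epsilon = 1e-6;
          double maxError = 0.0;
          for (int i = 0; i < weights.length; i++) {
            double original = weights[i];
            weights[i] = original + epsilon;
            double costPlus = cost.applyAsDouble(weights);
            weights[i] = original - epsilon;
            double costMinus = cost.applyAsDouble(weights);
            weights[i] = original;  // restore the weight before moving on
            double numerical = (costPlus - costMinus) / (2.0 * epsilon);
            maxError = Math.max(maxError, Math.abs(numerical - analyticGradient[i]));
          }
          return maxError;
        }
      }

      In a unit test the returned error would typically be required to stay below a small tolerance (e.g. 1e-6) for the backprop implementation to pass.

      A plain-Java sketch of the storable input normalization, assuming per-feature mean and standard deviation are estimated on the training data, kept with the model, and applied to each input before the forward pass; the class is illustrative, not an existing Mahout API:

      /** Illustrative per-feature z-score normalization stored with the model. */
      public class InputNormalizer {
        private final double[] mean;
        private final double[] std;

        public InputNormalizer(double[][] trainingData) {
          int numFeatures = trainingData[0].length;
          mean = new double[numFeatures];
          std = new double[numFeatures];
          for (double[] row : trainingData) {
            for (int j = 0; j < numFeatures; j++) {
              mean[j] += row[j] / trainingData.length;
            }
          }
          for (double[] row : trainingData) {
            for (int j = 0; j < numFeatures; j++) {
              double d = row[j] - mean[j];
              std[j] += d * d / trainingData.length;
            }
          }
          for (int j = 0; j < numFeatures; j++) {
            // guard against zero variance so constant features do not divide by zero
            std[j] = Math.max(Math.sqrt(std[j]), 1e-12);
          }
        }

        /** Normalizes one input vector to zero mean and unit variance. */
        public double[] normalize(double[] x) {
          double[] z = new double[x.length];
          for (int j = 0; j < x.length; j++) {
            z[j] = (x[j] - mean[j]) / std[j];
          }
          return z;
        }
      }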

      First:

      • implementation of stochastic gradient descent, similar to the existing gradient machine
      • simple gradient descent including momentum (see the update sketch after this list)
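
      A sketch of the simple gradient-descent update with classical momentum mentioned above; the velocity is kept per weight, and all names are illustrative rather than part of the patch:

      /** Illustrative SGD step with momentum: v = m*v - lr*g; w = w + v. */
      public final class MomentumUpdate {
        public static void step(double[] weights, double[] gradient,
                                double[] velocity, double learningRate,
                                double momentum) {
          for (int i = 0; i < weights.length; i++) {
            velocity[i] = momentum * velocity[i] - learningRate * gradient[i];
            weights[i] += velocity[i];
          }
        }
      }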

      Later (as new JIRA issues):

      • distributed batch learning (see below)
      • "Stacked (Denoising) Autoencoder" for feature learning
      • advanced cost minimization such as second-order methods, conjugate gradient, etc.

      Distribution of learning can be done via batch learning:
      1. Partition the data into x chunks
      2. Learn the weight changes as matrices on each chunk
      3. Combine the matrices and update the weights, then go back to step 2
      Maybe this procedure can also be done with random parts of the chunks (distributed quasi-online learning).
      Batch learning with a delta-bar-delta heuristic for adapting the learning rates (see the combine-and-update sketch below).
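
      A sketch of one combine-and-update step of the batch-learning loop above, with a delta-bar-delta style per-weight learning-rate adaptation (grow the rate while the gradient sign is stable, shrink it when the sign flips); all names and constants are illustrative assumptions, not part of the existing code:

      import java.util.List;

      /** Illustrative combine-and-update step for distributed batch learning. */
      public final class DistributedBatchStep {
        public static void step(double[] weights,
                                List<double[]> chunkGradients,   // one gradient per data chunk
                                double[] learningRates,          // per-weight learning rates
                                double[] avgGradient,            // running average ("delta bar")
                                double kappa,                    // additive increase, e.g. 0.01
                                double phi) {                    // multiplicative decrease, e.g. 0.5
          int n = weights.length;
          double[] gradient = new double[n];
          // step 3: combine the weight changes learned on each chunk by averaging
          for (double[] chunkGradient : chunkGradients) {
            for (int i = 0; i < n; i++) {
              gradient[i] += chunkGradient[i] / chunkGradients.size();
            }
          }
          for (int i = 0; i < n; i++) {
            // delta-bar-delta: compare the current gradient sign with the running average
            if (gradient[i] * avgGradient[i] > 0) {
              learningRates[i] += kappa;   // same sign: increase the rate additively
            } else if (gradient[i] * avgGradient[i] < 0) {
              learningRates[i] *= phi;     // sign flip: decrease the rate multiplicatively
            }
            weights[i] -= learningRates[i] * gradient[i];
            // exponential moving average of past gradients
            avgGradient[i] = 0.7 * avgGradient[i] + 0.3 * gradient[i];
          }
        }
      }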

      1. MAHOUT-976.patch (37 kB) by Christian Herta
      2. MAHOUT-976.patch (30 kB) by Christian Herta
      3. MAHOUT-976.patch (27 kB) by Christian Herta
      4. MAHOUT-976.patch (19 kB) by Christian Herta

        Issue Links

          Activity

          Christian Herta created issue -
          Christian Herta made changes -
          Original Estimate: 336h [ 1209600 ] → 80h [ 288000 ]
          Remaining Estimate: 336h [ 1209600 ] → 80h [ 288000 ]
          Christian Herta made changes -
          Description updated
          Christian Herta made changes -
          Description updated
          Christian Herta made changes -
          Description updated
          Christian Herta made changes -
          Description updated
          Christian Herta made changes -
          Description updated
          Christian Herta made changes -
          Status Open [ 1 ] Patch Available [ 10002 ]
          Christian Herta made changes -
          Attachment MAHOUT-976.patch [ 12514275 ]
          Christian Herta made changes -
          Comment [ incomplete and completely untested
          should only compile
             ]
          Christian Herta made changes -
          Description updated
          Christian Herta made changes -
          Attachment MAHOUT-976.patch [ 12514809 ]
          Christian Herta made changes -
          Attachment MAHOUT-976.patch [ 12515722 ]
          Christian Herta made changes -
          Attachment MAHOUT-976.patch [ 12516203 ]
          Robin Anil made changes -
          Assignee Ted Dunning [ tdunning ]
          Robin Anil made changes -
          Fix Version/s 0.8 [ 12320153 ]
          Robin Anil made changes -
          Fix Version/s Backlog [ 12318886 ]
          Fix Version/s 0.8 [ 12320153 ]
          Suneel Marthi made changes -
          Link This issue is superseded by MAHOUT-1265 [ MAHOUT-1265 ]
          Suneel Marthi made changes -
          Status Patch Available [ 10002 ] Resolved [ 5 ]
          Fix Version/s 0.9 [ 12324577 ]
          Resolution Duplicate [ 3 ]
          Suneel Marthi made changes -
          Fix Version/s Backlog [ 12318886 ]
          Suneel Marthi made changes -
          Status Resolved [ 5 ] Closed [ 6 ]
          Assignee Ted Dunning [ tdunning ] Suneel Marthi [ smarthi ]

            People

            • Assignee:
              Suneel Marthi
            • Reporter:
              Christian Herta
            • Votes:
              0
            • Watchers:
              8

              Dates

              • Created:
                Updated:
                Resolved:

                Time Tracking

                Estimated: 80h
                Remaining: 80h
                Logged: Not Specified

                  Development