Mahout / MAHOUT-364

[GSOC] Proposal to implement Neural Network with backpropagation learning on Hadoop

    Details

    • Type: New Feature
    • Status: Closed
    • Priority: Major
    • Resolution: Won't Fix
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: None

      Description

      Proposal Title: Implement Multi-Layer Perceptrons with backpropagation learning on Hadoop (addresses issue Mahout-342)

      Student Name: Zaid Md. Abdul Wahab Sheikh

      Student E-mail: (gmail id) sheikh.zaid

      I. Brief Description

      A feedforward neural network (NN) exhibits several degrees of parallelism, such as weight parallelism, node parallelism, network parallelism, layer parallelism and training parallelism. However, network-based parallelism requires fine-grained synchronization and communication and is therefore not well suited to map/reduce algorithms. Training-set parallelism, on the other hand, is coarse-grained and can easily be exploited on Hadoop, which splits the input among different mappers. Each mapper then propagates its 'InputSplit' through its own copy of the complete neural network.
      The backpropagation algorithm will operate in batch mode, because updating a common set of parameters after every single training example creates a bottleneck for parallelization. The overall error gradient can be computed in parallel by calculating the gradient of each training vector in the Mapper, combining these into partial batch gradients, and adding them in a Reducer to obtain the overall batch gradient (illustrated below).
      In a similar manner, error function evaluations during line searches (for the conjugate gradient and quasi-Newton algorithms) can be parallelized efficiently.
      Lastly, to avoid local minima in the error function, we can take advantage of training session parallelism to start multiple training sessions in parallel with different initial weights (simulated annealing).
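
      As a brief illustration (my own notation, not part of the original issue text), the batch error gradient decomposes into a sum of per-example gradients, which is exactly what the map/combine/reduce stages compute piecewise:

        \nabla E(\mathbf{w}) \;=\; \sum_{n=1}^{N} \nabla E_n(\mathbf{w})
                             \;=\; \sum_{s \,\in\, \text{splits}} \Big( \sum_{n \in s} \nabla E_n(\mathbf{w}) \Big)

      Here the inner sums are the partial batch gradients produced per split by the mappers and combiner, and the outer sum is performed by the single reducer.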

      II. Detailed Proposal

      The most important step is to design the base neural network classes so that other NN architectures like Hopfield nets, Boltzmann machines, SOMs etc. can be implemented easily by deriving from them. For that I propose to implement a set of core classes corresponding to basic neural network concepts such as artificial neuron, neuron layer, neuron connection, weight, transfer function, input function and learning rule. This architecture is inspired by that of the open-source Neuroph neural network framework (http://imgur.com/gDIOe.jpg). Such a base architecture allows great flexibility in deriving new NNs and learning rules: all that needs to be done is to derive from the NeuralNetwork class, provide the method for network creation, create a new training method by deriving from LearningRule, and add that learning rule to the network during creation (a rough sketch follows). In addition, the API is intuitive and easy to understand (in comparison to other NN frameworks like Encog and JOONE).
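
      A minimal sketch of what such base classes might look like. All names here (NeuralNetwork, LearningRule, Layer, TrainingExample, learn) are illustrative placeholders in the spirit of the Neuroph design, not an existing Mahout API:

        import java.util.ArrayList;
        import java.util.List;

        // Illustrative placeholder types for the concepts named above.
        class Layer { /* neurons, weights, input and transfer functions ... */ }
        class TrainingExample { /* input vector and target vector ... */ }

        abstract class LearningRule {
          protected NeuralNetwork network;

          void setNetwork(NeuralNetwork network) { this.network = network; }

          /** One pass over the training data; returns the remaining error. */
          abstract double learnIteration(Iterable<TrainingExample> data);
        }

        abstract class NeuralNetwork {
          protected final List<Layer> layers = new ArrayList<Layer>();
          private LearningRule learningRule;

          /** Subclasses (MLP, Hopfield net, SOM, ...) build their topology here. */
          protected abstract void createNetwork();

          void setLearningRule(LearningRule rule) {
            this.learningRule = rule;
            rule.setNetwork(this);
          }

          /** Train until the error drops below targetError or maxIterations is reached. */
          void learn(Iterable<TrainingExample> data, double targetError, int maxIterations) {
            for (int i = 0; i < maxIterations; i++) {
              if (learningRule.learnIteration(data) <= targetError) {
                break;
              }
            }
          }
        }

      A new architecture would then subclass NeuralNetwork and register its own LearningRule, exactly as described in the paragraph above.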

      The approach to parallelization in Hadoop:

      In the Driver class:

      • The input parameters are read and the NeuralNetwork with the specified LearningRule (training algorithm) is created.
      • Initial weight values are randomly generated and written to the FileSystem. If a number of training sessions (for simulated annealing) is specified, multiple sets of initial weight values are generated.
      • Training is started by calling the NeuralNetwork's learn() method. In each iteration, every time the error gradient vector needs to be calculated, the method submits a Job whose input path points to the training-set vectors and whose configuration carries key properties (such as the path to the stored weight values). The gradient vectors calculated by the Reducers are written back to an output path on the FileSystem.
      • After JobClient.runJob() returns, the gradient vector is retrieved from the FileSystem and tested against the stopping criterion. The weights are then updated using the method implemented by the particular LearningRule. For line searches, each error function evaluation is again done by submitting a job (see the driver sketch after this list).
      • The NN is trained iteratively until it converges.
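
      A hedged sketch of the per-iteration job submission, using the old Hadoop mapred API this proposal refers to (JobClient.runJob, configure). GradientMapper and GradientSumReducer are the hypothetical classes sketched after the Mapper and Reducer lists below, and the configuration key "mahout.nn.weights.path" is made up for illustration:

        import java.io.IOException;
        import org.apache.hadoop.fs.Path;
        import org.apache.hadoop.io.IntWritable;
        import org.apache.hadoop.mapred.FileInputFormat;
        import org.apache.hadoop.mapred.FileOutputFormat;
        import org.apache.hadoop.mapred.JobClient;
        import org.apache.hadoop.mapred.JobConf;
        import org.apache.hadoop.mapred.SequenceFileOutputFormat;
        import org.apache.mahout.math.VectorWritable;

        public class BackpropDriver {

          /** Submit one gradient-computation job and return the path holding the batch gradient. */
          static Path runGradientJob(Path trainingData, Path weightsPath, Path outputDir, int iteration)
              throws IOException {
            JobConf conf = new JobConf(BackpropDriver.class);
            conf.setJobName("backprop-gradient-" + iteration);

            // Tell every mapper where the current weight vector lives on the FileSystem.
            conf.set("mahout.nn.weights.path", weightsPath.toString());

            conf.setMapperClass(GradientMapper.class);
            conf.setCombinerClass(GradientSumReducer.class);   // gradient addition is associative,
            conf.setReducerClass(GradientSumReducer.class);    // so the same summing class serves both roles
            conf.setNumReduceTasks(1);                         // single reducer -> one overall batch gradient

            conf.setOutputKeyClass(IntWritable.class);         // key = training session number
            conf.setOutputValueClass(VectorWritable.class);
            conf.setOutputFormat(SequenceFileOutputFormat.class);

            Path out = new Path(outputDir, "gradient-" + iteration);
            FileInputFormat.setInputPaths(conf, trainingData);
            FileOutputFormat.setOutputPath(conf, out);

            JobClient.runJob(conf);                            // blocks until this iteration's job finishes
            return out;                                        // the driver reads the gradient back and updates weights
          }
        }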

      In the Mapper class:

      • Each Mapper is initialized in its configure() method: the weights are retrieved and the complete NeuralNetwork is created.
      • The map function then takes the training vectors as key/value pairs (the key is ignored), runs them through the NN to compute the outputs and backpropagates the errors to obtain the error gradients. The error gradient vectors are output as key/value pairs whose keys are all set to a common value, such as the training session number (for each training session, all keys in the outputs of all mappers must be identical). A sketch of such a Mapper follows this list.
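
      A hedged sketch of the Mapper. NeuralNetwork, its loadWeights() helper and gradientFor() are hypothetical placeholders for the classes this proposal would implement, and the parsing of a training line is left abstract:

        import java.io.IOException;
        import org.apache.hadoop.io.IntWritable;
        import org.apache.hadoop.io.LongWritable;
        import org.apache.hadoop.io.Text;
        import org.apache.hadoop.mapred.JobConf;
        import org.apache.hadoop.mapred.MapReduceBase;
        import org.apache.hadoop.mapred.Mapper;
        import org.apache.hadoop.mapred.OutputCollector;
        import org.apache.hadoop.mapred.Reporter;
        import org.apache.mahout.math.Vector;
        import org.apache.mahout.math.VectorWritable;

        public class GradientMapper extends MapReduceBase
            implements Mapper<LongWritable, Text, IntWritable, VectorWritable> {

          private NeuralNetwork network;
          private final IntWritable sessionKey = new IntWritable();

          @Override
          public void configure(JobConf job) {
            // Rebuild the complete network in every mapper from the weights stored on the FileSystem.
            network = NeuralNetwork.loadWeights(job, job.get("mahout.nn.weights.path"));
            sessionKey.set(job.getInt("mahout.nn.training.session", 0));
          }

          @Override
          public void map(LongWritable offset, Text line,
                          OutputCollector<IntWritable, VectorWritable> output, Reporter reporter)
              throws IOException {
            // Parse one training example, forward-propagate it and backpropagate the
            // error to obtain this example's error-gradient vector.
            Vector gradient = network.gradientFor(line.toString());
            // All mappers of one training session emit the same key, so the combiner and
            // the single reducer can sum every per-example gradient into the batch gradient.
            output.collect(sessionKey, new VectorWritable(gradient));
          }
        }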

      In the Combiner class:

      • Iterates through all the individual error gradient vectors output by its mapper (they all share the same key) and adds them up to obtain a partial batch gradient.

      In the Reducer class:

      • A single Reducer combines all the partial gradients from the Mappers to obtain the overall batch gradient.
      • The final error gradient vector is written back to the FileSystem (a summing sketch, shared by the Combiner and Reducer, follows below).
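
      Because gradient addition is associative, one summing class can serve as both the Combiner (partial batch gradient on each map node) and the single Reducer (overall batch gradient), as assumed in the driver sketch above. The class name is illustrative only:

        import java.io.IOException;
        import java.util.Iterator;
        import org.apache.hadoop.io.IntWritable;
        import org.apache.hadoop.mapred.MapReduceBase;
        import org.apache.hadoop.mapred.OutputCollector;
        import org.apache.hadoop.mapred.Reducer;
        import org.apache.hadoop.mapred.Reporter;
        import org.apache.mahout.math.Vector;
        import org.apache.mahout.math.VectorWritable;

        public class GradientSumReducer extends MapReduceBase
            implements Reducer<IntWritable, VectorWritable, IntWritable, VectorWritable> {

          @Override
          public void reduce(IntWritable sessionKey, Iterator<VectorWritable> gradients,
                             OutputCollector<IntWritable, VectorWritable> output, Reporter reporter)
              throws IOException {
            Vector sum = null;
            while (gradients.hasNext()) {
              Vector g = gradients.next().get();
              // Copy the first vector defensively (Hadoop reuses Writable instances),
              // then accumulate the element-wise sum of all gradient vectors.
              sum = (sum == null) ? g.clone() : sum.plus(g);
            }
            if (sum != null) {
              output.collect(sessionKey, new VectorWritable(sum));  // written back to the FileSystem
            }
          }
        }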

      I propose to complete all of the following sub-tasks during GSoC 2010:

      Implementation of the Backpropagation algorithm:

      • Initialization of weights: use the Nguyen-Widrow algorithm to select the range of the starting weight values (see the sketch after this list).
      • Input, transfer and error functions: implement basic input functions like WeightedSum and transfer functions like sigmoid, Gaussian, tanh and linear. Implement the sum-of-squares error function.
      • Optimization methods to update the weights: (a) batch gradient descent, with momentum and a variable learning rate method [2]; (b) a conjugate gradient method with Brent's line search.
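
      A hedged sketch of Nguyen-Widrow initialization for one hidden layer, as I understand the algorithm: weights are drawn uniformly and then rescaled so that each neuron's weight vector has length beta = 0.7 * h^(1/n) for h hidden neurons and n inputs. The exact constants and ranges would follow the cited literature; the class name is illustrative:

        import java.util.Random;

        final class NguyenWidrow {

          /** Returns weights[h][n] for a layer with n inputs and h hidden neurons. */
          static double[][] initialize(int numInputs, int numHidden, Random random) {
            double beta = 0.7 * Math.pow(numHidden, 1.0 / numInputs);
            double[][] weights = new double[numHidden][numInputs];
            for (int j = 0; j < numHidden; j++) {
              double norm = 0.0;
              for (int i = 0; i < numInputs; i++) {
                weights[j][i] = random.nextDouble() - 0.5;    // uniform in [-0.5, 0.5)
                norm += weights[j][i] * weights[j][i];
              }
              norm = Math.sqrt(norm);
              for (int i = 0; i < numInputs; i++) {
                weights[j][i] = beta * weights[j][i] / norm;  // rescale to length beta
              }
            }
            return weights;
          }

          private NguyenWidrow() {}
        }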

      Improving generalization:

      • Validating the network to detect overfitting (early stopping method)
      • Regularization (weight decay method); a sketch of both follows this list.
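
      A hedged sketch of how weight decay and early stopping could plug into the batch update the driver performs after each gradient job. Names, parameters and the exact stopping rule are illustrative, not a fixed design:

        final class TrainingControl {

          /** Batch gradient-descent step with momentum and L2 weight decay (regularization). */
          static void updateWeights(double[] weights, double[] batchGradient, double[] previousDelta,
                                    double learningRate, double momentum, double weightDecay) {
            for (int i = 0; i < weights.length; i++) {
              // Weight decay adds weightDecay * w_i to the gradient, penalizing large weights.
              double delta = -learningRate * (batchGradient[i] + weightDecay * weights[i])
                             + momentum * previousDelta[i];
              weights[i] += delta;
              previousDelta[i] = delta;
            }
          }

          /** Early stopping: halt once validation error stops improving for 'patience' checks. */
          static final class EarlyStopping {
            private final int patience;
            private double bestError = Double.MAX_VALUE;
            private int badChecks = 0;

            EarlyStopping(int patience) { this.patience = patience; }

            boolean shouldStop(double validationError) {
              if (validationError < bestError) {
                bestError = validationError;
                badChecks = 0;
                return false;
              }
              return ++badChecks >= patience;
            }
          }

          private TrainingControl() {}
        }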

      Create examples for:

      • Classification: using the Abalone Data Set from UCI Machine Learning Repository
      • Classification, Regression: Breast Cancer Wisconsin (Prognostic) Data Set

      If time permits, also implement:

      • Resilient Backpropagation (RPROP)

      III. Week Plan with list of deliverables

      • (Till May 23rd, community bonding period)
        Brainstorm with my mentor and the Apache Mahout community to come up with the most optimal design for an extensible Neural Network framework. Code prototypes to identify potential problems and/or investigate new solutions.
        Deliverable: A detailed report or design document on how to implement the basic Neural Network framework and the learning algorithms.
      • (May 24th, coding starts) Week 1:
        Deliverables: Basic Neural network classes (Neuron, Connection, Weight, Layer, LearningRule, NeuralNetwork etc) and the various input, transfer and error functions mentioned previously.
      • (May 31st) Week 2 and Week 3:
        Deliverable: Driver, Mapper, Combiner and Reducer classes with basic functionality to run a feedforward Neural Network on Hadoop (no training methods yet; weights are generated using the Nguyen-Widrow algorithm).
      • (June 14th) Week 4:
        Deliverable: Backpropagation algorithm using standard Batch Gradient descent.
      • (June 21st) Week 5:
        Deliverables: Variable learning rate and momentum for Batch Gradient descent. Support for validation tests. Run some larger-scale tests.
      • (June 28th) Week 6:
        Deliverable: Support for Early stopping and Regularization (weight decay) during training.
      • (July 5th) Week 7 and Week 8:
        Deliverable: Conjugate gradient method with Brent's line search algorithm.
      • (July 19th) Week 9:
        Deliverable: Write unit tests. Do bigger scale tests for both batch gradient descent and conjugate gradient method.
      • (July 26th) Week 10 and Week 11:
        Deliverable: 2 examples of classification and regression on real-world datasets from UCI Machine Learning Repository. More tests.
      • (August 9th, tentative 'pencils down' date) Week 12:
        Deliverable: Wind up the work: scrub code, improve documentation, write tutorials (on the wiki) etc.
      • (August 16: Final evaluation)

      IV. Additional Information

      I am a final-year Computer Science student at NIT Allahabad (India), graduating in May. For my final-year project/thesis, I am working on Open Domain Question Answering. I participated in GSoC last year with the Apertium machine translation system (http://google-opensource.blogspot.com/2009/11/apertium-projects-first-google-summer.html). I am familiar with the three major open-source neural network frameworks in Java (JOONE, Encog and Neuroph), having used them in past projects on fingerprint recognition and face recognition (during a summer course on image and speech processing). My research interests are machine learning and statistical natural language processing, and I will be enrolling for a Ph.D. next semester (i.e. next fall) at the same institute.

      I have no specific time constraints during the GSoC period. I will devote a minimum of 6 hours every day to GSoC.
      Time offset: UTC+5:30 (IST)

      V. References

      [1] S. McLoone and G. W. Irwin, "Fast parallel off-line training of multilayer perceptrons," IEEE Transactions on Neural Networks, 1997.
      [2] W. Schiffmann, M. Joost and R. Werner, "Optimization of the backpropagation algorithm for training multilayer perceptrons," 1994.
      [3] C. T. Chu, S. K. Kim, Y. A. Lin, et al., "Map-Reduce for machine learning on multicore," in NIPS, 2006.
      [4] C. M. Bishop, Neural Networks for Pattern Recognition, 1995 (book).

        Activity

        Sean Owen made changes -
        Status Resolved [ 5 ] Closed [ 6 ]
        Zoran Sevarac added a comment -

        Encog Switching to Apache License as of 2.5
        http://www.heatonresearch.com/content/encog-switching-apache-license-25

        We have this on our TODO list; sorry for the delay, it will be done. Other stuff kept us busy...

        Zoran

        Zoran Sevarac added a comment -

        I have good news. We already have some basic stuff to run a neural network on Hadoop, but it requires more development.
        There are some issues that should be analysed, like synchronization (all nodes must complete in order to go to the next iteration), and often the overhead of breaking up the task for each iteration becomes greater than the time savings from the multiple processing units. But these are subjects of research in this area.

        So please email me to let me know about the possible ways we (Neuroph and Encog) can join the programme and further development.
        I believe that we are able to deliver at least a prototype, which can be used for further research.

        Zoran

        Zoran Sevarac added a comment -

        Hi,

        The news is that Neuroph and Encog (the two major open source Java neural network projects) have joined their efforts;
        see http://netbeans.dzone.com/encog-neuroph-collaboration

        In short, the basic idea is that Encog provides the high-performance core engine (which supports multicore and GPU) and Neuroph a friendly API on top. I think that Encog also has some support for Hadoop, but I'll check this.

        That means that maybe we already have this feature, or we could focus development on it and provide it soon.
        I'll let you know more details soon.

        Zoran

        Ted Dunning added a comment -

        Correct.

        Definitely closable. It wouldn't hurt to reach out to the Neuroph guys, but I am not sure that the missions match well enough. In particular, I don't see that they would be happy with our limitation to scalable learning. They would be the ones to say that though.

        Sean Owen made changes -
        Status Reopened [ 4 ] Resolved [ 5 ]
        Resolution Won't Fix [ 2 ]
        Sean Owen made changes -
        Resolution Fixed [ 1 ]
        Status Resolved [ 5 ] Reopened [ 4 ]
        Sean Owen made changes -
        Status Open [ 1 ] Resolved [ 5 ]
        Resolution Fixed [ 1 ]
        Sean Owen added a comment -

        This didn't happen for GSoC, right?

        Zoran Sevarac added a comment -

        Hi.

        Just to let you know that we've released Neuroph 2.4 under the Apache 2 licence.

        Zoran

        Jake Mannix added a comment -

        Moving this discussion over to MAHOUT-383, as this is a Google Summer of Code JIRA ticket.

        Zoran Sevarac added a comment -

        Hi,

        We've discussed switching to the Apache license and integration with Mahout in the Neuroph community, and I'm very glad to say that we've agreed on the license change and are very interested in integration. So could you tell me how the integration would be done in practice?
        We would like to keep the existing Neuroph project running, and at the same time coordinate all the development with Mahout.

        Zoran

        Jake Mannix added a comment -

        Zoran,

        Any form of BSD-style license is compatible (Mozilla, Eclipse, Apache public licenses, as well as of course BSD itself). No form of GPL (GPL v2, v3, LGPL, or AGPL) is compatible.

        If you can get your fellow contributors to agree to change the licensing to something non-viral (that's the issue with GPL: it requires that software which integrates with it also be GPL, while Apache/BSD/MPL/EPL all say, "Steal it if you want, just admit you did it!"), we'd love to get integration or even complete adoption/absorption of Neuroph.

        Can you talk it over with your compatriots regarding changing to something like the Apache License: http://www.apache.org/licenses/LICENSE-2.0 ?

        Benson Margulies added a comment -

        GPL3 is NOT ASL compatible.

        Ted Dunning added a comment -

        By the way is GPL3 Apache 2 compatible?

        No.

        Apache licenses allow any kind of reuse with only the mildest of conditions. They are similar to creative commons licenses that allow re-use with attribution.

        GPL licenses require that derived works also be GPL-licensed and that source code be provided for all derived works.

        See here: http://www.opensource.org/licenses/gpl-3.0.html and here: http://www.opensource.org/licenses/apache2.0.php

        Zoran Sevarac added a comment -

        @Jake

        Now I have read more about the whole Mahout project and it sounds very interesting too. So I'm very interested in tighter interaction, as you said, and if you find that Neuroph can work for you I'll be very glad if you decide to make it your ANN library. Of course we'll find some solution for licensing first. Maybe this proposal can be the first common project.

        By the way is GPL3 Apache 2 compatible?

        David Strupl added a comment -

        Back at school in the late '90s I did some experimenting with parallel implementations of backpropagation and other algorithms. Check for example
        http://portal.acm.org/author_page.cfm?id=81100013265&coll=GUIDE&dl=GUIDE&trk=0&CFID=85691215&CFTOKEN=64441042
        Sounds really interesting - all the best, David Strupl

        Zoran Sevarac added a comment -

        Sure, that won't be a problem. I must admit that I don't know many details about those licenses and the essential differences between them, so I'll take a look. Our general approach is: if the project will benefit from changing the license (which seems to be the case here), then we're going to do it. I'll appreciate any help with the licensing options, and we don't have to discuss it here. You can count on us to find some solution.

        Jake Mannix added a comment -

        Hi Zoran,

        Neuroph looks very interesting (although I haven't tried it out yet) - do you know if you and your fellow contributors are willing to change your licensing terms from LGPL to something more Apache compatible (like ASL or BSD)? As it is, Apache projects can't directly include and hook into LGPL'ed code (the reverse is obviously fine), but I'd love to see a tighter interaction here, given how little ANN code we have (and how much we'd like to have).

        Zoran Sevarac added a comment -

        Hi,

        I'm Zoran Sevarac, founder and developer of the Neuroph project (http://neuroph.sourceforge.net), and I must say that I'm very happy to see this proposal. I agree that the proposal is very well written and I like the idea very much. Although I don't have any experience with Hadoop, I'll be very glad to help with the neural network related stuff.
        So I can say that Neuroph and I personally will support and help with the development of this project if it gets accepted.

        I already published a short article about this: http://netbeans.dzone.com/neuroph-hadoop-nb

        Ted Dunning added a comment -

        This is a very nicely written proposal.

        One technical question I have is whether you will see gains in parallelism for training a single model. The experience with logistic regression makes this seem less likely.

        The dual level model structure that John Langford proposes in this lecture might be of interest: http://videolectures.net/nipsworkshops09_langford_pol/ He makes some inflammatory comments right off the bat that you might need to address.

        All that said, having a good implementation of an ANN learner is a good thing.

        Jake Mannix added a comment -

        I've got to say, this is a fantastically well written proposal, with perfect breadth of scope as well.

        Do we have someone who can shepherd this?

        Zaid Md. Abdul Wahab Sheikh made changes -
        Comment [ formatting :( ]
        Zaid Md. Abdul Wahab Sheikh made changes -
        Description [ reformatted with wiki markup ]
        Zaid Md. Abdul Wahab Sheikh created issue -

          People

          • Assignee: Unassigned
          • Reporter: Zaid Md. Abdul Wahab Sheikh
          • Votes: 3
          • Watchers: 8
