Details

    • Type: New Feature
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.9
    • Component/s: None

      Description

      Design of multilayer perceptron

      1. Motivation
      A multilayer perceptron (MLP) is a kind of feed-forward artificial neural network, a mathematical model inspired by biological neural networks. The multilayer perceptron can be used for various machine learning tasks such as classification and regression, so it would be a useful addition to Mahout.

      2. API

      The design goal of the API is to make the MLP easy for users to work with and to keep the implementation details transparent to them.

      The following example code shows how a user would use the MLP.
      -------------------------------------
      // set the parameters
      double learningRate = 0.5;
      double momentum = 0.1;
      int[] layerSizeArray = new int[] {2, 5, 1};
      String costFuncName = "SquaredError";
      String squashingFuncName = "Sigmoid";
      // the location to store the model; if a model already exists at the specified location, an exception will be thrown
      URI modelLocation = ...
      MultilayerPerceptron mlp = new MultilayerPerceptron(layerSizeArray, modelLocation);
      mlp.setLearningRate(learningRate).setMomentum(momentum).setRegularization(...).setCostFunction(...).setSquashingFunction(...);

      // the user can also load an existing model from a given URI and update it with new training data;
      // if there is no existing model at the specified location, an exception will be thrown
      /*
      MultilayerPerceptron mlp = new MultilayerPerceptron(learningRate, regularization, momentum, squashingFuncName, costFuncName, modelLocation);
      */

      URI trainingDataLocation = …
      // the details of training are transparent to the user; it may run on a single machine or in a distributed environment
      mlp.train(trainingDataLocation);

      // the user can also train the model with one training instance at a time, in a stochastic gradient descent fashion
      Vector trainingInstance = ...
      mlp.train(trainingInstance);

      // prepare the input feature
      Vector inputFeature = ...
      // the semantic meaning of the output result is defined by the user
      // in the general case, the dimension of the output vector is 1 for regression and two-class classification
      // the dimension of the output vector is n for n-class classification (n > 2)
      Vector outputVector = mlp.output(inputFeature);
      -------------------------------------

      3. Methodology

      The output calculation can be easily implemented with a feed-forward pass, and single-machine training is straightforward. The following describes how to train the MLP in a distributed way with batch gradient descent. The workflow is illustrated in the figure below.

      https://docs.google.com/drawings/d/1s8hiYKpdrP3epe1BzkrddIfShkxPrqSuQBH0NAawEM4/pub?w=960&h=720
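
      To make the feed-forward output calculation concrete, the following is a minimal, self-contained sketch using plain Java arrays and the sigmoid squashing function. The class and method names are illustrative only and do not correspond to the actual Mahout classes; bias terms are omitted for brevity.
      -------------------------------------
      public class FeedForwardSketch {

        // Computes one layer's output: out_i = sigmoid(sum_j weights[i][j] * input[j]).
        static double[] forwardLayer(double[][] weights, double[] input) {
          double[] output = new double[weights.length];
          for (int i = 0; i < weights.length; i++) {
            double sum = 0.0;
            for (int j = 0; j < input.length; j++) {
              sum += weights[i][j] * input[j];
            }
            output[i] = 1.0 / (1.0 + Math.exp(-sum)); // sigmoid squashing function
          }
          return output;
        }

        // Propagates an input feature through all layers of the network.
        static double[] output(double[][][] layerWeights, double[] inputFeature) {
          double[] activation = inputFeature;
          for (double[][] weights : layerWeights) {
            activation = forwardLayer(weights, activation);
          }
          return activation;
        }

        public static void main(String[] args) {
          // A 2-5-1 topology matching the {2, 5, 1} layer sizes in the API example above;
          // the weights here are arbitrary demo values (all zeros).
          double[][][] layerWeights = {new double[5][2], new double[1][5]};
          System.out.println(output(layerWeights, new double[] {1.0, 0.0})[0]);
        }
      }
      -------------------------------------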

      For distributed training, each training iteration is divided into two steps: the weight update calculation step and the weight update step. The distributed MLP can only be trained in a batch-update fashion.

      3.1 The partial weight update calculation step:
      This step computes the weight updates in a distributed manner. Each task gets a copy of the MLP model and calculates a partial weight update from its partition of the data.

      Suppose the training error is

      E(w) = \frac{1}{2} \sum_{d \in D} cost(t_d, y_d),

      where D denotes the training set, d denotes a training instance, t_d denotes its class label, and y_d denotes the output of the MLP. Also suppose that the sigmoid function is used as the squashing function and squared error is used as the cost function, and let
      t_i denote the target value for the ith dimension of the output layer,
      o_i denote the actual output for the ith dimension of the output layer,
      l denote the learning rate, and
      w_{ij} denote the weight between the jth neuron in the previous layer and the ith neuron in the next layer.

      The weight of each edge is updated as

      \Delta w_{ij} = l \cdot \frac{1}{m} \cdot \delta_j \cdot o_i,

      where, summing over the training instances m,

      \delta_j = -\sum_{m} o_j^{(m)} (1 - o_j^{(m)}) (t_j^{(m)} - o_j^{(m)}) for the output layer, and

      \delta_j = -\sum_{m} o_j^{(m)} (1 - o_j^{(m)}) \sum_{k} \delta_k w_{jk} for the hidden layers.

      Because the sum over instances can be split by partition, \delta_j can be rewritten as

      \delta_j = -\sum_{i = 1}^{k} \sum_{m_i} o_j^{(m_i)} (1 - o_j^{(m_i)}) (t_j^{(m_i)} - o_j^{(m_i)}),

      where the training data is divided into k partitions and m_i ranges over the instances of the ith partition.

      The above equation indicates that \delta_j can be divided into k parts, one per partition, that can be computed independently.

      So for the implementation, each mapper can calculate its part of \delta_j from its partition of the data and then store the result at a specified location.
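
      As an illustration of what each mapper computes, the sketch below accumulates the output-layer part of \delta_j over one data partition, following the formulas above. The class and helper names are hypothetical, and the job wiring (loading the model copy, parsing the input split, writing the result out) is omitted.
      -------------------------------------
      public class PartialDeltaSketch {

        // Accumulates this partition's contribution to delta_j for each output neuron j,
        // i.e. the inner sum over the instances m_i that belong to this partition.
        static double[] partialDelta(double[][] actualOutputs, double[][] targets) {
          int outputDim = targets[0].length;
          double[] delta = new double[outputDim];
          for (int m = 0; m < targets.length; m++) {      // instances in this partition
            for (int j = 0; j < outputDim; j++) {
              double o = actualOutputs[m][j];
              double t = targets[m][j];
              delta[j] += -o * (1 - o) * (t - o);         // output-layer term from above
            }
          }
          return delta;                                   // stored for the model update step
        }

        public static void main(String[] args) {
          // Two toy instances with a single output dimension (as in the {2, 5, 1} example).
          double[][] outputs = {{0.8}, {0.3}};
          double[][] targets = {{1.0}, {0.0}};
          System.out.println(partialDelta(outputs, targets)[0]);
        }
      }
      -------------------------------------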

      3.2 The model update step:

      After the k parts of \delta_j have been calculated, a separate program merges them into one to update the weight matrices.

      This program can load the results calculated in the weight update calculation step and update the weight matrices.
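
      A minimal sketch of this merge step is shown below, assuming each of the k tasks has written out a partial weight update matrix for its partition (its partial sums of \delta_j * o_i). The names and the in-memory representation are hypothetical; in practice the matrices would be read from and written back to the specified locations.
      -------------------------------------
      import java.util.List;

      public class WeightUpdateSketch {

        // Merges the k partial updates (one per partition) and applies them to the weights,
        // scaled by the learning rate l and 1/m as in the update formula above.
        static double[][] update(double[][] weights, List<double[][]> partialUpdates,
                                 double learningRate, int numInstances) {
          for (double[][] partial : partialUpdates) {       // one partial update per partition
            for (int i = 0; i < weights.length; i++) {
              for (int j = 0; j < weights[i].length; j++) {
                weights[i][j] += learningRate / numInstances * partial[i][j];
              }
            }
          }
          return weights;
        }

        public static void main(String[] args) {
          double[][] weights = new double[1][5];            // e.g. hidden -> output weights
          List<double[][]> partials =
              List.of(new double[1][5], new double[1][5]);  // k = 2 partial updates
          update(weights, partials, 0.5, 100);
        }
      }
      -------------------------------------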

      1. Mahout-1265-17.patch
        52 kB
        Yexi Jiang
      2. MAHOUT-1265.patch
        52 kB
        Suneel Marthi

        Issue Links

          Activity

          Yexi Jiang added a comment -

          Hi Mark Yakushev, according to MAHOUT-1510, Mahout no longer accepts proposals for MapReduce algorithms.

          Mark Yakushev added a comment -

          Hi Yexi and Ted,

          Is there anything new on the mapreduce implementation?

          Ted Dunning added a comment -

          Great. I am thinking a mapreduce version of MLP. It may take a non-trivial amount of time.

          Let's talk on the mailing list. I really think that a downpour architecture will not be much harder than a map-reduce implementation and will be orders of magnitude faster.

          Hudson added a comment -

          SUCCESS: Integrated in Mahout-Quality #2376 (See https://builds.apache.org/job/Mahout-Quality/2376/)
          MAHOUT-1265: Multilayer Perceptron (smarthi: rev 1552403)

          • /mahout/trunk/CHANGELOG
          • /mahout/trunk/core/src/main/java/org/apache/mahout/classifier/mlp
          • /mahout/trunk/core/src/main/java/org/apache/mahout/classifier/mlp/MultilayerPerceptron.java
          • /mahout/trunk/core/src/main/java/org/apache/mahout/classifier/mlp/NeuralNetwork.java
          • /mahout/trunk/core/src/main/java/org/apache/mahout/classifier/mlp/NeuralNetworkFunctions.java
          • /mahout/trunk/core/src/test/java/org/apache/mahout/classifier/mlp
          • /mahout/trunk/core/src/test/java/org/apache/mahout/classifier/mlp/TestMultilayerPerceptron.java
          • /mahout/trunk/core/src/test/java/org/apache/mahout/classifier/mlp/TestNeuralNetwork.java
          Yexi Jiang added a comment -

          Great. I am thinking a mapreduce version of MLP. It may take a non-trivial amount of time.

          Suneel Marthi added a comment -

          Patch committed to trunk, great work Yexi.

          Suneel Marthi added a comment -

          I'll be committing this code to trunk today.

          Yexi Jiang added a comment -

          This is version 17 of the patch.

          Suneel Marthi added a comment -

          Updated patch, fixed styling issues.

          Yexi Jiang added a comment -

          I have applied the patch to my local code base and tested it. It works without any error.

          Suneel Marthi added a comment -

          Yexi, I updated your latest patch (rev 13). See attached. I formatted the code for style and tweaked some of the code. Try running this patch in your environment and verify that it's good.

          Yexi Jiang added a comment -

          Updated according to latest feedback.

          Yexi Jiang added a comment -

          This is the final version of the patch. It has been reviewed by Suneel Marthi.

          Yexi Jiang added a comment -

          OK, I'll revise it accordingly.

          Suneel Marthi added a comment -

          Yexi Jiang Please look at my comments on Reviewboard.

          Suneel Marthi added a comment -

          Yexi Jiang I'll have time next week to review this.

          Yexi Jiang added a comment -

          Is there any news?

          Yexi Jiang added a comment -

          Suneel Marthi Done, please refer to https://reviews.apache.org/r/13406/. Thank you.

          Suneel Marthi added a comment -

          Yexi Jiang Could you upload this to ReviewBoard? It's easier to review and comment on the code that way.

          https://reviews.apache.org

          Yexi Jiang added a comment -

          Is there anyone who can review the code?
          Sample code for using it can be seen in the test cases.

          Ted Dunning Could you please give any comments?

          Yexi Jiang added a comment -

          The MLP is implemented on top of NeuralNetwork. The NeuralNetwork is more general in terms of functionality (it can be used for regression, classification, dimensionality reduction, etc.) and architecture (linear regression and logistic regression can be viewed as two-layer neural networks, an autoencoder as a three-layer neural network; I have heard that even the SVM can be modeled as a type of neural network, but I'm not sure).

          In my opinion, the NeuralNetwork I implemented is a suitable starting point for deep learning, as one implementation of deep nets is based on stacking autoencoders.

          Ted Dunning added a comment -

          We have several efforts that are going to be helped by a parameter server implementation. Deep learning is one. Other non-linear optimizations are likely to be as well.

          Is this MLP issue a good place to start with that?

          Yexi Jiang added a comment -

          Ted Dunning The test cases contain tests on three datasets: the simple XOR problem, the Cancer dataset (2-class classification), and the Iris dataset (3-class classification). For the latter two datasets, the classification accuracy is more than 90%.

          Yexi Jiang added a comment -

          Ted Dunning I have finished a workable single-machine version of MultilayerPerceptron (based on NeuralNetwork). It supports the requirements you mentioned above. It allows users to customize each layer, including the size and the squashing function. Also, it allows users to specify different loss functions for the model. Moreover, it allows users to store the trained model and reload it for later use. Finally, it allows users to extract the weights of each layer from a trained model. This approach allows users to train and stack a deep learning neural network layer by layer. If this single-machine version passes the review, I will begin to work on the map-reduce version based on it.

          Yexi Jiang added a comment -

          Ted,

          I would suggest that a more fluid API would be helpful to people. For instance,
          each layer might be an object which could be composed together to build a model which
          is then trained.

          It seems that you are suggesting a more general neural network, not just the MLP.
          An MLP is a kind of feed-forward neural network whose topology is fixed.
          It usually consists of several layers, and every pair of neurons in adjacent layers is connected.
          Therefore, specifying the size of each layer is enough to determine the topology of an MLP.

          It would be good to first define a generic neural network and then build the MLP on top of it in the way you described. An advantage is that the generic neural network can be reused to build other types of neural networks in the future, e.g. an autoencoder for dimensionality reduction, a recurrent neural network for sequence mining, or possibly deep nets.

          Secondly, it seems like it would be good to have different kinds of loss function and
          regularizations.

          Yes, the MLP will allow the user to specify different loss functions, squashing functions, and regularizations.

          Also, regarding things like momentum, do you have an idea that this really needs to be
          commonly adjusted? or is there a way to set a good default?

          As far as I know, there is no empirical way to set a good default momentum weight; a good value depends on the concrete problem. As for the learning rate, a good approach is to use a decaying learning rate.

          Ted Dunning added a comment -

          Yexi,

          I would suggest that a more fluid API would be helpful to people. For instance,
          each layer might be an object which could be composed together to build a model which
          is then trained.

          Secondly, it seems like it would be good to have different kinds of loss function and
          regularizations.

          Also, regarding things like momentum, do you have an idea that this really needs to be
          commonly adjusted? or is there a way to set a good default?


            People

            • Assignee:
              Suneel Marthi
              Reporter:
              Yexi Jiang
            • Votes:
               0
               Watchers:
               5

              Dates

              • Created:
                Updated:
                Resolved:

                Development