Details

    • Type: New Feature
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 0.6
    • Fix Version/s: 0.6
    • Component/s: Classification
    • Labels:

      Description

      Implement a gradient machine (aka "neural network") that can be used for classification or auto-encoding.
      It will have just an input layer, a hidden layer (identity, sigmoid, or tanh), and an output layer.
      Training is done by stochastic gradient descent (possibly mini-batch later).
      Sparsity will optionally be enforced by tweaking the bias in the hidden units.
      For now it will go in classifier/sgd, and the auto-encoder will wrap it in the filter unit later on.
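
      A minimal sketch of the idea in plain Java (illustrative only: array-based, not the Mahout API, and all names here are hypothetical): a sigmoid hidden layer, an identity output layer, and a single-example SGD step on squared error.

      import java.util.Random;

      /**
       * Sketch of the machine described above: one sigmoid hidden layer,
       * identity output, squared-error loss, single-example SGD.
       * Illustrative only; not the Mahout GradientMachine API.
       */
      public class TinyGradientMachine {

        private final double[][] wIn;    // input -> hidden weights [hidden][input]
        private final double[] bHidden;  // hidden biases
        private final double[][] wOut;   // hidden -> output weights [output][hidden]
        private final double[] bOut;     // output biases
        private final double eta;        // learning rate

        public TinyGradientMachine(int numIn, int numHidden, int numOut, double eta, long seed) {
          Random r = new Random(seed);
          this.eta = eta;
          this.wIn = new double[numHidden][numIn];
          this.bHidden = new double[numHidden];
          this.wOut = new double[numOut][numHidden];
          this.bOut = new double[numOut];
          for (double[] row : wIn) {
            for (int i = 0; i < row.length; i++) {
              row[i] = 0.01 * r.nextGaussian(); // small random initial weights
            }
          }
          for (double[] row : wOut) {
            for (int j = 0; j < row.length; j++) {
              row[j] = 0.01 * r.nextGaussian();
            }
          }
        }

        private static double sigmoid(double x) {
          return 1.0 / (1.0 + Math.exp(-x));
        }

        /** Forward pass; returns {hiddenActivations, outputs}. */
        public double[][] forward(double[] x) {
          double[] h = new double[bHidden.length];
          for (int j = 0; j < h.length; j++) {
            double s = bHidden[j];
            for (int i = 0; i < x.length; i++) {
              s += wIn[j][i] * x[i];
            }
            h[j] = sigmoid(s);
          }
          double[] y = new double[bOut.length];
          for (int k = 0; k < y.length; k++) {
            double s = bOut[k];
            for (int j = 0; j < h.length; j++) {
              s += wOut[k][j] * h[j];
            }
            y[k] = s; // identity output layer
          }
          return new double[][] {h, y};
        }

        /** One stochastic gradient descent step on a single (x, target) pair. */
        public void train(double[] x, double[] target) {
          double[][] fw = forward(x);
          double[] h = fw[0];
          double[] y = fw[1];
          // Output error for squared loss with an identity output: y - t.
          double[] dOut = new double[y.length];
          for (int k = 0; k < y.length; k++) {
            dOut[k] = y[k] - target[k];
          }
          // Backpropagate through the sigmoid; its derivative is h * (1 - h).
          double[] dHidden = new double[h.length];
          for (int j = 0; j < h.length; j++) {
            double s = 0.0;
            for (int k = 0; k < y.length; k++) {
              s += wOut[k][j] * dOut[k];
            }
            dHidden[j] = s * h[j] * (1.0 - h[j]);
          }
          // Gradient descent updates for both layers.
          for (int k = 0; k < y.length; k++) {
            for (int j = 0; j < h.length; j++) {
              wOut[k][j] -= eta * dOut[k] * h[j];
            }
            bOut[k] -= eta * dOut[k];
          }
          for (int j = 0; j < h.length; j++) {
            for (int i = 0; i < x.length; i++) {
              wIn[j][i] -= eta * dHidden[j] * x[i];
            }
            bHidden[j] -= eta * dHidden[j];
          }
        }
      }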

        Activity

        Transition | Time In Source Status | Execution Times | Last Executor | Last Execution Date
        Open → Patch Available | 2d 6h 28m | 1 | Hector Yee | 21/May/11 09:51
        Patch Available → Resolved | 14d 10h 55m | 1 | Sean Owen | 04/Jun/11 20:47
        Resolved → Closed | 249d 18h 13m | 1 | Sean Owen | 09/Feb/12 14:01
        Sean Owen made changes -
        Status: Resolved [ 5 ] → Closed [ 6 ]
        Hudson added a comment -

        Integrated in Mahout-Quality #861 (See https://builds.apache.org/hudson/job/Mahout-Quality/861/)
        MAHOUT-703 implement Gradient machine classifier

        srowen : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1131481
        Files :

        • /mahout/trunk/core/src/main/java/org/apache/mahout/classifier/sgd/GradientMachine.java
        • /mahout/trunk/core/src/test/java/org/apache/mahout/classifier/sgd/GradientMachineTest.java
        • /mahout/trunk/math/src/main/java/org/apache/mahout/math/function/Functions.java
        Hide
        Hector Yee added a comment - edited

        Thanks! I'll fix this and submit a new patch. (Edit: whoops, looks like it was committed already; scratch that. Thanks for fixing the style.)

        Sean Owen made changes -
        Status: Patch Available [ 10002 ] → Resolved [ 5 ]
        Assignee: Ted Dunning [ tdunning ]
        Resolution: Fixed [ 1 ]
        Sean Owen added a comment -

        Another good one, Hector, and hearing no grunts of objection from Ted, let's put it in. I have a few small style points for your patches.

        • We'll need to use the standard Apache license header.
        • The class description can/should go in the class javadoc, not above the package statement.
        • Java variable naming convention is camelCase rather than camel_case.
        • Careful with the javadoc: it has to start with /** to be read as such.
        • Go ahead and use braces and a newline with every control flow statement, including ifs.
        • In train(), outputActivation is not used?
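
        To make these concrete, here is a small hypothetical class (not the actual patch) observing all of the above:

        /*
         * Licensed to the Apache Software Foundation (ASF) under one or more
         * contributor license agreements. (Standard Apache header, elided here.)
         */
        package org.apache.mahout.classifier.sgd;

        /**
         * The class description belongs here, in the class javadoc, opened
         * with slash-star-star so the tools pick it up.
         */
        public final class StyleExample {

          private double learningRate; // camelCase, not camel_case

          public void step(double gradient) {
            if (gradient != 0.0) { // braces and a newline, even for a one-line if
              learningRate *= 0.99;
            }
          }
        }
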
        Ted Dunning added a comment -

        You can make the auto-encoder depend on this bug and just go forward. It might even help to drop things on GitHub so that you can keep rebasing the two branches.

        Hector Yee added a comment -

        Any news on this patch? I need it to implement an autoencoder.

        Hector Yee added a comment -

        Note: This patch requires MAHOUT-702 for the OnlineBaseTest.

        Hector Yee made changes -
        Status: Open [ 1 ] → Patch Available [ 10002 ]
        Hector Yee added a comment -

        Working ranking neural net with one hidden sigmoid layer.

        Hector Yee made changes -
        Attachment: MAHOUT-703.patch [ 12479991 ]
        Hector Yee added a comment -

        Working ranking neural net, minus the sparsity-enforcing part. I would appreciate it if someone could check the math. Unit tests pass.

        Ted Dunning made changes -
        Fix Version/s: 0.6 [ 12316364 ]
        Hector Yee added a comment -

        Sure, it's here: http://www.stanford.edu/class/cs294a/sparseAutoencoder_2011new.pdf
        Bias as in the bias unit.
        Ted Dunning added a comment -

        Do you have a reference for this bias tweaking trick?

        Is it bias as in the bias unit?

        Or bias as in bias-variance?

        Hector Yee added a comment -

        Yeah, I was planning to do L2 regularization first. L1 can be tricky due to edge cases like crossing / following the simplex, so I'll enforce sparsity with Andrew Ng's bias tweaking trick first.
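
        For reference, a hedged sketch of what that bias tweak can look like (illustrative names and constants, not the committed code): track a running average of each hidden unit's activation, rho-hat in Ng's notes, and nudge the bias so the average drifts toward a target sparsity rho.

        final class SparsityBiasTweak {

          private final double[] avgActivation; // running estimate of rho-hat per hidden unit
          private final double decay;           // moving-average decay, e.g. 0.99
          private final double rho;             // target average activation, e.g. 0.05
          private final double rate;            // how hard to push each bias

          SparsityBiasTweak(int numHidden, double decay, double rho, double rate) {
            this.avgActivation = new double[numHidden];
            this.decay = decay;
            this.rho = rho;
            this.rate = rate;
          }

          /** Call once per training example, after the forward pass. */
          void update(double[] hiddenActivation, double[] hiddenBias) {
            for (int j = 0; j < hiddenBias.length; j++) {
              // Exponential moving average of the unit's activation (rho-hat).
              avgActivation[j] = decay * avgActivation[j] + (1.0 - decay) * hiddenActivation[j];
              // Push the bias down when the unit fires more than rho, up when it fires less.
              hiddenBias[j] -= rate * (avgActivation[j] - rho);
            }
          }
        }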

        Ted Dunning added a comment -

        It comes almost for free with SGD neural net code to put L1 and L2 penalties in as well. I would recommend it.

        The trick is that you can't depend on the gradient being sparse, so you can't use lazy regularization. Léon Bottou describes a stochastic full regularization with an adjusted learning rate which should perform comparably. He mostly talks about weight decay (which is L2 regularization), which can be handled cleverly by keeping a multiplier and a vector. I think L1 is important, but it requires something like truncated constant decay, which can't be done with a multiplier.

        See http://leon.bottou.org/projects/sgd
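
        A minimal sketch of that multiplier-and-vector trick (hypothetical names): the effective weight is scale * values[i], so the L2 decay step touches one scalar per step instead of every coordinate, and per-example SGD updates are folded through the scale.

        final class ScaledWeights {

          private double scale = 1.0;     // shared multiplier
          private final double[] values;  // underlying weight storage

          ScaledWeights(int dim) {
            this.values = new double[dim];
          }

          /** Effective weight: scale * values[i]. */
          double get(int i) {
            return scale * values[i];
          }

          /** L2 weight decay, w <- (1 - eta * lambda) * w, in O(1) per step. */
          void decay(double eta, double lambda) {
            scale *= 1.0 - eta * lambda;
          }

          /** SGD update w_i <- w_i - eta * g, folded through the scale. */
          void update(int i, double eta, double g) {
            values[i] -= eta * g / scale;
          }
        }

        As the comment says, L1's truncation is not a uniform rescaling of every coordinate, so it can't be folded into the single multiplier; a real implementation would also renormalize the scale occasionally to keep it from underflowing.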

        Hector Yee created issue -

          People

          • Assignee: Ted Dunning
          • Reporter: Hector Yee
          • Votes: 0
          • Watchers: 2

            Dates

            • Created:
            • Updated:
            • Resolved:

              Time Tracking

              Original Estimate: 72h
              Remaining Estimate: 72h
              Time Spent: Not Specified
