  1. Mahout
  2. MAHOUT-968

Classifier based on restricted Boltzmann machines

Details

    • New Feature
    • Status: Closed
    • Major
    • Resolution: Won't Fix
    • 0.7
    • 0.10.0
    • None

    Description

      This is a proposal for a new classifier based on restricted Boltzmann machines (RBMs). The development of this feature follows the 2009 paper on "Deep Boltzmann Machines" (DBM) [1]. The proposed model (DBM) achieved an error rate of 0.95% on the MNIST dataset [2], which is a strong result. The main parts of the implementation should also be applicable to scenarios other than classification in which restricted Boltzmann machines are used (see MAHOUT-375).
      I am working on this feature right now, and the results are promising. The only problem with the training algorithm is that it is still mostly sequential (if training batches are small, which they should be), so Map/Reduce has not really been beneficial so far. However, since the algorithm itself is fast (for a training algorithm), training can be done on a single machine in a manageable amount of time.
      Testing of the algorithm is currently done on the MNIST dataset itself to reproduce the results of [1]. As soon as the results indicate that everything is working fine, I will upload the patch.
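
      To illustrate why small-batch training is hard to parallelize, here is a minimal sketch of mini-batch training for a single binary RBM using one-step contrastive divergence (CD-1). This is not the code from the attached patch and not the full DBM procedure from [1]; CD-1 is swapped in as a simpler stand-in, and all names (`cd1_update`, `train`, the learning rate, the batch size) are assumptions made for illustration only.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def sample(probs, rng):
    # Draw binary states from independent Bernoulli probabilities.
    return [1.0 if rng.random() < p else 0.0 for p in probs]


def hidden_probs(v, W, c):
    # P(h_j = 1 | v) for every hidden unit j.
    return [sigmoid(c[j] + sum(v[i] * W[i][j] for i in range(len(v))))
            for j in range(len(c))]


def visible_probs(h, W, b):
    # P(v_i = 1 | h) for every visible unit i.
    return [sigmoid(b[i] + sum(h[j] * W[i][j] for j in range(len(h))))
            for i in range(len(b))]


def cd1_update(batch, W, b, c, lr, rng):
    """Apply one CD-1 update in place, averaged over a mini-batch."""
    n_v, n_h = len(b), len(c)
    dW = [[0.0] * n_h for _ in range(n_v)]
    db = [0.0] * n_v
    dc = [0.0] * n_h
    for v0 in batch:
        ph0 = hidden_probs(v0, W, c)    # positive phase (data-driven)
        h0 = sample(ph0, rng)           # sampled hidden states
        v1 = visible_probs(h0, W, b)    # reconstruction (probabilities)
        ph1 = hidden_probs(v1, W, c)    # negative phase (model-driven)
        for i in range(n_v):
            db[i] += v0[i] - v1[i]
            for j in range(n_h):
                dW[i][j] += v0[i] * ph0[j] - v1[i] * ph1[j]
        for j in range(n_h):
            dc[j] += ph0[j] - ph1[j]
    scale = lr / len(batch)
    for i in range(n_v):
        b[i] += scale * db[i]
        for j in range(n_h):
            W[i][j] += scale * dW[i][j]
    for j in range(n_h):
        c[j] += scale * dc[j]


def train(data, n_hidden, batch_size=2, epochs=50, lr=0.1, seed=0):
    rng = random.Random(seed)
    n_visible = len(data[0])
    W = [[rng.gauss(0.0, 0.01) for _ in range(n_hidden)]
         for _ in range(n_visible)]
    b = [0.0] * n_visible
    c = [0.0] * n_hidden
    for _ in range(epochs):
        # Batches are processed strictly in order: each update reads the
        # parameters written by the previous one.
        for start in range(0, len(data), batch_size):
            cd1_update(data[start:start + batch_size], W, b, c, lr, rng)
    return W, b, c
```

      In this sketch, the only work that could run in parallel is the loop over examples inside one batch. With small batches that leaves little to distribute, while the loop over batches cannot be split without changing the result.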

      [1] http://www.cs.toronto.edu/~hinton/absps/dbm.pdf
      [2] http://yann.lecun.com/exdb/mnist/

      Attachments

        1. MAHOUT-968.patch
          131 kB
          Dirk Weißenborn
        2. MAHOUT-968.patch
          137 kB
          Dirk Weißenborn

        Activity

          People

            Assignee: Robin Anil
            Reporter: Dirk Weißenborn
            Votes: 0
            Watchers: 7

            Dates

              Created:
              Updated:
              Resolved:

              Time Tracking

                Estimated: 336h
                Remaining: 336h
                Logged: Not Specified