Details

    • Type: New Feature
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 0.5
    • Fix Version/s: 0.6
    • Component/s: Classification
    • Labels:

      Description

      Mahout already has HMM functionality, but it is available only through the API.
      Command-line tools should be added and registered in driver.classes.props.

      These patches were generated with git against the trunk of Mahout's GitHub repository.
      [This is my "training" issue in Jira to learn how to commit patches to Mahout, so please be merciful.]

      1. hmm-utils.patch
        38 kB
        Sergey Bartunov

        Activity

        Hudson added a comment -

        Integrated in Mahout-Quality #911 (See https://builds.apache.org/job/Mahout-Quality/911/)
        MAHOUT-734 add HMM command lines

        srowen : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1140264
        Files :

        • /mahout/trunk/src/conf/driver.classes.props
        • /mahout/trunk/core/src/main/java/org/apache/mahout/classifier/sequencelearning/hmm/LossyHmmSerializer.java
        • /mahout/trunk/core/src/main/java/org/apache/mahout/classifier/sequencelearning/hmm/HmmModel.java
        • /mahout/trunk/core/src/main/java/org/apache/mahout/classifier/sequencelearning/hmm/RandomSequenceGenerator.java
        • /mahout/trunk/core/src/main/java/org/apache/mahout/classifier/sequencelearning/hmm/HmmTrainer.java
        • /mahout/trunk/core/src/main/java/org/apache/mahout/classifier/sequencelearning/hmm/ViterbiEvaluator.java
        • /mahout/trunk/core/src/main/java/org/apache/mahout/classifier/sequencelearning/hmm/BaumWelchTrainer.java
        Sergey Bartunov added a comment - - edited

        Well, here's what I have:

        sbos@pride-linux:~/gsoc/tmp/svn/trunk$ svn info
        Path: .
        URL: http://svn.apache.org/repos/asf/mahout/trunk
        Repository Root: http://svn.apache.org/repos/asf
        Repository UUID: 13f79535-47bb-0310-9956-ffa450edef68
        Revision: 1140245
        Node Kind: directory
        Schedule: normal
        Last Changed Author: srowen
        Last Changed Rev: 1140224
        Last Changed Date: 2011-06-27 20:09:10 +0400 (Mon, 27 Jun 2011)

        Sean Owen added a comment -

        OK sounds fine.

        (Oh right that reminds me – I also saw the patch un-did some recent changes to the files. Was it against HEAD? I just reverted them as it looked unintentional.)

        Sergey Bartunov added a comment -

        Actually, I wanted to implement HmmModel as a Writable, but it contains several BiMaps (e.g. for the names of hidden states) that I didn't want to serialize/deserialize. So I wrote a "lossy" serializer, named that way to be honest, since I don't serialize all the information in the model.
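
        To make the "lossy" idea concrete, here is a minimal sketch, not the LossyHmmSerializer from the patch. It assumes a hypothetical ToyModel with a square transition matrix and a plain Map standing in for the BiMaps of state names, and it uses java.io.DataOutput/DataInput instead of Hadoop's Writable so that it runs without extra dependencies. Only the numeric parameters are written, so the names do not survive a round trip.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

public class LossySerializationSketch {

  /** Hypothetical, simplified model: numeric parameters plus optional state names. */
  static class ToyModel {
    final double[][] transitions;
    final Map<String, Integer> stateNames; // stand-in for the BiMaps mentioned above

    ToyModel(double[][] transitions, Map<String, Integer> stateNames) {
      this.transitions = transitions;
      this.stateNames = stateNames;
    }
  }

  /** Writes only the numeric parameters; the state names are deliberately dropped. */
  static void write(ToyModel model, DataOutput out) throws IOException {
    int n = model.transitions.length;
    out.writeInt(n);
    for (int i = 0; i < n; i++) {
      for (int j = 0; j < n; j++) {
        out.writeDouble(model.transitions[i][j]);
      }
    }
  }

  /** Reads the numeric parameters back; the restored model has no state names. */
  static ToyModel read(DataInput in) throws IOException {
    int n = in.readInt();
    double[][] transitions = new double[n][n];
    for (int i = 0; i < n; i++) {
      for (int j = 0; j < n; j++) {
        transitions[i][j] = in.readDouble();
      }
    }
    return new ToyModel(transitions, new HashMap<String, Integer>());
  }

  public static void main(String[] args) throws IOException {
    Map<String, Integer> names = new HashMap<String, Integer>();
    names.put("rainy", 0);
    names.put("sunny", 1);
    ToyModel model = new ToyModel(new double[][] {{0.9, 0.1}, {0.2, 0.8}}, names);

    ByteArrayOutputStream buffer = new ByteArrayOutputStream();
    write(model, new DataOutputStream(buffer));
    ToyModel restored =
        read(new DataInputStream(new ByteArrayInputStream(buffer.toByteArray())));

    System.out.println("transition[1][0] = " + restored.transitions[1][0]);
    System.out.println("names kept: " + restored.stateNames.size());
  }
}
```

        The trade-off in this sketch is a smaller, simpler on-disk format at the cost of having to re-attach the names after loading.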

        Sean Owen added a comment -

        OK, I'll commit with some style changes.

        We don't put final on locals – just clutters the code too much.
        There were a number of whitespace-only changes here; I'd omit those.
        You want to close() streams in a finally block.
        Tiny stuff – we don't use * imports, and "new Date().getTime()" is best as "System.currentTimeMillis()".

        As a "bonus" I saw a pre-existing problem in HmmModel.clone() and just fixed it along the way. (Calling super.clone() and ignoring the value doesn't make sense.)

        I wonder if your "lossy" serializer should just be the default Writable implementation for HmmModel?
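
        The review points above can be illustrated with a small sketch. This is not the committed Mahout code: TinyModel and readFirstByte are hypothetical names invented for the example. It shows closing a stream in a finally block, using System.currentTimeMillis() for timestamps, and a clone() that actually uses the object returned by super.clone().

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;

public class ReviewStyleSketch {

  /** Hypothetical stand-in for a model class that holds mutable array state. */
  static class TinyModel implements Cloneable {
    double[] initialProbabilities;

    TinyModel(double[] initialProbabilities) {
      this.initialProbabilities = initialProbabilities;
    }

    @Override
    public TinyModel clone() {
      try {
        // Use the object returned by super.clone() instead of discarding it.
        TinyModel copy = (TinyModel) super.clone();
        // Copy the array so the clone does not share state with the original.
        copy.initialProbabilities = initialProbabilities.clone();
        return copy;
      } catch (CloneNotSupportedException e) {
        throw new IllegalStateException(e);
      }
    }
  }

  /** Reads the first byte of a stream and closes the stream in a finally block. */
  static int readFirstByte(InputStream in) throws IOException {
    try {
      return in.read();
    } finally {
      in.close(); // runs whether read() succeeds or throws
    }
  }

  public static void main(String[] args) throws IOException {
    long start = System.currentTimeMillis(); // rather than new Date().getTime()

    int first = readFirstByte(new ByteArrayInputStream(new byte[] {42}));

    TinyModel original = new TinyModel(new double[] {0.5, 0.5});
    TinyModel copy = original.clone();
    copy.initialProbabilities[0] = 1.0; // must not affect the original

    System.out.println(first + " " + Arrays.toString(original.initialProbabilities));
    System.out.println("elapsed ms: " + (System.currentTimeMillis() - start));
  }
}
```

        On Java 7 and later, try-with-resources gives the same guarantee as the finally block with less code.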

        Sergey Bartunov added a comment -

        All changes in one SVN patch

        Sean Owen added a comment -

        Can you provide one patch with all changes? The last patch is not complete.

        Sergey Bartunov added a comment -

        BaumWelchTrainer now prints the trained model to the screen.

        Sergey Bartunov added a comment -

        Could someone explain to me why I also got two big files (0001-maven-release-plugin-copy-for-tag-mahout-0.5.patch and 0002-maven-release-plugin-copy-for-tag-mahout-0.5.patch, about 10 MB) after git format-patch?

        I just cloned the git repository, created a new branch from trunk, made my modifications, and ran git format-patch.

        git log shows me that the last 3 commits were mine:
        bce3ebc6e8f8d575f1fb0e05e6c69e5c9d374c6e Command-line tool for generated random
        0d5ef688fbc272fa2fc23d7fee4e03766c168b89 Command line tool for Viterbi evaluatio
        7be42824a0767d4208b9dcd7da49beee06ff15ee command-line util for baum-welch algori
        3d62dfbeb33777cfb77134b52c7f2b32382eb0dd better tmp path handling in Distributed

        and there is nothing like "0002-maven-release-plugin-copy-for-tag-mahout-0.5" in it.

        Sergey Bartunov added a comment -

        There are three patches with self-describing names


          People

          • Assignee:
            Sean Owen
            Reporter:
            Sergey Bartunov
          • Votes:
            0
            Watchers:
            0