Details

    • Type: New Feature
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 0.5
    • Fix Version/s: 0.6
    • Component/s: Classification
    • Labels:

      Description

      Mahout already has HMM functionality, but it is available only through the API.
      Command-line tools should be added and registered in driver.classes.props.

      These patches were generated with git against the trunk of Mahout's GitHub repository.
      [This is my "training" issue in Jira to learn how to submit patches to Mahout, so please be merciful.]
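
      For reference, the command-line tools in the attached patches take their observations from a text file of whitespace-separated integers. A minimal Java sketch of that input step follows. The class name ObservationParser and its parse method are hypothetical, chosen for illustration only and not part of the patches, and the sketch reads from a string rather than a file so that it stays self-contained.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Scanner;

/** Hypothetical helper for illustration; not part of the attached patches. */
final class ObservationParser {

  private ObservationParser() {
  }

  /** Reads whitespace-separated integers until the first non-integer token. */
  static int[] parse(String text) {
    List<Integer> observations = new ArrayList<Integer>();
    Scanner scanner = new Scanner(text);
    try {
      // Same loop shape as in the patches: consume tokens while they parse as ints.
      while (scanner.hasNextInt()) {
        observations.add(scanner.nextInt());
      }
    } finally {
      scanner.close();
    }

    // Unbox into the primitive array form that the HMM API calls in the patches take.
    int[] result = new int[observations.size()];
    for (int i = 0; i < result.length; i++) {
      result[i] = observations.get(i);
    }
    return result;
  }

  public static void main(String[] args) {
    // Prints [0, 1, 2, 2, 1, 0]
    System.out.println(Arrays.toString(parse("0 1 2 2 1 0")));
  }
}
```

      Note that scanning stops at the first token that is not an integer, so a malformed value truncates the sequence silently instead of raising an error.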

      1. hmm-utils.patch
        38 kB
        Sergey Bartunov

        Activity

        Sergey Bartunov created issue -
        Sergey Bartunov made changes -
        Field | Original Value | New Value
        Status | Open [ 1 ] | Patch Available [ 10002 ]
        Sergey Bartunov made changes -
        Status | Patch Available [ 10002 ] | Open [ 1 ]
        Sergey Bartunov made changes -
        Attachment | | 0003-command-line-util-for-baum-welch-algorithm-on-HMM.patch [ 12482611 ]
        Attachment | | 0004-Command-line-tool-for-Viterbi-evaluation.patch [ 12482612 ]
        Attachment | | 0005-Command-line-tool-for-generated-random-observations-.patch [ 12482613 ]
        Sergey Bartunov made changes -
        Comment [ From 7be42824a0767d4208b9dcd7da49beee06ff15ee Mon Sep 17 00:00:00 2001
        From: Sergey Bartunov <sbos.net@gmail.com>
        Date: Wed, 15 Jun 2011 01:04:39 +0400
        Subject: [PATCH 3/5] command-line util for baum-welch algorithm on HMM

        ---
         .../sequencelearning/hmm/BaumWelchTrainer.java | 127 ++++++++++++++++++++
         .../sequencelearning/hmm/LossyHmmSerializer.java | 57 +++++++++
         src/conf/driver.classes.props | 3 +-
         3 files changed, 186 insertions(+), 1 deletions(-)
         create mode 100644 core/src/main/java/org/apache/mahout/classifier/sequencelearning/hmm/BaumWelchTrainer.java
         create mode 100644 core/src/main/java/org/apache/mahout/classifier/sequencelearning/hmm/LossyHmmSerializer.java

        diff --git a/core/src/main/java/org/apache/mahout/classifier/sequencelearning/hmm/BaumWelchTrainer.java b/core/src/main/java/org/apache/mahout/classifier/sequencelearning/hmm/BaumWelchTrainer.java
        new file mode 100644
        index 0000000..410fcad
        --- /dev/null
        +++ b/core/src/main/java/org/apache/mahout/classifier/sequencelearning/hmm/BaumWelchTrainer.java
        @@ -0,0 +1,127 @@
        +/**
        + * Licensed to the Apache Software Foundation (ASF) under one or more
        + * contributor license agreements. See the NOTICE file distributed with
        + * this work for additional information regarding copyright ownership.
        + * The ASF licenses this file to You under the Apache License, Version 2.0
        + * (the "License"); you may not use this file except in compliance with
        + * the License. You may obtain a copy of the License at
        + *
        + * http://www.apache.org/licenses/LICENSE-2.0
        + *
        + * Unless required by applicable law or agreed to in writing, software
        + * distributed under the License is distributed on an "AS IS" BASIS,
        + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
        + * See the License for the specific language governing permissions and
        + * limitations under the License.
        + */
        +
        +package org.apache.mahout.classifier.sequencelearning.hmm;
        +
        +import org.apache.commons.cli2.CommandLine;
        +import org.apache.commons.cli2.Group;
        +import org.apache.commons.cli2.Option;
        +import org.apache.commons.cli2.OptionException;
        +import org.apache.commons.cli2.builder.ArgumentBuilder;
        +import org.apache.commons.cli2.builder.DefaultOptionBuilder;
        +import org.apache.commons.cli2.builder.GroupBuilder;
        +import org.apache.commons.cli2.commandline.Parser;
        +import org.apache.mahout.common.CommandLineUtil;
        +
        +import java.io.DataOutputStream;
        +import java.io.FileInputStream;
        +import java.io.FileOutputStream;
        +import java.io.IOException;
        +import java.util.ArrayList;
        +import java.util.Date;
        +import java.util.List;
        +import java.util.Scanner;
        +
        +/**
        + * A class for EM training of HMM from console
        + */
        +public class BaumWelchTrainer {
        + public static void main(String[] args) throws IOException {
        + final DefaultOptionBuilder optionBuilder = new DefaultOptionBuilder();
        + final ArgumentBuilder argumentBuilder = new ArgumentBuilder();
        +
        + final Option inputOption = optionBuilder.withLongName("input").
        + withDescription("Text file with space-separated integers to train on").
        + withShortName("i").withArgument(argumentBuilder.withMaximum(1).withMinimum(1).
        + withName("path").create()).withRequired(true).create();
        +
        + final Option outputOption = optionBuilder.withLongName("output").
        + withDescription("Path trained HMM model should be serialized to").
        + withShortName("o").withArgument(argumentBuilder.withMaximum(1).withMinimum(1).
        + withName("path").create()).withRequired(true).create();
        +
        + final Option stateNumberOption = optionBuilder.withLongName("nrOfHiddenStates").
        + withDescription("Number of hidden states").
        + withShortName("nh").withArgument(argumentBuilder.withMaximum(1).withMinimum(1).
        + withName("number").create()).withRequired(true).create();
        +
        + final Option observedStateNumberOption = optionBuilder.withLongName("nrOfObservedStates").
        + withDescription("Number of observed states").
        + withShortName("no").withArgument(argumentBuilder.withMaximum(1).withMinimum(1).
        + withName("number").create()).withRequired(true).create();
        +
        + final Option epsilonOption = optionBuilder.withLongName("epsilon").
        + withDescription("Convergence threshold").
        + withShortName("e").withArgument(argumentBuilder.withMaximum(1).withMinimum(1).
        + withName("number").create()).withRequired(true).create();
        +
        + final Option iterationsOption = optionBuilder.withLongName("max-iterations").
        + withDescription("Maximum iterations number").
        + withShortName("m").withArgument(argumentBuilder.withMaximum(1).withMinimum(1).
        + withName("number").create()).withRequired(true).create();
        +
        + final Group optionGroup = new GroupBuilder().withOption(inputOption).
        + withOption(outputOption).withOption(stateNumberOption).withOption(observedStateNumberOption).
        + withOption(epsilonOption).withOption(iterationsOption).
        + withName("Options").create();
        +
        + try {
        + final Parser parser = new Parser();
        + parser.setGroup(optionGroup);
        + final CommandLine commandLine = parser.parse(args);
        +
        + final String input = (String) commandLine.getValue(inputOption);
        + final String output = (String) commandLine.getValue(outputOption);
        +
        + final int nrOfHiddenStates = Integer.parseInt((String) commandLine.getValue(stateNumberOption));
        + final int nrOfObservedStates = Integer.parseInt((String) commandLine.getValue(observedStateNumberOption));
        +
        + final double epsilon = Double.parseDouble((String) commandLine.getValue(epsilonOption));
        + final int maxIterations = Integer.parseInt((String) commandLine.getValue(iterationsOption));
        +
        + //constructing random-generated HMM
        + final HmmModel model = new HmmModel(nrOfHiddenStates, nrOfObservedStates, new Date().getTime());
        + final List<Integer> observations = new ArrayList<Integer>();
        +
        + //reading observations
        + final FileInputStream inputStream = new FileInputStream(input);
        + final Scanner scanner = new Scanner(inputStream);
        +
        + while (scanner.hasNextInt()) {
        + observations.add(scanner.nextInt());
        + }
        +
        + scanner.close();
        + inputStream.close();
        +
        + final int[] observationsArray = new int[observations.size()];
        + for (int i = 0; i < observations.size(); ++i)
        + observationsArray[i] = observations.get(i);
        +
        + //training
        + final HmmModel trainedModel = HmmTrainer.trainBaumWelch(model,
        + observationsArray, epsilon, maxIterations, true);
        +
        + //serializing trained model
        + final DataOutputStream stream = new DataOutputStream(new FileOutputStream(output));
        + LossyHmmSerializer.serialize(trainedModel, stream);
        + stream.close();
        + } catch (OptionException e) {
        + CommandLineUtil.printHelp(optionGroup);
        + }
        + }
        +}
        diff --git a/core/src/main/java/org/apache/mahout/classifier/sequencelearning/hmm/LossyHmmSerializer.java b/core/src/main/java/org/apache/mahout/classifier/sequencelearning/hmm/LossyHmmSerializer.java
        new file mode 100644
        index 0000000..8bbb814
        --- /dev/null
        +++ b/core/src/main/java/org/apache/mahout/classifier/sequencelearning/hmm/LossyHmmSerializer.java
        @@ -0,0 +1,57 @@
        +/**
        + * Licensed to the Apache Software Foundation (ASF) under one or more
        + * contributor license agreements. See the NOTICE file distributed with
        + * this work for additional information regarding copyright ownership.
        + * The ASF licenses this file to You under the Apache License, Version 2.0
        + * (the "License"); you may not use this file except in compliance with
        + * the License. You may obtain a copy of the License at
        + *
        + * http://www.apache.org/licenses/LICENSE-2.0
        + *
        + * Unless required by applicable law or agreed to in writing, software
        + * distributed under the License is distributed on an "AS IS" BASIS,
        + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
        + * See the License for the specific language governing permissions and
        + * limitations under the License.
        + */
        +
        +package org.apache.mahout.classifier.sequencelearning.hmm;
        +
        +import org.apache.mahout.math.Matrix;
        +import org.apache.mahout.math.MatrixWritable;
        +import org.apache.mahout.math.Vector;
        +import org.apache.mahout.math.VectorWritable;
        +
        +import java.io.DataInput;
        +import java.io.DataOutput;
        +import java.io.IOException;
        +
        +/**
        + * Utils for serializing Writable parts of HmmModel (that means without hidden state names and so on)
        + */
        +public class LossyHmmSerializer {
        + public static void serialize(HmmModel model, DataOutput output) throws IOException {
        + final MatrixWritable matrix = new MatrixWritable(model.getEmissionMatrix());
        + matrix.write(output);
        + matrix.set(model.getTransitionMatrix());
        + matrix.write(output);
        +
        + final VectorWritable vector = new VectorWritable(model.getInitialProbabilities());
        + vector.write(output);
        + }
        +
        + public static HmmModel deserialize(DataInput input) throws IOException {
        + final MatrixWritable matrix = new MatrixWritable();
        + matrix.readFields(input);
        + final Matrix emissionMatrix = matrix.get();
        +
        + matrix.readFields(input);
        + final Matrix transitionMatrix = matrix.get();
        +
        + final VectorWritable vector = new VectorWritable();
        + vector.readFields(input);
        + final Vector initialProbabilities = vector.get();
        +
        + return new HmmModel(transitionMatrix, emissionMatrix, initialProbabilities);
        + }
        +}
        \ No newline at end of file
        diff --git a/src/conf/driver.classes.props b/src/conf/driver.classes.props
        index ed72253..cc29fd3 100644
        --- a/src/conf/driver.classes.props
        +++ b/src/conf/driver.classes.props
        @@ -37,4 +37,5 @@ org.apache.mahout.math.hadoop.stochasticsvd.SSVDCli = ssvd : Stochastic SVD
         org.apache.mahout.clustering.spectral.eigencuts.EigencutsDriver = eigencuts : Eigencuts spectral clustering
         org.apache.mahout.clustering.spectral.kmeans.SpectralKMeansDriver = spectralkmeans : Spectral k-means clustering
         org.apache.mahout.cf.taste.hadoop.als.ParallelALSFactorizationJob = parallelALS : ALS-WR factorization of a rating matrix
        -org.apache.mahout.cf.taste.hadoop.als.PredictionJob = predictFromFactorization : predict preferences from a factorization of a rating matrix
        \ No newline at end of file
        +org.apache.mahout.cf.taste.hadoop.als.PredictionJob = predictFromFactorization : predict preferences from a factorization of a rating matrix
        +org.apache.mahout.classifier.sequencelearning.hmm.BaumWelchTrainer = baumwelch : Baum-Welch algorithm for unsupervised HMM training
        --
        1.7.1

        From 0d5ef688fbc272fa2fc23d7fee4e03766c168b89 Mon Sep 17 00:00:00 2001
        From: Sergey Bartunov <sbos.net@gmail.com>
        Date: Wed, 15 Jun 2011 01:25:42 +0400
        Subject: [PATCH 4/5] Command line tool for Viterbi evaluation

        ---
         .../sequencelearning/hmm/ViterbiEvaluator.java | 119 ++++++++++++++++++++
         src/conf/driver.classes.props | 1 +
         2 files changed, 120 insertions(+), 0 deletions(-)
         create mode 100644 core/src/main/java/org/apache/mahout/classifier/sequencelearning/hmm/ViterbiEvaluator.java

        diff --git a/core/src/main/java/org/apache/mahout/classifier/sequencelearning/hmm/ViterbiEvaluator.java b/core/src/main/java/org/apache/mahout/classifier/sequencelearning/hmm/ViterbiEvaluator.java
        new file mode 100644
        index 0000000..22c5f44
        --- /dev/null
        +++ b/core/src/main/java/org/apache/mahout/classifier/sequencelearning/hmm/ViterbiEvaluator.java
        @@ -0,0 +1,119 @@
        +/**
        + * Licensed to the Apache Software Foundation (ASF) under one or more
        + * contributor license agreements. See the NOTICE file distributed with
        + * this work for additional information regarding copyright ownership.
        + * The ASF licenses this file to You under the Apache License, Version 2.0
        + * (the "License"); you may not use this file except in compliance with
        + * the License. You may obtain a copy of the License at
        + *
        + * http://www.apache.org/licenses/LICENSE-2.0
        + *
        + * Unless required by applicable law or agreed to in writing, software
        + * distributed under the License is distributed on an "AS IS" BASIS,
        + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
        + * See the License for the specific language governing permissions and
        + * limitations under the License.
        + */
        +
        +package org.apache.mahout.classifier.sequencelearning.hmm;
        +
        +import org.apache.commons.cli2.CommandLine;
        +import org.apache.commons.cli2.Group;
        +import org.apache.commons.cli2.Option;
        +import org.apache.commons.cli2.OptionException;
        +import org.apache.commons.cli2.builder.ArgumentBuilder;
        +import org.apache.commons.cli2.builder.DefaultOptionBuilder;
        +import org.apache.commons.cli2.builder.GroupBuilder;
        +import org.apache.commons.cli2.commandline.Parser;
        +import org.apache.mahout.common.CommandLineUtil;
        +
        +import java.io.*;
        +import java.util.ArrayList;
        +import java.util.List;
        +import java.util.Scanner;
        +
        +/**
        + * Command-line tool for Viterbi evaluating
        + */
        +public class ViterbiEvaluator {
        + public static void main(String[] args) throws IOException {
        + final DefaultOptionBuilder optionBuilder = new DefaultOptionBuilder();
        + final ArgumentBuilder argumentBuilder = new ArgumentBuilder();
        +
        + final Option inputOption = optionBuilder.withLongName("input").
        + withDescription("Text file with space-separated integers to segment").
        + withShortName("i").withArgument(argumentBuilder.withMaximum(1).withMinimum(1).
        + withName("path").create()).withRequired(true).create();
        +
        + final Option outputOption = optionBuilder.withLongName("output").
        + withDescription("Output directory with decoded sequence of hidden states").
        + withShortName("o").withArgument(argumentBuilder.withMaximum(1).withMinimum(1).
        + withName("path").create()).withRequired(true).create();
        +
        + final Option modelOption = optionBuilder.withLongName("model").
        + withDescription("Path to serialized HMM model").
        + withShortName("m").withArgument(argumentBuilder.withMaximum(1).withMinimum(1).
        + withName("path").create()).withRequired(true).create();
        +
        + final Option likelihoodOption = optionBuilder.withLongName("likelihood").
        + withDescription("Compute likelihood of observed sequence").
        + withShortName("l").withRequired(false).create();
        +
        + final Group optionGroup = new GroupBuilder().withOption(inputOption).
        + withOption(outputOption).withOption(modelOption).withOption(likelihoodOption).
        + withName("Options").create();
        +
        + try {
        + final Parser parser = new Parser();
        + parser.setGroup(optionGroup);
        + final CommandLine commandLine = parser.parse(args);
        +
        + final String input = (String) commandLine.getValue(inputOption);
        + final String output = (String) commandLine.getValue(outputOption);
        +
        + final String modelPath = (String) commandLine.getValue(modelOption);
        +
        + final boolean computeLikelihood = commandLine.hasOption(likelihoodOption);
        +
        + //reading serialized HMM
        + final DataInputStream modelStream = new DataInputStream(new FileInputStream(modelPath));
        + final HmmModel model = LossyHmmSerializer.deserialize(modelStream);
        + modelStream.close();
        +
        + //reading observations
        + final List<Integer> observations = new ArrayList<Integer>();
        + final FileInputStream inputStream = new FileInputStream(input);
        + final Scanner scanner = new Scanner(inputStream);
        +
        + while (scanner.hasNextInt()) {
        + observations.add(scanner.nextInt());
        + }
        +
        + scanner.close();
        + inputStream.close();
        +
        + final int[] observationsArray = new int[observations.size()];
        + for (int i = 0; i < observations.size(); ++i)
        + observationsArray[i] = observations.get(i);
        +
        + //decoding
        + final int[] hiddenStates = HmmEvaluator.decode(model, observationsArray, true);
        +
        + //writing output
        + final FileOutputStream outputStream = new FileOutputStream(output);
        + final PrintWriter writer = new PrintWriter(outputStream);
        + for (int i = 0; i < hiddenStates.length; ++i) {
        + writer.print(hiddenStates[i]);
        + writer.print(' ');
        + }
        + writer.close();
        + outputStream.close();
        +
        + if (computeLikelihood) {
        + System.out.println("Likelihood: " + HmmEvaluator.modelLikelihood(model, observationsArray, true));
        + }
        + } catch (OptionException e) {
        + CommandLineUtil.printHelp(optionGroup);
        + }
        + }
        +}
        diff --git a/src/conf/driver.classes.props b/src/conf/driver.classes.props
        index cc29fd3..0ed10ce 100644
        --- a/src/conf/driver.classes.props
        +++ b/src/conf/driver.classes.props
        @@ -39,3 +39,4 @@ org.apache.mahout.clustering.spectral.kmeans.SpectralKMeansDriver = spectralkmea
         org.apache.mahout.cf.taste.hadoop.als.ParallelALSFactorizationJob = parallelALS : ALS-WR factorization of a rating matrix
         org.apache.mahout.cf.taste.hadoop.als.PredictionJob = predictFromFactorization : predict preferences from a factorization of a rating matrix
         org.apache.mahout.classifier.sequencelearning.hmm.BaumWelchTrainer = baumwelch : Baum-Welch algorithm for unsupervised HMM training
        +org.apache.mahout.classifier.sequencelearning.hmm.ViterbiEvaluator = viterbi : Viterbi decoding of hidden states from given output states sequence
        --
        1.7.1

        From bce3ebc6e8f8d575f1fb0e05e6c69e5c9d374c6e Mon Sep 17 00:00:00 2001
        From: Sergey Bartunov <sbos.net@gmail.com>
        Date: Wed, 15 Jun 2011 01:39:13 +0400
        Subject: [PATCH 5/5] Command-line tool for generated random observations with given HMM

        ---
         .../hmm/RandomSequenceGenerator.java | 93 ++++++++++++++++++++
         .../sequencelearning/hmm/ViterbiEvaluator.java | 6 +-
         src/conf/driver.classes.props | 1 +
         3 files changed, 97 insertions(+), 3 deletions(-)
         create mode 100644 core/src/main/java/org/apache/mahout/classifier/sequencelearning/hmm/RandomSequenceGenerator.java

        diff --git a/core/src/main/java/org/apache/mahout/classifier/sequencelearning/hmm/RandomSequenceGenerator.java b/core/src/main/java/org/apache/mahout/classifier/sequencelearning/hmm/RandomSequenceGenerator.java
        new file mode 100644
        index 0000000..cb1a5c4
        --- /dev/null
        +++ b/core/src/main/java/org/apache/mahout/classifier/sequencelearning/hmm/RandomSequenceGenerator.java
        @@ -0,0 +1,93 @@
        +/**
        + * Licensed to the Apache Software Foundation (ASF) under one or more
        + * contributor license agreements. See the NOTICE file distributed with
        + * this work for additional information regarding copyright ownership.
        + * The ASF licenses this file to You under the Apache License, Version 2.0
        + * (the "License"); you may not use this file except in compliance with
        + * the License. You may obtain a copy of the License at
        + *
        + * http://www.apache.org/licenses/LICENSE-2.0
        + *
        + * Unless required by applicable law or agreed to in writing, software
        + * distributed under the License is distributed on an "AS IS" BASIS,
        + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
        + * See the License for the specific language governing permissions and
        + * limitations under the License.
        + */
        +
        +
        +package org.apache.mahout.classifier.sequencelearning.hmm;
        +
        +import org.apache.commons.cli2.CommandLine;
        +import org.apache.commons.cli2.Group;
        +import org.apache.commons.cli2.Option;
        +import org.apache.commons.cli2.OptionException;
        +import org.apache.commons.cli2.builder.ArgumentBuilder;
        +import org.apache.commons.cli2.builder.DefaultOptionBuilder;
        +import org.apache.commons.cli2.builder.GroupBuilder;
        +import org.apache.commons.cli2.commandline.Parser;
        +import org.apache.mahout.common.CommandLineUtil;
        +
        +import java.io.*;
        +import java.util.Date;
        +
        +/**
        + * Command-line tool for generating random sequences from a given HMM
        + */
        +public class RandomSequenceGenerator {
        + public static void main(String[] args) throws IOException {
        + final DefaultOptionBuilder optionBuilder = new DefaultOptionBuilder();
        + final ArgumentBuilder argumentBuilder = new ArgumentBuilder();
        +
        + final Option outputOption = optionBuilder.withLongName("output").
        + withDescription("Output file with sequence of observed states").
        + withShortName("o").withArgument(argumentBuilder.withMaximum(1).withMinimum(1).
        + withName("path").create()).withRequired(false).create();
        +
        + final Option modelOption = optionBuilder.withLongName("model").
        + withDescription("Path to serialized HMM model").
        + withShortName("m").withArgument(argumentBuilder.withMaximum(1).withMinimum(1).
        + withName("path").create()).withRequired(true).create();
        +
        + final Option lengthOption = optionBuilder.withLongName("length").
        + withDescription("Length of generated sequence").
        + withShortName("l").withArgument(argumentBuilder.withMaximum(1).withMinimum(1).
        + withName("number").create()).withRequired(true).create();
        +
        + final Group optionGroup = new GroupBuilder().
        + withOption(outputOption).withOption(modelOption).withOption(lengthOption).
        + withName("Options").create();
        +
        + try {
        + final Parser parser = new Parser();
        + parser.setGroup(optionGroup);
        + final CommandLine commandLine = parser.parse(args);
        +
        + final String output = (String) commandLine.getValue(outputOption);
        +
        + final String modelPath = (String) commandLine.getValue(modelOption);
        +
        + final int length = Integer.parseInt((String) commandLine.getValue(lengthOption));
        +
        + //reading serialized HMM
        + final DataInputStream modelStream = new DataInputStream(new FileInputStream(modelPath));
        + final HmmModel model = LossyHmmSerializer.deserialize(modelStream);
        + modelStream.close();
        +
        + //generating observations
        + final int[] observations = HmmEvaluator.predict(model, length, new Date().getTime());
        +
        + //writing output
        + final FileOutputStream outputStream = new FileOutputStream(output);
        + final PrintWriter writer = new PrintWriter(outputStream);
        + for (int observation : observations) {
        + writer.print(observation);
        + writer.print(' ');
        + }
        + writer.close();
        + outputStream.close();
        + } catch (OptionException e) {
        + CommandLineUtil.printHelp(optionGroup);
        + }
        + }
        +}
        diff --git a/core/src/main/java/org/apache/mahout/classifier/sequencelearning/hmm/ViterbiEvaluator.java b/core/src/main/java/org/apache/mahout/classifier/sequencelearning/hmm/ViterbiEvaluator.java
        index 22c5f44..bcb6df2 100644
        --- a/core/src/main/java/org/apache/mahout/classifier/sequencelearning/hmm/ViterbiEvaluator.java
        +++ b/core/src/main/java/org/apache/mahout/classifier/sequencelearning/hmm/ViterbiEvaluator.java
        @@ -46,7 +46,7 @@ public class ViterbiEvaluator {
               withName("path").create()).withRequired(true).create();
         
             final Option outputOption = optionBuilder.withLongName("output").
        - withDescription("Output directory with decoded sequence of hidden states").
        + withDescription("Output file with decoded sequence of hidden states").
               withShortName("o").withArgument(argumentBuilder.withMaximum(1).withMinimum(1).
               withName("path").create()).withRequired(true).create();
         
        @@ -102,8 +102,8 @@ public class ViterbiEvaluator {
               //writing output
               final FileOutputStream outputStream = new FileOutputStream(output);
               final PrintWriter writer = new PrintWriter(outputStream);
        - for (int i = 0; i < hiddenStates.length; ++i) {
        - writer.print(hiddenStates[i]);
        + for (int hiddenState : hiddenStates) {
        + writer.print(hiddenState);
                 writer.print(' ');
               }
               writer.close();
        diff --git a/src/conf/driver.classes.props b/src/conf/driver.classes.props
        index 0ed10ce..f975ed7 100644
        --- a/src/conf/driver.classes.props
        +++ b/src/conf/driver.classes.props
        @@ -40,3 +40,4 @@ org.apache.mahout.cf.taste.hadoop.als.ParallelALSFactorizationJob = parallelALS
         org.apache.mahout.cf.taste.hadoop.als.PredictionJob = predictFromFactorization : predict preferences from a factorization of a rating matrix
         org.apache.mahout.classifier.sequencelearning.hmm.BaumWelchTrainer = baumwelch : Baum-Welch algorithm for unsupervised HMM training
         org.apache.mahout.classifier.sequencelearning.hmm.ViterbiEvaluator = viterbi : Viterbi decoding of hidden states from given output states sequence
        +org.apache.mahout.classifier.sequencelearning.hmm.RandomSequenceGenerator = hmmpredict : Generate random sequence of observations by given HMM
        --
        1.7.1
        ]
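        Note on the stream handling in the patches above: ViterbiEvaluator and RandomSequenceGenerator close their streams by hand, so a failure while reading or writing would skip the close() calls. A minimal sketch of one alternative follows. It is not part of the attached patches; the SequenceIo class and its method names are hypothetical, and try-with-resources assumes Java 7 or later, which the original code may not have targeted. The sketch keeps the same whitespace-separated integer format used by the tools.

```java
import java.io.IOException;
import java.io.PrintWriter;
import java.io.Reader;
import java.io.Writer;
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

/** Hypothetical helper (not in the patch) for reading and writing integer sequences. */
final class SequenceIo {

  private SequenceIo() {
  }

  /** Reads whitespace-separated integers until the first non-integer token or end of input. */
  static int[] readObservations(Reader in) {
    List<Integer> values = new ArrayList<Integer>();
    // try-with-resources closes the Scanner, and with it the reader, even if reading fails
    try (Scanner scanner = new Scanner(in)) {
      while (scanner.hasNextInt()) {
        values.add(scanner.nextInt());
      }
    }
    int[] result = new int[values.size()];
    for (int i = 0; i < result.length; i++) {
      result[i] = values.get(i);
    }
    return result;
  }

  /** Writes each value followed by one space, the same layout the patch produces. */
  static void writeSequence(Writer out, int[] sequence) throws IOException {
    PrintWriter writer = new PrintWriter(out);
    for (int value : sequence) {
      writer.print(value);
      writer.print(' ');
    }
    writer.flush();
    // PrintWriter swallows I/O errors, so surface them explicitly
    if (writer.checkError()) {
      throw new IOException("Failed to write sequence");
    }
  }
}
```

        A caller could wrap a FileReader or FileWriter in its own try-with-resources block and pass it in; for an optional output option such as the one in RandomSequenceGenerator, passing a writer over System.out when no path is given would be one way to avoid opening a null path.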
        Sergey Bartunov made changes -
        Attachment 0006-A-little-visualization-of-trained-model.patch [ 12482682 ]
        Sergey Bartunov made changes -
        Attachment 0006-A-little-visualization-of-trained-model.patch [ 12482682 ]
        Sergey Bartunov made changes -
        Attachment 0006-A-little-visualization-of-trained-model.patch [ 12482683 ]
        Sergey Bartunov made changes -
        Description Mahout already has HMM functionality, but it is exposed only through the API.
        Command-line tools should be added and registered in driver.classes.props.

        [This is my "training" issue in Jira to learn how to commit patches to Mahout, so please be merciful.]
        Mahout already has HMM functionality, but it is exposed only through the API.
        Command-line tools should be added and registered in driver.classes.props.

        These patches were generated with git against the trunk of Mahout's GitHub repository.
        [This is my "training" issue in Jira to learn how to commit patches to Mahout, so please be merciful.]
        Sergey Bartunov made changes -
        Attachment 0003-command-line-util-for-baum-welch-algorithm-on-HMM.patch [ 12482611 ]
        Sergey Bartunov made changes -
        Attachment 0004-Command-line-tool-for-Viterbi-evaluation.patch [ 12482612 ]
        Sergey Bartunov made changes -
        Attachment 0005-Command-line-tool-for-generated-random-observations-.patch [ 12482613 ]
        Sergey Bartunov made changes -
        Attachment 0006-A-little-visualization-of-trained-model.patch [ 12482683 ]
        Sergey Bartunov made changes -
        Attachment hmm-utils.patch [ 12483979 ]
        Sean Owen made changes -
        Status Open [ 1 ] Resolved [ 5 ]
        Assignee Sean Owen [ srowen ]
        Resolution Fixed [ 1 ]
        Sean Owen made changes -
        Status Resolved [ 5 ] Closed [ 6 ]

          People

          • Assignee:
            Sean Owen
            Reporter:
            Sergey Bartunov