Details

Type: New Feature
Status: Resolved
Priority: Major
Resolution: Incomplete
Affects Version/s: None
Fix Version/s: None
Component/s: MLlib
Labels:
- bulk-closed

Description

Had an offline chat with Lil'Rex, who implemented HMM on Spark at https://github.com/apache/spark/compare/master...lilrex:sequence. I asked him to list popular HMM applications, describe public API (params, input/output schemas), compare its API with existing HMM implementations.

Hidden Markov Model (HMM) Design Doc

Overview

Introduction to HMM

Hidden Markov Model is a type of statistical Machine Learning model that assumes a sequence of observations is generated by a Markov process with hidden states. There are 3 (or 2, depending on the implementation) main components of the model:

Transition Probability: describes the probability distribution of transitions from each state to other states (including self) in the Markov process
Emission Probability: describes the probability distribution for an observation associated with hidden states
Initial/Start Probability (optional): represents the prior probability of each state at the beginning of the observation sequence

Note: some implementations merge the Initial Probability into Transition Probability by adding an arbitrary Start state before the first observation point.

HMM Models and Algorithms

Given a limited number of states, most HMM models have the same form of Transition Probability: a matrix, where each element (i, j) represents the probability of transition from state i to state j. The Initial Probability usually take the simple form of a probabilistic vector.

The Emission Probability, on the other hand, can be represented in many different ways, depending on different nature of observations, i.e. continuous vs. discrete, or different model assumptions, e.g. single Gaussian vs. Gaussian Mixtures.

There are three main problems associated with HMM models, and their canonical algorithms:

Evaluation: What’s the probability of a given observation sequence, based on the model? It’s usually done by either Forward or Backward algorithms
Decoding: What’s the most likely state sequence, given the observation sequence and the model? It’s usually done by Viterbi decoding
Learning: How to train the parameters of the model based on the observation sequences? Baum-Welch (Forward-Backward) is usually used as part of the EM algorithm in unsupervised training

Popular Applications of HMM

Speech Recognition
Part-of-speech Tagging
Named Entity Recognition
Machine Translation
Gene Prediction

Alternate Libraries

Mahout

Assume emission probability to be an m-by-n matrix
Use Baum-Welch algorithm for training and Viterbi algorithm for prediction
API (command line)
- training
  $ echo "0 1 2 2 2 1 1 0 0 3 3 3 2 1 2 1 1 1 1 2 2 2 0 0 0 0 0 0 2 2 2 0 0 0 0 0 0 2 2 2 3 3 3 3 3 3 2 3 2 3 2 3 2 1 3 0 0 0 1 0 1 0 2 1 2 1 2 1 2 3 3 3 3 2 2 3 2 1 1 0" > hmm-input
  $ export MAHOUT_LOCAL=true
  $ $MAHOUT_HOME/bin/mahout baumwelch -i hmm-input -o hmm-model -nh 3 -no 4 -e .0001 -m 1000
- prediction
  $ $MAHOUT_HOME/bin/mahout hmmpredict -m hmm-model -o hmm-predictions -l 10
  $ cat hmm-predictions

Mallet

Treat HMM as a Finite State Transducer (FST)
Theoretically can go beyond first-order Markov assumption by setting an arbitrary order
Limited to text data, i.e. discrete observation sequence with Multinomial emission model assumption
Supervised training only
API:
- Training:
  HMM hmm = new HMM(pipe, null);
  hmm.addStatesForLabelsConnectedAsIn(trainingInstances);
  HMMTrainerByLikelihood trainer = new HMMTrainerByLikelihood(hmm);
  trainer.train(trainingInstances, 10);
- Testing:
  evaluator.evaluate(trainer);

HMMLearn

Previously part of scikit-learn
Algo:
- Standard HMM unsupervised training algorithm
- Three types of emission models: GMM, Gaussian and Multinomial
API:
- Training:
  model = hmm.GaussianHMM(n_components=3, covariance_type="full")
  model.fit(X)
- Testing:
  hidden_states = model.predict(X)

API

Design Goals

Build the foundation for general Sequential Tagging models (HMM, CRF, etc.)
Support multiple Emission Probability models such as “Multinomial” and “Gaussian Mixture”
Keep both supervised and unsupervised learning for HMM in mind

Proposed API

Note: This is written for the spark.ml API.

Decoder API

Decoder.scala

trait DecoderParams extends Params {
  def featureCol: DataType // column of sequential features, e.g. MatrixUDT
  def predictionCol: DoubleType // column of prediction
  def labelCol: DataType // column of sequential labels, e.g. VectorUDT
}

abstract class Decoder extends Estimator with DecoderParams {
  def extractLabeledSequences(dataset: Dataset[_]): RDD[LabeledSequence]
}

abstract class DecodingModel extends Model with DecoderParams {
  def numFeatures: Int
  def decode(features: FeatureType): Vector
}

Tagger API

Tagger.scala

trait TaggerParams extends DecoderParams with HasRawPredictionCol {
  def rawPredictionCol: MatrixUDT // column for all predicted label sequences
}

abstract class Tagger extends Decoder with TaggerParams

abstract class TaggingModel extends DecodingModel with TaggerParams {
  def numClasses: Int
  def decodeRaw(features: FeaturesType): Array[(Double, Vector)]
  def raw2prediction(rawPrediction: Array[(Double, Vector)]): Vector
}

MarginalTagger (Probabilistic Tagger) API

MarginalTagger.scala

trait MarginalTaggerParams extends TaggerParams with HasProbabilityCol with HasThreshold {
  def probabilityCol: DoubleType // column for probability
}

abstract class MarginalTagger extends Tagger with MarginalTaggerParams

abstract class MarginalTaggingModel extends TaggingModel with MarginalTaggerParams {
  def getMargin(featuers: FeaturesType): Double
}

HiddenMarkovModel API

HiddenMarkovModel.scala

trait HiddenMarkovModelParams extends MarginalTaggerParams with HasMaxIter with HasTol with HasStandardization with HasThreshold {
  def smoothing: DoubleParam
  def emissionType: Param[String] //can be either Multinomial or Gaussian
}

class HiddenMarkovModel extends MarginalTagger with HiddenMarkovModelParams {
  def initialModel: Option[HMMModel] //initial model before training
  def def trainSupervised(dataset: Dataset[_]): Option[HMMModel]
  def trainUnsupervised(dataset: Dataset[_]): HMMModel
}

abstract class HMMModel extends MarginalTaggingModel with HiddenMarkovModelParams {
  def initialProb: Vector // initial probability for states
  def transitionProb: DenseMatrix // transition probability between states

  //compute feature scores for all states in all frames in a sequence, different for different emission models, e.g. Multinomial vs. GMM
  def precomputeEmission(features: Matrix): List[Array[Double]] 

  // accumulate sufficient statistics for emission model  
  def getEmissionStats(features: Matrix, gammas: List[Array[Double]]): DenseMatrix

  // forward algorithm
  def forward(emissions: Traversable[Array[Double]]): List[Array[Double]] 

  // backword algorithm
  def backward(emissions: Traversable[Array[Double]]):List[Array[Double]] 
}

class MultinomialHMMModel extends HMMModel {
  def emissionProb: Matrix // emission probability from states to observations
}

object MultinomialHMMModel extends MLReadable[MultinomialHMMModel]

class HMMModelReader extends MLReader[MultinomialHMMModel]

Hidden Markov Model (HMM)

Details

Description

Hidden Markov Model (HMM) Design Doc

Overview

Introduction to HMM

HMM Models and Algorithms

Popular Applications of HMM

Alternate Libraries

API

Proposed API

Attachments

Activity

People

Dates