Markov Chains, Hidden States, and Hidden Markov Models (HMMs)

A Markov Chain is a mathematical model of a system that moves from one state to another, under the rule that the next state depends only on the current state. A Hidden State is the underlying true state of a system that cannot be observed directly and must instead be inferred from its visible outputs.

These concepts are foundational to probability, statistics, and machine learning, and they often work together in what is known as a Hidden Markov Model (HMM).

Key Idea: A Markov Chain models transitions between states, while a Hidden Markov Model extends this concept by introducing hidden states that cannot be observed directly.

1. Markov Chain: The Basics

A Markov Chain describes a sequence of events in which the probability of the next event depends only on the present state, not on the sequence of states that came before it. This is known as the Markov Property, or "memorylessness".
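In symbols, writing X_t for the state at step t, the Markov Property says that conditioning on the whole history gives the same answer as conditioning on the current state alone:

```latex
P(X_{t+1} = s \mid X_t, X_{t-1}, \dots, X_1) = P(X_{t+1} = s \mid X_t)
```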

Example: Weather Forecasting

Imagine the weather. If today is Sunny, tomorrow might have:

70% chance of being Sunny
20% chance of being Cloudy
10% chance of being Rainy

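Because the weather can be observed and measured directly, it can be modeled as an observable Markov Chain. A complete model specifies a row of next-day probabilities like this for every state (Sunny, Cloudy, and Rainy), and each row must sum to 100%.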
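A minimal sketch of the weather chain in plain Python. Only the Sunny row comes from the example above; the Cloudy and Rainy rows are invented values, and the helper names `next_day_distribution` and `simulate` are hypothetical. It shows the two basic operations on a Markov Chain: pushing a probability distribution forward one day, and sampling a sequence where each step looks only at the current state.

```python
import random

# Transition probabilities P(tomorrow | today).
# The "Sunny" row is from the example; the other two rows are hypothetical.
TRANSITIONS = {
    "Sunny":  {"Sunny": 0.7, "Cloudy": 0.2, "Rainy": 0.1},
    "Cloudy": {"Sunny": 0.3, "Cloudy": 0.4, "Rainy": 0.3},  # hypothetical
    "Rainy":  {"Sunny": 0.2, "Cloudy": 0.3, "Rainy": 0.5},  # hypothetical
}

def next_day_distribution(dist):
    """Push a distribution over today's weather one day forward."""
    out = {state: 0.0 for state in TRANSITIONS}
    for today, p_today in dist.items():
        for tomorrow, p_move in TRANSITIONS[today].items():
            out[tomorrow] += p_today * p_move
    return out

def simulate(start, days, rng):
    """Sample a weather sequence; each step depends only on the current state."""
    seq = [start]
    for _ in range(days - 1):
        row = TRANSITIONS[seq[-1]]
        seq.append(rng.choices(list(row), weights=list(row.values()))[0])
    return seq

# Start from a certainly Sunny day and look one and two days ahead.
day1 = next_day_distribution({"Sunny": 1.0, "Cloudy": 0.0, "Rainy": 0.0})
day2 = next_day_distribution(day1)
print(day1)  # the Sunny row itself
print(day2)  # approximately Sunny 0.57, Cloudy 0.25, Rainy 0.18
print(simulate("Sunny", 7, random.Random(0)))
```

Two days ahead, the chance of sun drops from 70% to about 57%, because some paths pass through Cloudy or Rainy first.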

2. Hidden State: The Invisible Driver

In many real-world scenarios, you cannot directly observe the state of a system. Instead, you have:

Hidden States → The actual, unobservable conditions.
Observations → The visible results influenced by those hidden states.

Example: Inferring a Person's Mood

Imagine you want to track a person's mood (Happy or Sad), but you can never ask them how they feel.

Mood → Hidden State
Shirt Color (Red, Green, Blue) → Observation

Although you cannot directly observe the person's mood, you can observe the shirt color they wear each day. Using a Hidden Markov Model, you can infer the most likely mood sequence based on the observed shirt colors.

3. How They Work Together: Hidden Markov Models (HMMs)

In a Hidden Markov Model, the hidden states themselves form a Markov Chain. For example, a person's mood today influences their mood tomorrow.

To make this work, the model relies on three core probability components:

Component	Purpose	Example
Transition Probabilities	Probability of moving from one hidden state to another.	Chance a Sad mood follows a Happy mood.
Emission Probabilities	Probability of seeing an observation given a hidden state.	Chance of wearing a Red shirt while Happy.
Initial State Probabilities	Probability of starting in a specific hidden state.	Probability that Day 1 starts Happy.
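A minimal sketch of the three components for the mood/shirt example as plain Python dictionaries. The probability values are hypothetical, chosen only for illustration, and `generate` is a helper name introduced here, not part of any library. Sampling from the model shows the generative story an HMM assumes: the hidden moods evolve as a Markov Chain, and each day's shirt color is drawn from the emission probabilities of that day's mood.

```python
import random

# Hypothetical parameters for the mood/shirt example (numbers invented).
STATES = ("Happy", "Sad")
OBSERVATIONS = ("Red", "Green", "Blue")

START = {"Happy": 0.6, "Sad": 0.4}          # initial state probabilities
TRANSITION = {                              # P(mood tomorrow | mood today)
    "Happy": {"Happy": 0.7, "Sad": 0.3},
    "Sad":   {"Happy": 0.4, "Sad": 0.6},
}
EMISSION = {                                # P(shirt color | mood)
    "Happy": {"Red": 0.6, "Green": 0.3, "Blue": 0.1},
    "Sad":   {"Red": 0.1, "Green": 0.3, "Blue": 0.6},
}

def check_rows_sum_to_one():
    """Every distribution in the model must sum to 1."""
    rows = [START] + list(TRANSITION.values()) + list(EMISSION.values())
    return all(abs(sum(r.values()) - 1.0) < 1e-9 for r in rows)

def draw(dist, rng):
    """Sample one key from a {key: probability} dictionary."""
    return rng.choices(list(dist), weights=list(dist.values()))[0]

def generate(days, rng):
    """Sample hidden moods (a Markov Chain) and the shirt each mood emits."""
    moods, shirts = [], []
    mood = draw(START, rng)
    for _ in range(days):
        moods.append(mood)
        shirts.append(draw(EMISSION[mood], rng))
        mood = draw(TRANSITION[mood], rng)
    return moods, shirts

moods, shirts = generate(5, random.Random(1))
print("hidden:  ", moods)    # unknown to the observer in practice
print("observed:", shirts)   # all the observer actually sees
```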

4. Real-World Applications

Basic Markov Chains work well when the states themselves can be observed, while Hidden Markov Models are designed for noisy, partially observable data, where the states of interest must be inferred from indirect evidence.

Speech Recognition
Translating audio waveforms (observations) into spoken words or phonemes (hidden states).
Natural Language Processing (NLP)
Assigning parts of speech such as nouns, verbs, or adjectives (hidden states) to observed words in a sentence.
Finance
Predicting hidden market regimes such as Bull Markets or Bear Markets from observed trading patterns and volatility.

The Three Classic HMM Machine Learning Tasks

Hidden Markov Models are traditionally used to solve three major classes of machine learning problems.

1. The Evaluation Task (Likelihood)

Objective: Compute the total probability of observing a specific sequence, summed over every hidden state sequence that could have produced it.

Problem: Given a trained model and a sequence of visible events, determine the probability that the model generates exactly that sequence.

Algorithm: Forward Algorithm (the forward pass of the Forward-Backward procedure).
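The forward pass computes α_t(i), the probability of seeing the first t observations and being in hidden state i at step t. With π_i the initial probabilities, a_ij the transition probabilities, b_j(o) the emission probabilities, and N hidden states, the recursion replaces a sum over N^T hidden paths with roughly N²·T operations:

```latex
\alpha_1(i) = \pi_i \, b_i(o_1)
\alpha_{t+1}(j) = \Big[\sum_{i=1}^{N} \alpha_t(i)\, a_{ij}\Big]\, b_j(o_{t+1})
P(O \mid \lambda) = \sum_{i=1}^{N} \alpha_T(i)
```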

Example: Determining whether a sequence of network traffic logs resembles normal behavior or a cyber attack.
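A minimal sketch of the forward algorithm in plain Python, using a small hypothetical mood/shirt HMM (all numbers invented). The function names are hypothetical. A brute-force sum over every hidden path is included only to check the result; its cost grows exponentially with sequence length, which is why the forward recursion is used instead.

```python
from itertools import product

# Hypothetical mood/shirt HMM (numbers invented for illustration).
STATES = ("Happy", "Sad")
START = {"Happy": 0.6, "Sad": 0.4}
TRANSITION = {"Happy": {"Happy": 0.7, "Sad": 0.3},
              "Sad":   {"Happy": 0.4, "Sad": 0.6}}
EMISSION = {"Happy": {"Red": 0.6, "Green": 0.3, "Blue": 0.1},
            "Sad":   {"Red": 0.1, "Green": 0.3, "Blue": 0.6}}

def forward_likelihood(obs):
    """P(obs | model) via the forward algorithm."""
    # alpha[s] = P(observations so far, current hidden state == s)
    alpha = {s: START[s] * EMISSION[s][obs[0]] for s in STATES}
    for o in obs[1:]:
        alpha = {j: sum(alpha[i] * TRANSITION[i][j] for i in STATES) * EMISSION[j][o]
                 for j in STATES}
    return sum(alpha.values())

def brute_force_likelihood(obs):
    """Same quantity by enumerating every hidden path (exponential; check only)."""
    total = 0.0
    for path in product(STATES, repeat=len(obs)):
        p = START[path[0]] * EMISSION[path[0]][obs[0]]
        for t in range(1, len(obs)):
            p *= TRANSITION[path[t - 1]][path[t]] * EMISSION[path[t]][obs[t]]
        total += p
    return total

seq = ["Red", "Green", "Blue"]
print(forward_likelihood(seq), brute_force_likelihood(seq))  # both about 0.03594
```

For a task like the network-traffic example, the same sequence could be scored under different models and the scores compared. For long sequences these raw probabilities underflow, so practical implementations typically work with log probabilities or per-step scaling.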

2. The Decoding Task (Inference)

Objective: Find the most likely sequence of hidden states.

Problem: You can see the outputs but want to uncover the hidden state sequence that most likely generated them.

Algorithm: Viterbi Algorithm.
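Viterbi has the same shape as the forward recursion, but takes a maximum instead of a sum and keeps a back-pointer ψ to recover the best path. Using the same π_i, a_ij and b_j(o) notation, δ_t(j) is the probability of the single best path that ends in state j at step t:

```latex
\delta_1(i) = \pi_i \, b_i(o_1)
\delta_{t+1}(j) = \max_i \big[\delta_t(i)\, a_{ij}\big]\, b_j(o_{t+1}), \qquad \psi_{t+1}(j) = \arg\max_i \, \delta_t(i)\, a_{ij}
z_T^{*} = \arg\max_i \, \delta_T(i), \qquad z_t^{*} = \psi_{t+1}(z_{t+1}^{*})
```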

Example: Part-of-Speech Tagging in NLP, where words are visible observations and grammatical categories are hidden states.
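A minimal sketch of Viterbi decoding in plain Python. For brevity it reuses a hypothetical mood/shirt HMM (numbers invented) rather than a part-of-speech model; in tagging, the states would be grammatical tags and the observations words. It works in log space so long sequences do not underflow, and `viterbi` is a hypothetical function name.

```python
import math

# Hypothetical mood/shirt HMM (numbers invented for illustration).
STATES = ("Happy", "Sad")
START = {"Happy": 0.6, "Sad": 0.4}
TRANSITION = {"Happy": {"Happy": 0.7, "Sad": 0.3},
              "Sad":   {"Happy": 0.4, "Sad": 0.6}}
EMISSION = {"Happy": {"Red": 0.6, "Green": 0.3, "Blue": 0.1},
            "Sad":   {"Red": 0.1, "Green": 0.3, "Blue": 0.6}}

def viterbi(obs):
    """Return the most likely hidden path for obs and that path's probability."""
    # delta[s]: best log-probability of any path ending in state s at this step
    delta = {s: math.log(START[s]) + math.log(EMISSION[s][obs[0]]) for s in STATES}
    backpointers = []
    for o in obs[1:]:
        step_ptr, new_delta = {}, {}
        for j in STATES:
            # Best predecessor of state j
            best_i = max(STATES, key=lambda i: delta[i] + math.log(TRANSITION[i][j]))
            step_ptr[j] = best_i
            new_delta[j] = (delta[best_i] + math.log(TRANSITION[best_i][j])
                            + math.log(EMISSION[j][o]))
        backpointers.append(step_ptr)
        delta = new_delta
    # Trace back from the best final state.
    last = max(STATES, key=lambda s: delta[s])
    path = [last]
    for step_ptr in reversed(backpointers):
        path.append(step_ptr[path[-1]])
    path.reverse()
    return path, math.exp(delta[last])

path, prob = viterbi(["Red", "Green", "Blue"])
print(path)  # ['Happy', 'Happy', 'Sad']
```

The Blue shirt on day 3 flips the best guess to Sad, while the earlier Red and Green days keep the path Happy.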

3. The Learning Task (Training)

Objective: Learn the model parameters from observed data.

Problem: Given only observation sequences, estimate the initial, transition, and emission probabilities.

Algorithm: Baum-Welch Algorithm (a special case of Expectation-Maximization that uses the Forward-Backward algorithm in its expectation step).
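In the expectation step, Baum-Welch combines the forward quantities α_t(i) (the probability of the first t observations, ending in state i) with the backward quantities β_t(i) (the probability of the remaining observations, given state i at step t). This yields γ_t(i), the probability of being in state i at step t, and ξ_t(i,j), the probability of moving from i to j between steps t and t+1. The maximization step turns these expected counts into new parameters. With O = o_1, …, o_T and the same π_i, a_ij, b_j(o) notation:

```latex
\gamma_t(i) = \frac{\alpha_t(i)\,\beta_t(i)}{P(O)}, \qquad
\xi_t(i,j) = \frac{\alpha_t(i)\,a_{ij}\,b_j(o_{t+1})\,\beta_{t+1}(j)}{P(O)}
\hat{\pi}_i = \gamma_1(i), \qquad
\hat{a}_{ij} = \frac{\sum_{t=1}^{T-1} \xi_t(i,j)}{\sum_{t=1}^{T-1} \gamma_t(i)}, \qquad
\hat{b}_j(k) = \frac{\sum_{t:\,o_t = k} \gamma_t(j)}{\sum_{t=1}^{T} \gamma_t(j)}
```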

Example: Training a speech recognition system using large collections of audio recordings.
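A compact sketch of Baum-Welch on a single short observation sequence, with hypothetical starting guesses for the mood/shirt HMM. It runs the forward and backward passes, computes the expected state and transition counts, and re-estimates all three tables. All function names are hypothetical. No scaling is applied, so it only suits short sequences, and it assumes the sequence has at least two observations.

```python
STATES = ("Happy", "Sad")
SYMBOLS = ("Red", "Green", "Blue")

def forward(obs, start, trans, emit):
    """alphas[t][s] = P(o_1..o_t, state s at step t)."""
    alphas = [{s: start[s] * emit[s][obs[0]] for s in STATES}]
    for o in obs[1:]:
        prev = alphas[-1]
        alphas.append({j: sum(prev[i] * trans[i][j] for i in STATES) * emit[j][o]
                       for j in STATES})
    return alphas

def backward(obs, trans, emit):
    """betas[t][s] = P(o_{t+1}..o_T | state s at step t)."""
    betas = [{s: 1.0 for s in STATES}]
    for o in reversed(obs[1:]):
        nxt = betas[0]
        betas.insert(0, {i: sum(trans[i][j] * emit[j][o] * nxt[j] for j in STATES)
                         for i in STATES})
    return betas

def baum_welch_step(obs, start, trans, emit):
    """One EM iteration; returns new parameters and the likelihood of the old ones."""
    T = len(obs)
    alphas = forward(obs, start, trans, emit)
    betas = backward(obs, trans, emit)
    likelihood = sum(alphas[-1].values())
    # E-step: posterior state probabilities (gamma) and transition probabilities (xi)
    gamma = [{s: alphas[t][s] * betas[t][s] / likelihood for s in STATES} for t in range(T)]
    xi = [{(i, j): alphas[t][i] * trans[i][j] * emit[j][obs[t + 1]] * betas[t + 1][j] / likelihood
           for i in STATES for j in STATES} for t in range(T - 1)]
    # M-step: normalize the expected counts into new probability tables
    new_start = {s: gamma[0][s] for s in STATES}
    new_trans = {i: {j: sum(x[(i, j)] for x in xi) / sum(g[i] for g in gamma[:-1])
                     for j in STATES} for i in STATES}
    new_emit = {s: {k: sum(g[s] for g, o in zip(gamma, obs) if o == k) / sum(g[s] for g in gamma)
                    for k in SYMBOLS} for s in STATES}
    return new_start, new_trans, new_emit, likelihood

# Hypothetical starting guesses and a made-up training sequence.
start = {"Happy": 0.6, "Sad": 0.4}
trans = {"Happy": {"Happy": 0.7, "Sad": 0.3}, "Sad": {"Happy": 0.4, "Sad": 0.6}}
emit = {"Happy": {"Red": 0.5, "Green": 0.3, "Blue": 0.2},
        "Sad":   {"Red": 0.2, "Green": 0.3, "Blue": 0.5}}
obs = ["Red", "Red", "Green", "Blue", "Blue", "Red", "Green", "Blue"]

history = []
for _ in range(20):
    start, trans, emit, ll = baum_welch_step(obs, start, trans, emit)
    history.append(ll)
print(history[0], history[-1])  # the likelihood should not decrease
```

Each EM iteration is guaranteed not to lower the likelihood, but the procedure can settle in a local optimum, so the result depends on the starting guesses. Real training sets contain many sequences, and their expected counts are summed before the M-step.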

Summary of Core HMM Tasks

Task	Question Being Answered	Algorithm	Output
Evaluation	How likely is this observation sequence?	Forward Algorithm	Probability Score
Decoding	What hidden states generated the observations?	Viterbi	Most Likely State Sequence
Learning	What should the model parameters be?	Baum-Welch	Trained Model Parameters

Common Machine Learning Applications

Speech Recognition
Matching spoken audio signals (observations) to phonemes or words (hidden states).
Bioinformatics
Finding genes within DNA sequences by modeling patterns of nucleotides.
Stock Market Analysis
Predicting hidden market conditions such as Bull Markets and Bear Markets from observable market behavior.

Conclusion:

Observable Markov Chain:
State → State → State

Hidden Markov Model:
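Hidden State → Hidden State → Hidden State
     ↓              ↓              ↓
Observation    Observation    Observation

The hidden states are linked to one another, while each observation depends only on the hidden state at the same step. The observations are visible, but the hidden states must be inferred.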
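The HMM diagram corresponds to this factorization of the joint probability of hidden states z_1, …, z_T and observations o_1, …, o_T, with π, a and b the initial, transition and emission probabilities. Each horizontal arrow contributes a transition term and each vertical arrow an emission term:

```latex
P(o_1, \dots, o_T,\, z_1, \dots, z_T) = \pi_{z_1}\, b_{z_1}(o_1) \prod_{t=2}^{T} a_{z_{t-1} z_t}\, b_{z_t}(o_t)
```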

RS Chandras Tech Blog | AI, ML, Agentic AI

Sunday, June 14, 2026