LSTM Cells, Gates, Hidden State, and Cell State

The following points summarize the internal architecture and processing flow of an LSTM (Long Short-Term Memory) network.

1. Core LSTM Architecture

An LSTM network consists of one or more LSTM cells.
Each LSTM cell contains three processing units.
A processing unit inside an LSTM is called a Gate.
Every LSTM cell contains:
- Forget Gate
- Input Gate
- Output Gate
Each gate possesses its own:
- Weight Matrix
- Bias Vector
These parameters are learned during training and then frozen during inference.
The Input Gate contains an additional set of weight and bias parameters used for calculating the Candidate Cell State.
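
As a rough illustration of the parameter inventory above, the following Python sketch allocates the four weight matrices and four bias vectors of a single cell. The function name init_lstm_params, the sizes, and the random starting values are assumptions made for this example; the names WFG, BFG, WIPG, BIPG, WCS, BCS, WOPG, and BOPG follow the notation used in the later sections. Plain lists are used instead of a tensor library so that every dimension stays visible.

```python
import random

def init_lstm_params(input_size, hidden_size, seed=0):
    """Allocate one weight matrix and one bias vector per gate, plus the
    extra pair used for the Candidate Cell State (illustrative sketch)."""
    rng = random.Random(seed)
    concat_size = hidden_size + input_size  # length of [CURRHS, INP]

    def matrix():
        # hidden_size rows, one per state element; small random start values
        return [[rng.uniform(-0.1, 0.1) for _ in range(concat_size)]
                for _ in range(hidden_size)]

    def bias():
        return [0.0] * hidden_size

    return {
        "WFG": matrix(), "BFG": bias(),    # Forget Gate
        "WIPG": matrix(), "BIPG": bias(),  # Input Gate
        "WCS": matrix(), "BCS": bias(),    # Candidate Cell State
        "WOPG": matrix(), "BOPG": bias(),  # Output Gate
    }

params = init_lstm_params(input_size=3, hidden_size=2)
n_params = sum(
    len(v) * len(v[0]) if isinstance(v[0], list) else len(v)
    for v in params.values()
)
print(n_params)  # 4 * (2 * (2 + 3) + 2) = 48
```

Each matrix has hidden_size rows and hidden_size + input_size columns, because every gate reads the concatenated vector described in section 4.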

2. Internal Memory Components

Apart from gates, each LSTM cell maintains two internal memory structures.
These memory structures are:
- Cell State (CS)
- Hidden State (HS)
Both states are computed using previously stored state values.
At any time step, the current state values depend on:
- Current Input
- Previous Cell State
- Previous Hidden State
Both Cell State and Hidden State have the same dimensionality, which is specified as a model hyperparameter (often called the hidden size).

3. Sequential Processing Behavior

An LSTM processes inputs sequentially.
For example, when processing the sentence:
John likes black coffee
the LSTM processes:
- John
- likes
- black
- coffee
in four separate iterations.
Each processed word generates:
- A new Cell State
- A new Hidden State
At the beginning of processing, both states are typically initialized to zero:
- Cell State = Zero Vector
- Hidden State = Zero Vector

4. Input Preparation

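The current input (INP) is concatenated with the current hidden state (CURRHS), that is, the Hidden State produced by the previous time step.
Note: In practice, the input is rarely raw data; it is almost always pre-processed (for example, a word is first converted to a numeric vector).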
The resulting vector:
[ CURRHS , INP ]
is fed simultaneously to all three gates.
Each gate processes this vector using its own trained weights and biases.
The Input Gate additionally computes the Candidate Cell State using a tanh activation function.
The outputs of the gates are then used to calculate:
- New Cell State
- New Hidden State
The New Hidden State is considered the primary output of the LSTM cell.
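
The concatenation step can be sketched in a few lines of Python. The helper name concat_hidden_and_input and the numeric values are hypothetical; only the ordering [CURRHS, INP] comes from the description above.

```python
def concat_hidden_and_input(currhs, inp):
    """Build the gate input vector [CURRHS, INP] as one flat list."""
    return list(currhs) + list(inp)

currhs = [0.0, 0.0]        # hidden state before the first word (zero vector)
inp = [0.2, -0.5, 0.9]     # hypothetical pre-processed vector for "John"

z = concat_hidden_and_input(currhs, inp)
print(z)       # [0.0, 0.0, 0.2, -0.5, 0.9]
print(len(z))  # hidden_size + input_size = 2 + 3 = 5
```

The length of this vector determines the number of columns in every gate's weight matrix.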

5. Forget Gate Computation

Forget Gate Output (FGOP)

The Forget Gate decides how much of the previous Cell State should be retained.

FGOP = sigmoid(WFG[CURRHS, INP] + BFG)

Where:

WFG = Forget Gate Weight Matrix
BFG = Forget Gate Bias Vector
FGOP = Forget Gate Output
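
A minimal sketch of the Forget Gate formula, assuming a hidden size of 2 and an input size of 3. The weights are set to zero and the biases are invented, purely so the result can be checked by hand; trained parameters would look nothing like this.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def gate(weights, bias, z):
    """sigmoid(W z + b): one output per row of the weight matrix."""
    return [sigmoid(sum(w * x for w, x in zip(row, z)) + b)
            for row, b in zip(weights, bias)]

z = [0.0, 0.0, 0.2, -0.5, 0.9]   # [CURRHS, INP], hypothetical values
WFG = [[0.0] * 5, [0.0] * 5]     # zero weights keep the arithmetic easy to follow
BFG = [0.0, 4.0]                 # hypothetical biases

FGOP = gate(WFG, BFG, z)
print([round(v, 3) for v in FGOP])  # [0.5, 0.982]
```

Because sigmoid always returns a value between 0 and 1, each element of FGOP acts as a retention fraction: the first Cell State element would be halved, while the second would be kept almost entirely.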

6. Input Gate Computation

Input Gate Output (IPGOP)

The Input Gate decides how much of the Candidate Cell State should be written into the Cell State.

IPGOP = sigmoid(WIPG[CURRHS, INP] + BIPG)

Candidate Cell State (CNDCS)

CNDCS = tanh(WCS[CURRHS, INP] + BCS)

Where:

WIPG = Input Gate Weight Matrix
BIPG = Input Gate Bias Vector
WCS = Candidate State Weight Matrix
BCS = Candidate State Bias Vector
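
The two formulas read the same vector but use different activations, which the sketch below tries to make visible. All parameter values are hypothetical and chosen so the pre-activation sums are easy to verify by hand; the helper name affine is also an assumption of this example.

```python
import math

def affine(weights, bias, z):
    """Compute W z + b for a matrix stored as a list of rows."""
    return [sum(w * x for w, x in zip(row, z)) + b
            for row, b in zip(weights, bias)]

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

z = [0.0, 0.0, 1.0, -1.0, 2.0]  # [CURRHS, INP], hypothetical values

# Hypothetical parameters; hidden size 2, so two rows of five weights each.
WIPG = [[0.0, 0.0, 1.0, 1.0, 0.0], [0.0, 0.0, 0.0, 0.0, 1.0]]
BIPG = [0.0, 0.0]
WCS = [[0.0, 0.0, 0.0, 0.0, -1.0], [0.0, 0.0, 1.0, 1.0, 0.0]]
BCS = [0.0, 0.0]

IPGOP = [sigmoid(a) for a in affine(WIPG, BIPG, z)]   # each value in (0, 1)
CNDCS = [math.tanh(a) for a in affine(WCS, BCS, z)]   # each value in (-1, 1)

print([round(v, 3) for v in IPGOP])  # [0.5, 0.881]
print([round(v, 3) for v in CNDCS])  # [-0.964, 0.0]
```

The candidate can be negative because tanh is centered on zero, so it can push a Cell State element down as well as up; the sigmoid output only scales how much of that push is applied.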

7. New Cell State Calculation

The new Cell State is calculated by combining:
- The retained portion of the old Cell State (scaled by the Forget Gate output)
- The newly generated Candidate Cell State (scaled by the Input Gate output)

NEWCS = (FGOP × CURRCS) + (IPGOP × CNDCS)

Where:

CURRCS = Current Cell State
FGOP = Forget Gate Output
IPGOP = Input Gate Output
CNDCS = Candidate Cell State
NEWCS = New Cell State

Important: Both multiplications are element-wise multiplications, not matrix multiplications.
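
A small worked example may help show what the element-wise update does. The gate values below are idealized to exactly 0 and 1 for readability (a real sigmoid only approaches those limits), and the function names are assumptions of this sketch.

```python
def elementwise_mul(a, b):
    """Multiply two equal-length vectors position by position."""
    return [x * y for x, y in zip(a, b)]

def new_cell_state(fgop, currcs, ipgop, cndcs):
    retained = elementwise_mul(fgop, currcs)  # part of the old memory that survives
    added = elementwise_mul(ipgop, cndcs)     # part of the candidate that is written
    return [r + a for r, a in zip(retained, added)]

CURRCS = [2.0, -1.0]
FGOP = [1.0, 0.0]    # keep the first element, erase the second
IPGOP = [0.0, 1.0]   # write nothing to the first, write fully to the second
CNDCS = [0.5, 0.5]

NEWCS = new_cell_state(FGOP, CURRCS, IPGOP, CNDCS)
print(NEWCS)  # [2.0, 0.5]
```

The first element passes through unchanged and the second is replaced by the candidate, which is why every vector in the formula must have the same length as the Cell State.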

8. Output Gate Computation

The Output Gate determines how much of the newly computed Cell State should become visible as the Hidden State.

OGOP = sigmoid(WOPG[CURRHS, INP] + BOPG)

Where:

WOPG = Output Gate Weight Matrix
BOPG = Output Gate Bias Vector
OGOP = Output Gate Output

9. Hidden State Calculation

After computing the New Cell State, the New Hidden State is calculated as:

NEWHS = OGOP × tanh(NEWCS)

Where:

NEWHS = New Hidden State
OGOP = Output Gate Output
NEWCS = New Cell State

The New Hidden State serves two purposes:
- Acts as the output of the current LSTM cell.
- Becomes the Hidden State input for the next time step.
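
The Hidden State formula can be sketched as follows. OGOP is taken as given here, with idealized values, and the function name new_hidden_state is an assumption of this example.

```python
import math

def new_hidden_state(ogop, newcs):
    """NEWHS = OGOP x tanh(NEWCS), computed element by element."""
    return [o * math.tanh(c) for o, c in zip(ogop, newcs)]

NEWCS = [2.0, -2.0, 3.0]   # hypothetical new Cell State
OGOP = [1.0, 0.5, 0.0]     # idealized Output Gate values

NEWHS = new_hidden_state(OGOP, NEWCS)
print([round(h, 3) for h in NEWHS])  # [0.964, -0.482, 0.0]
```

The tanh squashes the Cell State into the range (-1, 1) before the Output Gate scales it, so the Hidden State stays bounded even when Cell State values grow beyond that range.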

10. State Propagation

After processing a word:
- NEWCS becomes the Current Cell State for the next iteration.
- NEWHS becomes the Current Hidden State for the next iteration.
- NEWHS is also passed to the next LSTM layer if a stacked LSTM architecture is being used.
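
Putting the earlier formulas together, the sketch below runs a two-layer stack over the example sentence and shows how the states are handed from one iteration to the next. The parameters are random and untrained, the two-dimensional word vectors are invented stand-ins for pre-processed input, and the helper names lstm_step and random_params are assumptions of this example rather than any library's API.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def affine(weights, bias, z):
    return [sum(w * x for w, x in zip(row, z)) + b
            for row, b in zip(weights, bias)]

def lstm_step(p, inp, currhs, currcs):
    """One iteration: returns (NEWCS, NEWHS) using the formulas in this post."""
    z = list(currhs) + list(inp)                                   # [CURRHS, INP]
    fgop = [sigmoid(a) for a in affine(p["WFG"], p["BFG"], z)]
    ipgop = [sigmoid(a) for a in affine(p["WIPG"], p["BIPG"], z)]
    cndcs = [math.tanh(a) for a in affine(p["WCS"], p["BCS"], z)]
    ogop = [sigmoid(a) for a in affine(p["WOPG"], p["BOPG"], z)]
    newcs = [f * c + i * g for f, c, i, g in zip(fgop, currcs, ipgop, cndcs)]
    newhs = [o * math.tanh(c) for o, c in zip(ogop, newcs)]
    return newcs, newhs

def random_params(input_size, hidden_size, rng):
    """Untrained random parameters, used here only as stand-ins."""
    def matrix():
        return [[rng.uniform(-0.5, 0.5) for _ in range(hidden_size + input_size)]
                for _ in range(hidden_size)]
    return {name: (matrix() if name.startswith("W") else [0.0] * hidden_size)
            for name in ("WFG", "BFG", "WIPG", "BIPG", "WCS", "BCS", "WOPG", "BOPG")}

rng = random.Random(0)
hidden_size = 3
# Hypothetical 2-dimensional vectors standing in for the pre-processed words.
sentence = {"John": [1.0, 0.0], "likes": [0.0, 1.0],
            "black": [1.0, 1.0], "coffee": [-1.0, 0.5]}

layer1 = random_params(2, hidden_size, rng)
layer2 = random_params(hidden_size, hidden_size, rng)  # its input is layer 1's NEWHS

cs1, hs1 = [0.0] * hidden_size, [0.0] * hidden_size    # zero vectors at the start
cs2, hs2 = [0.0] * hidden_size, [0.0] * hidden_size
outputs = []
for word, inp in sentence.items():
    cs1, hs1 = lstm_step(layer1, inp, hs1, cs1)   # NEWCS/NEWHS become the current states
    cs2, hs2 = lstm_step(layer2, hs1, hs2, cs2)   # the stacked layer consumes NEWHS
    outputs.append(hs2)

print(len(outputs), len(outputs[0]))  # 4 3
```

Reassigning cs1, hs1, cs2, and hs2 inside the loop is the whole propagation mechanism: nothing else is carried from one word to the next.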

11. Complete Processing Flow

Current Input (INP)
          +
Current Hidden State (CURRHS)
          │
          ▼
     Concatenate
          │
          ▼
   [CURRHS, INP]
          │
 ┌────────┼─────────┐
 ▼        ▼         ▼
Forget   Input    Output
 Gate     Gate      Gate
  │         │         │
  │         │         ▼
  │         │       OGOP
  │         │
  │         ▼
  │    IPGOP, CNDCS
  │
  ▼
 FGOP

          │
          ▼

NEWCS =
(FGOP × CURRCS)
+
(IPGOP × CNDCS)

          │
          ▼

NEWHS =
OGOP × tanh(NEWCS)

          │
          ▼

Output +
Stored for Next Word
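
For readers comparing this flow with other references, the same six computations can be written in compact mathematical notation. The symbols below follow a common textbook convention and are not part of this post's notation: x_t stands for INP, h_{t-1} for CURRHS, c_{t-1} for CURRCS, and the circled dot denotes element-wise multiplication. The matching name from this post is shown in parentheses on each line.

```latex
\begin{aligned}
f_t &= \sigma\left(W_f\,[h_{t-1}, x_t] + b_f\right) && \text{(FGOP)} \\
i_t &= \sigma\left(W_i\,[h_{t-1}, x_t] + b_i\right) && \text{(IPGOP)} \\
\tilde{c}_t &= \tanh\left(W_c\,[h_{t-1}, x_t] + b_c\right) && \text{(CNDCS)} \\
o_t &= \sigma\left(W_o\,[h_{t-1}, x_t] + b_o\right) && \text{(OGOP)} \\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t && \text{(NEWCS)} \\
h_t &= o_t \odot \tanh(c_t) && \text{(NEWHS)}
\end{aligned}
```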

12. LSTM Components Summary

Component	Purpose
Forget Gate	Decides what information should be forgotten from the previous Cell State.
Input Gate	Determines what new information should be stored.
Candidate Cell State	Creates potential new memory content.
Output Gate	Controls what becomes visible as output.
Cell State	Long-term memory carried across time steps.
Hidden State	Short-term memory and output of the LSTM cell.

Final Mental Model

Forget Gate
     ↓
Remove old memory

Input Gate
     ↓
Add new memory

Cell State
     ↓
Long-term memory highway

Output Gate
     ↓
Expose useful information

Hidden State
     ↓
Output of LSTM

Think of an LSTM as a smart memory system that continuously decides:
- What to forget
- What to remember
- What to reveal

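This selective memory mechanism is what allows LSTMs to capture long-term dependencies much better than traditional RNNs: because the Cell State is updated only by element-wise scaling and addition, information can be carried across many time steps with little distortion.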

RS Chandras Tech Blog | AI, ML, Agentic AI

Friday, June 12, 2026