Friday, June 12, 2026

LSTM Cells, Gates, Hidden State, and Cell State

The following points summarize the internal architecture and processing flow of an LSTM (Long Short-Term Memory) network.

1. Core LSTM Architecture

  1. An LSTM consists of one or more LSTM cells.
  2. Each LSTM cell contains three processing units.
  3. A processing unit inside an LSTM is called a Gate.
  4. Every LSTM cell contains:
    • Forget Gate
    • Input Gate
    • Output Gate
  5. Each gate possesses its own:
    • Weight Matrix
    • Bias Vector
    These parameters are learned during training and then frozen during inference.
  6. The Input Gate contains an additional set of weight and bias parameters used for calculating the Candidate Cell State.
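
As a rough illustration of the parameter layout described above, the sketch below counts the trainable values in one LSTM cell. It assumes each weight matrix maps the concatenated vector of length (hidden size + input size) to a vector of length hidden size; the function name `lstm_param_count` is made up for this example. Some libraries store the parameters differently (for instance, with two bias vectors per gate), so their counts may not match.

```python
def lstm_param_count(input_size: int, hidden_size: int) -> int:
    """Count trainable parameters of one LSTM cell (illustrative layout)."""
    concat_size = hidden_size + input_size        # length of the concatenated vector
    weights_per_set = hidden_size * concat_size   # one weight matrix
    biases_per_set = hidden_size                  # one bias vector
    # Four parameter sets: Forget Gate, Input Gate,
    # Candidate Cell State, and Output Gate.
    return 4 * (weights_per_set + biases_per_set)


print(lstm_param_count(input_size=3, hidden_size=2))  # 4 * (2*5 + 2) = 48
```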

2. Internal Memory Components

  1. Apart from gates, each LSTM cell maintains two internal memory structures.
  2. These memory structures are:
    • Cell State (CS)
    • Hidden State (HS)
  3. Both states are computed using previously stored state values.
  4. At any time step, the current state values depend on:
    • Current Input
    • Previous Cell State
    • Previous Hidden State
  5. Both Cell State and Hidden State have the same dimensionality, which is specified as a model hyperparameter.

3. Sequential Processing Behavior

  1. An LSTM processes inputs sequentially.
  2. For example, when processing the sentence:
    John likes black coffee
    the LSTM processes:
    • John
    • likes
    • black
    • coffee
    in four separate iterations.
  3. Each processed word generates:
    • A new Cell State
    • A new Hidden State
  4. At the beginning of processing:
    • Cell State = Zero Vector
    • Hidden State = Zero Vector
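
The sequential behavior can be sketched as a plain loop. The example below is only a skeleton: `run_sequence` and `counting_step` are hypothetical names, and `counting_step` is a stand-in that does no real LSTM math. It simply adds 1 to every state entry so that the number of iterations is visible in the result.

```python
def run_sequence(inputs, step_fn, hidden_size):
    """Process inputs one at a time, carrying both states forward."""
    currcs = [0.0] * hidden_size   # Cell State starts as a zero vector
    currhs = [0.0] * hidden_size   # Hidden State starts as a zero vector
    outputs = []
    for inp in inputs:
        # Each iteration produces a new Cell State and a new Hidden State.
        currcs, currhs = step_fn(inp, currcs, currhs)
        outputs.append(currhs)
    return outputs, currcs, currhs


def counting_step(inp, currcs, currhs):
    """Stand-in step (NOT real LSTM math): adds 1 to each state entry."""
    newcs = [c + 1.0 for c in currcs]
    newhs = list(newcs)
    return newcs, newhs


words = ["John", "likes", "black", "coffee"]
outputs, final_cs, final_hs = run_sequence(words, counting_step, hidden_size=2)
print(len(outputs))  # 4, one Hidden State per word
print(final_cs)      # [4.0, 4.0], the state was updated four times
```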

4. Input Preparation

  1. The current input (INP) is concatenated with the current hidden state (CURRHS), that is, the Hidden State carried over from the previous time step.
    NOTE: In practice, the input is rarely raw data; it is almost always pre-processed first.
  2. The resulting vector:
    [ CURRHS , INP ]
    is fed simultaneously to all three gates.
  3. Each gate processes this vector using its own trained weights and biases.
  4. The Input Gate additionally computes the Candidate Cell State using a tanh activation function.
  5. The outputs of the gates are then used to calculate:
    • New Cell State
    • New Hidden State
  6. The New Hidden State is considered the primary output of the LSTM cell.
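
A minimal sketch of the concatenation step, using plain Python lists. The vector sizes and values are made up for illustration, and `concat_state_and_input` is a hypothetical helper name.

```python
def concat_state_and_input(currhs, inp):
    """Build [CURRHS, INP] as one flat vector."""
    return list(currhs) + list(inp)


currhs = [0.0, 0.0]        # hidden size 2; all zeros at the first time step
inp = [0.5, -1.0, 0.25]    # input size 3; made-up pre-processed input values
z = concat_state_and_input(currhs, inp)
print(z)       # [0.0, 0.0, 0.5, -1.0, 0.25]
print(len(z))  # 5 = hidden size + input size
```

The same vector is then handed to every gate, so it only needs to be built once per time step.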

5. Forget Gate Computation

Forget Gate Output (FGOP)

The Forget Gate decides how much of the previous Cell State should be retained.

FGOP = sigmoid(WFG[CURRHS, INP] + BFG)

Where:

  • WFG = Forget Gate Weight Matrix
  • BFG = Forget Gate Bias Vector
  • FGOP = Forget Gate Output
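
The formula above can be tried out with a few made-up numbers. This is a sketch in plain Python rather than a production implementation; the weights, bias values, and the helper name `forget_gate` are assumptions chosen to keep the arithmetic easy to follow.

```python
import math


def sigmoid(x: float) -> float:
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def forget_gate(wfg, bfg, currhs, inp):
    """FGOP = sigmoid(WFG[CURRHS, INP] + BFG), computed row by row."""
    z = list(currhs) + list(inp)  # the concatenated vector [CURRHS, INP]
    return [sigmoid(sum(w * v for w, v in zip(row, z)) + b)
            for row, b in zip(wfg, bfg)]


# Hidden size 2, input size 1; all values are illustrative.
wfg = [[0.0, 0.0, 0.0],
       [0.0, 0.0, 2.0]]
bfg = [0.0, 0.0]
fgop = forget_gate(wfg, bfg, currhs=[0.0, 0.0], inp=[1.0])
print(fgop[0])            # 0.5, since sigmoid(0) = 0.5
print(round(fgop[1], 4))  # 0.8808, since sigmoid(2) is about 0.8808
```

Each entry of FGOP lies between 0 and 1: values near 1 keep the matching Cell State entry, and values near 0 erase it.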

6. Input Gate Computation

Input Gate Output (IPGOP)

IPGOP = sigmoid(WIPG[CURRHS, INP] + BIPG)

Candidate Cell State (CNDCS)

CNDCS = tanh(WCS[CURRHS, INP] + BCS)

Where:

  • WIPG = Input Gate Weight Matrix
  • BIPG = Input Gate Bias Vector
  • WCS = Candidate State Weight Matrix
  • BCS = Candidate State Bias Vector
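
The two computations differ only in their parameters and activation function, which the sketch below makes visible. The names `affine` and `input_gate_and_candidate` and all numeric values are assumptions made for this example.

```python
import math


def affine(w, z, b):
    """Compute W z + b for a list-of-rows matrix W."""
    return [sum(wi * zi for wi, zi in zip(row, z)) + bi
            for row, bi in zip(w, b)]


def input_gate_and_candidate(wipg, bipg, wcs, bcs, currhs, inp):
    z = list(currhs) + list(inp)  # [CURRHS, INP]
    # Input Gate: sigmoid keeps every entry in (0, 1).
    ipgop = [1.0 / (1.0 + math.exp(-a)) for a in affine(wipg, z, bipg)]
    # Candidate Cell State: tanh keeps every entry in (-1, 1).
    cndcs = [math.tanh(a) for a in affine(wcs, z, bcs)]
    return ipgop, cndcs


# Hidden size 1, input size 1; values are illustrative.
ipgop, cndcs = input_gate_and_candidate(
    wipg=[[0.0, 0.0]], bipg=[0.0],
    wcs=[[0.0, -1.0]], bcs=[0.0],
    currhs=[0.0], inp=[1.0],
)
print(ipgop[0])            # 0.5
print(round(cndcs[0], 4))  # -0.7616, since tanh(-1) is about -0.7616
```

Because tanh can be negative, the candidate can push a Cell State entry down as well as up, while the Input Gate only scales how much of that push is applied.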

7. New Cell State Calculation

The new Cell State is calculated by combining:

  • The retained portion of the old Cell State
  • The newly generated Candidate Cell State

NEWCS = (FGOP × CURRCS) + (IPGOP × CNDCS)

Where:

  • CURRCS = Current Cell State
  • FGOP = Forget Gate Output
  • IPGOP = Input Gate Output
  • CNDCS = Candidate Cell State
  • NEWCS = New Cell State

Important: Both multiplications are element-wise multiplications, not matrix multiplications.
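
A small worked example of the element-wise update, with gate values picked by hand to show the three typical cases. The function name `new_cell_state` and the numbers are illustrative.

```python
def new_cell_state(fgop, currcs, ipgop, cndcs):
    """NEWCS = (FGOP * CURRCS) + (IPGOP * CNDCS), entry by entry."""
    return [f * c + i * g
            for f, c, i, g in zip(fgop, currcs, ipgop, cndcs)]


fgop = [1.0, 0.0, 0.5]       # keep all / forget all / keep half
currcs = [2.0, 2.0, 2.0]
ipgop = [0.0, 1.0, 0.5]      # write nothing / write all / write half
cndcs = [-1.0, -1.0, -1.0]

print(new_cell_state(fgop, currcs, ipgop, cndcs))  # [2.0, -1.0, 0.5]
```

The first entry keeps the old memory untouched, the second replaces it with the candidate, and the third blends the two.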

8. Output Gate Computation

The Output Gate determines how much of the newly computed Cell State should become visible as the Hidden State.

OGOP = sigmoid(WOPG[CURRHS, INP] + BOPG)

Where:

  • WOPG = Output Gate Weight Matrix
  • BOPG = Output Gate Bias Vector
  • OGOP = Output Gate Output

9. Hidden State Calculation

After computing the New Cell State, the New Hidden State is calculated as:

NEWHS = OGOP × tanh(NEWCS)

Where:

  • NEWHS = New Hidden State
  • OGOP = Output Gate Output
  • NEWCS = New Cell State

The New Hidden State serves two purposes:

  • Acts as the output of the current LSTM cell.
  • Becomes the Hidden State input for the next time step.
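
The hidden state formula can be checked with hand-picked values. In this sketch the name `new_hidden_state` and the numbers are assumptions; the point is that tanh bounds the Cell State before the Output Gate scales it.

```python
import math


def new_hidden_state(ogop, newcs):
    """NEWHS = OGOP * tanh(NEWCS), entry by entry."""
    return [o * math.tanh(c) for o, c in zip(ogop, newcs)]


ogop = [1.0, 0.0, 0.5]       # fully open / fully closed / half open
newcs = [0.0, 5.0, 100.0]    # the Cell State itself is not bounded

print(new_hidden_state(ogop, newcs))  # [0.0, 0.0, 0.5]
```

Even with a Cell State entry of 100, the matching Hidden State entry stays within [-1, 1], and a closed Output Gate hides the entry completely.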

10. State Propagation

After processing a word:

  • NEWCS becomes the Current Cell State for the next iteration.
  • NEWHS becomes the Current Hidden State for the next iteration.
  • NEWHS is also passed to the next LSTM layer if a stacked LSTM architecture is being used.

11. Complete Processing Flow

Current Input (INP)
          +
Current Hidden State (CURRHS)
          │
          ▼
     Concatenate
          │
          ▼
   [CURRHS, INP]
          │
 ┌────────┼─────────┐
 ▼        ▼         ▼
Forget   Input    Output
 Gate     Gate      Gate
  │         │         │
  ▼         ▼         ▼
 FGOP     IPGOP     OGOP
          CNDCS

          │
          ▼

NEWCS =
(FGOP × CURRCS)
+
(IPGOP × CNDCS)

          │
          ▼

NEWHS =
OGOP × tanh(NEWCS)

          │
          ▼

Output +
Stored for Next Word
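
Putting the whole flow together, one time step might look like the sketch below. It uses plain Python lists and the variable names from this post; the function `lstm_step`, the `params` dictionary layout, and the all-zero demo parameters are assumptions made for illustration, not a reference implementation.

```python
import math


def _sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def _affine(w, z, b):
    """Compute W z + b for a list-of-rows matrix W."""
    return [sum(wi * zi for wi, zi in zip(row, z)) + bi
            for row, bi in zip(w, b)]


def lstm_step(inp, currhs, currcs, params):
    """Run one LSTM time step and return (NEWHS, NEWCS)."""
    z = list(currhs) + list(inp)  # [CURRHS, INP]
    fgop = [_sigmoid(a) for a in _affine(params["WFG"], z, params["BFG"])]
    ipgop = [_sigmoid(a) for a in _affine(params["WIPG"], z, params["BIPG"])]
    cndcs = [math.tanh(a) for a in _affine(params["WCS"], z, params["BCS"])]
    ogop = [_sigmoid(a) for a in _affine(params["WOPG"], z, params["BOPG"])]
    # Element-wise updates of the two states.
    newcs = [f * c + i * g for f, c, i, g in zip(fgop, currcs, ipgop, cndcs)]
    newhs = [o * math.tanh(c) for o, c in zip(ogop, newcs)]
    return newhs, newcs


# Demo with hidden size 2, input size 1, and all-zero parameters:
# every gate outputs 0.5 and the candidate is 0.
hidden, inputs = 2, 1
zero_w = [[0.0] * (hidden + inputs) for _ in range(hidden)]
zero_b = [0.0] * hidden
params = {"WFG": zero_w, "BFG": zero_b, "WIPG": zero_w, "BIPG": zero_b,
          "WCS": zero_w, "BCS": zero_b, "WOPG": zero_w, "BOPG": zero_b}

newhs, newcs = lstm_step([1.0], [0.0, 0.0], [1.0, -2.0], params)
print(newcs)  # [0.5, -1.0], half of the old Cell State is retained
```

To process a sequence, the returned values would be fed back in as CURRHS and CURRCS for the next input.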

12. LSTM Components Summary

Component              Purpose
Forget Gate            Decides what information should be forgotten from the previous Cell State.
Input Gate             Determines what new information should be stored.
Candidate Cell State   Creates potential new memory content.
Output Gate            Controls what becomes visible as output.
Cell State             Long-term memory carried across time steps.
Hidden State           Short-term memory and output of the LSTM cell.

Final Mental Model

Forget Gate
     ↓
Remove old memory

Input Gate
     ↓
Add new memory

Cell State
     ↓
Long-term memory highway

Output Gate
     ↓
Expose useful information

Hidden State
     ↓
Output of LSTM

Think of an LSTM as a smart memory system that continuously decides:

  • What to forget
  • What to remember
  • What to reveal

This selective memory mechanism is what allows LSTMs to capture long-term dependencies more reliably than traditional RNNs.
