The following points summarize the internal architecture and processing flow of an LSTM (Long Short-Term Memory) network in a structured and easy-to-understand manner.
1. Core LSTM Architecture
- An LSTM network consists of one or more LSTM cells.
- Each LSTM cell contains three processing units.
- A processing unit inside an LSTM is called a Gate.
- Every LSTM cell contains:
  - Forget Gate
  - Input Gate
  - Output Gate
- Each gate possesses its own:
  - Weight Matrix
  - Bias Vector
- The Input Gate contains an additional set of weight and bias parameters used for calculating the Candidate Cell State.
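As a rough illustration of the parameter sets listed above, the sketch below tabulates their shapes for a single cell and counts the trainable values. It assumes each weight matrix multiplies the concatenated [CURRHS, INP] vector described in the Input Preparation section; the names `lstm_param_shapes`, `count_params`, `input_size`, and `hidden_size` are hypothetical and exist only for this example.

```python
def lstm_param_shapes(input_size, hidden_size):
    """Return the shape of every weight matrix and bias vector in one LSTM cell.

    Assumption: each matrix multiplies the concatenated vector [CURRHS, INP],
    whose length is hidden_size + input_size.
    """
    concat_size = hidden_size + input_size
    shapes = {}
    # Three gates plus the candidate cell state: four parameter sets in total.
    for name in ("FG", "IPG", "CS", "OPG"):
        shapes["W" + name] = (hidden_size, concat_size)  # weight matrix
        shapes["B" + name] = (hidden_size,)              # bias vector
    return shapes


def count_params(shapes):
    """Total number of trainable scalars across all shapes."""
    total = 0
    for shape in shapes.values():
        size = 1
        for dim in shape:
            size *= dim
        total += size
    return total


shapes = lstm_param_shapes(input_size=3, hidden_size=2)
print(shapes["WFG"], shapes["BFG"])  # (2, 5) (2,)
print(count_params(shapes))          # 48
```

With an input size of 3 and a hidden size of 2, each of the four parameter sets holds 2 × 5 weights plus 2 biases, which gives 4 × 12 = 48 values.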
2. Internal Memory Components
- Apart from gates, each LSTM cell maintains two internal memory structures.
- These memory structures are:
  - Cell State (CS)
  - Hidden State (HS)
- Both states are computed using previously stored state values.
- At any time step, the current state values depend on:
  - Current Input
  - Previous Cell State
  - Previous Hidden State
- Both Cell State and Hidden State have the same dimensionality, which is specified as a model hyperparameter (commonly called the hidden size or number of units).
3. Sequential Processing Behavior
- An LSTM processes inputs sequentially.
- For example, when processing the sentence "John likes black coffee", the LSTM processes the words one at a time, in order:
  - John
  - likes
  - black
  - coffee
- Each processed word generates:
  - A new Cell State
  - A new Hidden State
- At the beginning of processing:
  - Cell State = Zero Vector
  - Hidden State = Zero Vector
4. Input Preparation
- The current input (INP) is concatenated with the current hidden state (CURRHS), that is, the hidden state carried over from the previous time step.
  NOTE: In practice, the input is rarely raw data; it is almost always pre-processed first (for example, a word converted to a numeric vector).
- The resulting vector, [CURRHS, INP], is fed simultaneously to all three gates.
- Each gate processes this vector using its own trained weights and biases.
- The Input Gate additionally computes the Candidate Cell State using a tanh activation function.
- The outputs of the gates are then used to calculate:
  - New Cell State
  - New Hidden State
- The New Hidden State is considered the primary output of the LSTM cell.
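The concatenation step can be sketched in a few lines. The numbers below are made up for illustration, and plain Python lists stand in for vectors; a real implementation would typically use an array library.

```python
# Made-up toy values: hidden size 2, input size 3.
curr_hs = [0.0, 0.0]         # CURRHS: the zero vector at the first time step
inp = [0.5, -1.0, 0.25]      # INP: a pre-processed input vector

# Concatenate in the order used in the text: [CURRHS, INP].
concat = curr_hs + inp

print(concat)       # [0.0, 0.0, 0.5, -1.0, 0.25]
print(len(concat))  # 5, i.e. hidden size + input size
```

Because every gate receives this same vector, each gate's weight matrix needs one column per element of the concatenated vector (hidden size + input size).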
5. Forget Gate Computation
Forget Gate Output (FGOP)
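The Forget Gate decides how much of the previous Cell State should be retained.
FGOP = σ(WFG · [CURRHS, INP] + BFG)
Where:
- σ = sigmoid function, which maps every element to a value between 0 and 1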
- WFG = Forget Gate Weight Matrix
- BFG = Forget Gate Bias Vector
- FGOP = Forget Gate Output
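To make the Forget Gate concrete, here is a minimal sketch that assumes the standard LSTM form of the gate: a sigmoid applied to a weighted sum of [CURRHS, INP] plus a bias. The helper names `sigmoid` and `gate` and all numeric values are invented for illustration; the weights are hand-picked, not trained.

```python
import math


def sigmoid(x):
    """Squash a real number into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def gate(weights, bias, vector):
    """sigmoid(weights · vector + bias), computed row by row."""
    return [sigmoid(sum(w * v for w, v in zip(row, vector)) + b)
            for row, b in zip(weights, bias)]


# Made-up values (hidden size 2, input size 1).
curr_hs = [0.0, 0.0]
inp = [1.0]
concat = curr_hs + inp             # [CURRHS, INP]

w_fg = [[0.0, 0.0, 4.0],           # row for hidden unit 0
        [0.0, 0.0, -4.0]]          # row for hidden unit 1
b_fg = [0.0, 0.0]

fgop = gate(w_fg, b_fg, concat)
print([round(f, 3) for f in fgop])  # [0.982, 0.018]
```

A value near 1 (unit 0) means the corresponding Cell State entry is almost fully retained, while a value near 0 (unit 1) means it is almost entirely discarded.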
6. Input Gate Computation
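Input Gate Output (IPGOP)
IPGOP = σ(WIPG · [CURRHS, INP] + BIPG)
Candidate Cell State (CNDCS)
CNDCS = tanh(WCS · [CURRHS, INP] + BCS)
Where:
- σ = sigmoid function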
- WIPG = Input Gate Weight Matrix
- BIPG = Input Gate Bias Vector
- WCS = Candidate State Weight Matrix
- BCS = Candidate State Bias Vector
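The sketch below contrasts the two Input Gate computations, assuming the standard LSTM form: a sigmoid for the gate output and a tanh for the candidate. The helper names and the hand-picked weights are hypothetical.

```python
import math


def affine(weights, bias, vector):
    """weights · vector + bias for a matrix stored as a list of rows."""
    return [sum(w * v for w, v in zip(row, vector)) + b
            for row, b in zip(weights, bias)]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


concat = [0.0, 0.0, 1.0]  # [CURRHS, INP] with hidden size 2, input size 1

# Input Gate parameters (made-up, not trained).
w_ipg = [[0.0, 0.0, 2.0],
         [0.0, 0.0, 0.0]]
b_ipg = [0.0, 0.0]

# Candidate Cell State parameters (made-up, not trained).
w_cs = [[0.0, 0.0, -3.0],
        [0.0, 0.0, 3.0]]
b_cs = [0.0, 0.0]

ipgop = [sigmoid(a) for a in affine(w_ipg, b_ipg, concat)]   # values in (0, 1)
cndcs = [math.tanh(a) for a in affine(w_cs, b_cs, concat)]   # values in (-1, 1)

print([round(v, 3) for v in ipgop])  # [0.881, 0.5]
print([round(v, 3) for v in cndcs])  # [-0.995, 0.995]
```

The sigmoid output acts as a "how much" dial between 0 and 1, whereas the tanh output can be negative, so the candidate can either increase or decrease a Cell State entry.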
7. New Cell State Calculation
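The new Cell State is calculated by combining:
- The retained portion of the old Cell State
- The newly generated Candidate Cell State, scaled by the Input Gate Output
NEWCS = (FGOP × CURRCS) + (IPGOP × CNDCS)
Where:
- × = element-wise multiplication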
- CURRCS = Current Cell State
- FGOP = Forget Gate Output
- IPGOP = Input Gate Output
- CNDCS = Candidate Cell State
- NEWCS = New Cell State
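A small worked example may help here. The gate values below are deliberately extreme (exactly 0 or 1, which a sigmoid only approaches) so that the effect of each term in NEWCS = (FGOP × CURRCS) + (IPGOP × CNDCS) is easy to see; all numbers are invented.

```python
curr_cs = [2.0, -1.0]   # CURRCS: old memory
fgop = [1.0, 0.0]       # keep unit 0 entirely, forget unit 1 entirely
ipgop = [0.0, 1.0]      # write nothing to unit 0, write fully to unit 1
cndcs = [0.5, 0.5]      # CNDCS: proposed new content

# Element-wise: retained old memory plus admitted new memory.
new_cs = [f * c + i * g for f, c, i, g in zip(fgop, curr_cs, ipgop, cndcs)]

print(new_cs)  # [2.0, 0.5]
```

Unit 0 keeps its old value of 2.0 untouched, while unit 1 drops its old value of -1.0 and takes the candidate value 0.5.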
8. Output Gate Computation
The Output Gate determines how much of the newly computed Cell State should become visible as the Hidden State.
OGOP = σ(WOPG · [CURRHS, INP] + BOPG)
Where:
- σ = sigmoid function
- WOPG = Output Gate Weight Matrix
- BOPG = Output Gate Bias Vector
- OGOP = Output Gate Output
9. Hidden State Calculation
After computing the New Cell State, the New Hidden State is calculated as:
NEWHS = OGOP × tanh(NEWCS)
Where:
- × = element-wise multiplication
- NEWHS = New Hidden State
- OGOP = Output Gate Output
- NEWCS = New Cell State
The New Hidden State serves two purposes:
- Acts as the output of the current LSTM cell.
- Becomes the Hidden State input for the next time step.
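The Hidden State calculation, NEWHS = OGOP × tanh(NEWCS), can be checked with a short sketch. The Output Gate values are assumed to be already computed and are set to the extremes 1 and 0 purely for illustration.

```python
import math

new_cs = [2.0, 0.5]   # NEWCS: made-up Cell State values
ogop = [1.0, 0.0]     # OGOP: expose unit 0 fully, hide unit 1 fully

# tanh squashes the Cell State into (-1, 1); the gate then scales it.
new_hs = [o * math.tanh(c) for o, c in zip(ogop, new_cs)]

print([round(h, 3) for h in new_hs])  # [0.964, 0.0]
```

Note that unit 1 still holds 0.5 in the Cell State even though it contributes nothing to the Hidden State at this step, which is how an LSTM can carry information forward without exposing it yet.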
10. State Propagation
After processing a word:
- NEWCS becomes the Current Cell State for the next iteration.
- NEWHS becomes the Current Hidden State for the next iteration.
- NEWHS is also passed to the next LSTM layer if a stacked LSTM architecture is being used.
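Putting the pieces together, the following is a minimal single-layer sketch of the processing loop, not a production implementation. It assumes the standard LSTM equations (sigmoid gates, tanh candidate), uses random untrained weights, and replaces real pre-processing with one random vector per word; `lstm_step`, `random_params`, and `vectors` are hypothetical names.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def affine(weights, bias, vector):
    """weights · vector + bias for a matrix stored as a list of rows."""
    return [sum(w * v for w, v in zip(row, vector)) + b
            for row, b in zip(weights, bias)]


def lstm_step(inp, curr_hs, curr_cs, p):
    """Process one input vector and return (NEWHS, NEWCS)."""
    concat = curr_hs + inp  # [CURRHS, INP]
    fgop = [sigmoid(a) for a in affine(p["WFG"], p["BFG"], concat)]
    ipgop = [sigmoid(a) for a in affine(p["WIPG"], p["BIPG"], concat)]
    cndcs = [math.tanh(a) for a in affine(p["WCS"], p["BCS"], concat)]
    ogop = [sigmoid(a) for a in affine(p["WOPG"], p["BOPG"], concat)]
    new_cs = [f * c + i * g for f, c, i, g in zip(fgop, curr_cs, ipgop, cndcs)]
    new_hs = [o * math.tanh(c) for o, c in zip(ogop, new_cs)]
    return new_hs, new_cs


def random_params(input_size, hidden_size, rng):
    """Untrained placeholder parameters: small random weights, zero biases."""
    concat_size = hidden_size + input_size
    params = {}
    for name in ("FG", "IPG", "CS", "OPG"):
        params["W" + name] = [[rng.uniform(-0.5, 0.5) for _ in range(concat_size)]
                              for _ in range(hidden_size)]
        params["B" + name] = [0.0] * hidden_size
    return params


rng = random.Random(0)
input_size, hidden_size = 3, 2
params = random_params(input_size, hidden_size, rng)

# Stand-in for pre-processing: one random vector per word (an assumption).
sentence = ["John", "likes", "black", "coffee"]
vectors = {word: [rng.uniform(-1.0, 1.0) for _ in range(input_size)]
           for word in sentence}

# Both states start as zero vectors.
curr_hs = [0.0] * hidden_size
curr_cs = [0.0] * hidden_size

outputs = []
for word in sentence:
    new_hs, new_cs = lstm_step(vectors[word], curr_hs, curr_cs, params)
    outputs.append(new_hs)              # NEWHS is the output for this word
    curr_hs, curr_cs = new_hs, new_cs   # propagate both states to the next word

print(len(outputs))  # 4, one Hidden State per word
```

In a stacked architecture, the list `outputs` would serve as the input sequence for the next LSTM layer, which would have its own parameters and its own pair of states.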
11. Complete Processing Flow
        Current Input (INP) + Current Hidden State (CURRHS)
                              │
                              ▼
                         Concatenate
                              │
                              ▼
                        [CURRHS, INP]
                              │
          ┌───────────────────┼───────────────────┐
          ▼                   ▼                   ▼
     Forget Gate          Input Gate          Output Gate
          │                   │                   │
          ▼                   ▼                   ▼
        FGOP            IPGOP, CNDCS            OGOP
          │                   │                   │
          └───────────────────┼───────────────────┘
                              ▼
          NEWCS = (FGOP × CURRCS) + (IPGOP × CNDCS)
                              │
                              ▼
                 NEWHS = OGOP × tanh(NEWCS)
                              │
                              ▼
                 Output + Stored for Next Word
12. LSTM Components Summary
| Component | Purpose |
|---|---|
| Forget Gate | Decides what information should be forgotten from the previous Cell State. |
| Input Gate | Determines what new information should be stored. |
| Candidate Cell State | Creates potential new memory content. |
| Output Gate | Controls what becomes visible as output. |
| Cell State | Long-term memory carried across time steps. |
| Hidden State | Short-term memory and output of the LSTM cell. |
Final Mental Model
- Forget Gate → Remove old memory
- Input Gate → Add new memory
- Cell State → Long-term memory highway
- Output Gate → Expose useful information
- Hidden State → Output of LSTM
Think of an LSTM as a smart memory system that continuously decides:
- What to forget
- What to remember
- What to reveal
This selective memory mechanism is what allows LSTMs to capture long-term dependencies much better than traditional RNNs.