1. Feature Scaling (Traditional ML)
| Technique | Description | Formula / Key Point | Best Used When |
|---|---|---|---|
| Min-Max Normalization | Scales data to [0, 1] or [a, b] | X' = (X - min) / (max - min) | Bounded data, Neural Networks |
| Standardization (Z-score) | Mean = 0, Std = 1 | X' = (X - μ) / σ | Gaussian-like data, Linear models |
| Robust Scaling | Uses median & IQR (robust to outliers) | X' = (X - median) / IQR | Data with outliers |
| MaxAbs Scaling | Scales by maximum absolute value | X' = X / max(|X|) | Sparse data |
| Mean Normalization | Centers data around zero within (-1, 1) | X' = (X - mean) / (max - min) | Zero-centered, bounded data (less common in practice) |
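For reference, here is a minimal sketch of these scalers using scikit-learn (assumes scikit-learn and NumPy are installed; the toy data is purely illustrative):

```python
import numpy as np
from sklearn.preprocessing import (
    MinMaxScaler, StandardScaler, RobustScaler, MaxAbsScaler,
)

X = np.array([[1.0], [2.0], [3.0], [100.0]])  # toy feature with one outlier

print(MinMaxScaler().fit_transform(X).ravel())    # X' = (X - min) / (max - min)
print(StandardScaler().fit_transform(X).ravel())  # X' = (X - mu) / sigma
print(RobustScaler().fit_transform(X).ravel())    # X' = (X - median) / IQR
print(MaxAbsScaler().fit_transform(X).ravel())    # X' = X / max(|X|)

# Mean normalization has no dedicated scikit-learn class, so it is done by hand:
X_mean_norm = (X - X.mean(axis=0)) / (X.max(axis=0) - X.min(axis=0))
```

Note how the outlier at 100 compresses the Min-Max output toward zero, while Robust Scaling keeps the three "normal" values well spread out.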
2. Normalization for Vectors / Features
| Technique | Description | Formula | Use Case |
|---|---|---|---|
| L2 Normalization (Euclidean) | Most common vector normalization | X' = X / ||X||₂ | Distance-based algorithms, Neural Networks |
| L1 Normalization (Manhattan) | Sum of absolute values = 1 | X' = X / ||X||₁ | Sparse data, Feature importance |
| Max Normalization | Divide by maximum value in vector | X' = X / max(|X|) | Simple scaling of feature vectors |
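These three norms can be computed directly with NumPy (the vector below is just an illustration; scikit-learn's `normalize` does the same thing row-wise):

```python
import numpy as np

v = np.array([3.0, -4.0, 0.0])

l2 = v / np.linalg.norm(v, ord=2)   # unit Euclidean length: ||v'||_2 = 1
l1 = v / np.abs(v).sum()            # absolute values sum to 1
mx = v / np.abs(v).max()            # largest absolute value becomes 1

# Equivalent via scikit-learn (expects 2-D input, normalizes each row):
# from sklearn.preprocessing import normalize
# normalize(v.reshape(1, -1), norm="l2")
```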
3. Deep Learning Normalization Layers
| Layer | Year | Key Idea | Main Advantage | Common Use Cases |
|---|---|---|---|---|
| Batch Normalization (BatchNorm) | 2015 | Normalize across batch dimension | Accelerates training | CNNs (ResNet, etc.) |
| Layer Normalization (LayerNorm) | 2016 | Normalize across features (per sample) | Works with variable batch sizes | Transformers |
| Instance Normalization | 2016 | Normalize per sample per channel | Style transfer | StyleGAN, artistic tasks |
| Group Normalization | 2018 | Normalize within groups of channels | Good for small batch sizes | Object detection |
| RMS Normalization (RMSNorm) | 2019 | Normalize by Root Mean Square (no mean-centering) | Simpler and faster than LayerNorm | Modern LLMs (Llama, etc.) |
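RMSNorm is simple enough to sketch in a few lines of PyTorch. The class below follows common open-source implementations rather than any particular model's code; the other rows of the table are available as built-in layers, noted in the comments:

```python
import torch
import torch.nn as nn

class RMSNorm(nn.Module):
    def __init__(self, dim: int, eps: float = 1e-6):
        super().__init__()
        self.eps = eps
        self.weight = nn.Parameter(torch.ones(dim))  # learnable gain, no bias

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Scale by the root mean square over the last dimension;
        # unlike LayerNorm, the mean is not subtracted.
        rms = torch.rsqrt(x.pow(2).mean(dim=-1, keepdim=True) + self.eps)
        return x * rms * self.weight

# Built-in layers for the other techniques above:
# nn.BatchNorm2d(num_channels), nn.LayerNorm(normalized_shape),
# nn.InstanceNorm2d(num_channels), nn.GroupNorm(num_groups, num_channels)
```

Dropping the mean subtraction and bias is what makes RMSNorm slightly cheaper than LayerNorm, which matters at LLM scale.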
4. Other Specialized Normalization Techniques
- Quantile Normalization — Makes distributions identical across samples (popular in bioinformatics)
- Local Response Normalization (LRN) — Used in early CNNs like AlexNet
- Weight Normalization — Reparameterizes weights instead of activations
- Spectral Normalization — Constrains weight matrices for stable GAN training
- Batch Renormalization — Improved and more stable version of BatchNorm
- Filter Response Normalization (FRN) — Batch-independent normalization
- Power Transform (Yeo-Johnson / Box-Cox) — Makes data more Gaussian-like
- Contrast Normalization — Used in computer vision preprocessing
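Several of these have ready-made implementations. Below is a small sketch using scikit-learn's PowerTransformer and QuantileTransformer; note that QuantileTransformer maps each feature to a reference distribution, which is related to but not identical to the cross-sample quantile normalization used in bioinformatics. The toy data is illustrative.

```python
import numpy as np
from sklearn.preprocessing import PowerTransformer, QuantileTransformer

X = np.random.exponential(size=(100, 1))  # heavily skewed toy data

# Power transform (Yeo-Johnson): output is closer to Gaussian
X_gauss = PowerTransformer(method="yeo-johnson").fit_transform(X)

# Quantile transform to a normal target distribution
X_quant = QuantileTransformer(
    output_distribution="normal", n_quantiles=100
).fit_transform(X)

# Weight and spectral normalization act on a layer's weights, not its activations:
# import torch.nn as nn
# from torch.nn.utils import spectral_norm, weight_norm
# critic_layer = spectral_norm(nn.Linear(64, 64))  # common in GAN discriminators
```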
5. Quick Recommendation Guide
| Scenario | Recommended Technique |
|---|---|
| Classical ML (SVM, KNN, etc.) | Standardization or Robust Scaling |
| Neural Networks (small batch) | LayerNorm / GroupNorm |
| Large batch CNNs | BatchNorm |
| Transformers / Large Language Models | RMSNorm or LayerNorm |
| Data with outliers | Robust Scaling |
| Images (style-related) | Instance Normalization |