1. Feature Scaling (Traditional ML)
| Technique | Description | Formula / Key Point | Best Used When |
|---|---|---|---|
| Min-Max Normalization | Scales data to [0, 1] or [a, b] | X' = (X - min) / (max - min) | Bounded data, Neural Networks |
| Standardization (Z-score) | Mean = 0, Std = 1 | X' = (X - μ) / σ | Gaussian-like data, Linear models |
| Robust Scaling | Uses median & IQR (robust to outliers) | X' = (X - median) / IQR | Data with outliers |
| MaxAbs Scaling | Scales by maximum absolute value | X' = X / max(|X|) | Sparse data |
| Mean Normalization | Centers data around zero within (-1, 1) | X' = (X - mean) / (max - min) | Zero-centered, bounded data (less common in practice) |
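For reference, here is a minimal sketch of these scalers using scikit-learn (assumes scikit-learn and NumPy are installed; the toy data is purely illustrative):

```python
import numpy as np
from sklearn.preprocessing import (
    MinMaxScaler, StandardScaler, RobustScaler, MaxAbsScaler,
)

X = np.array([[1.0], [2.0], [3.0], [100.0]])  # toy feature with one outlier

print(MinMaxScaler().fit_transform(X).ravel())    # X' = (X - min) / (max - min)
print(StandardScaler().fit_transform(X).ravel())  # X' = (X - mu) / sigma
print(RobustScaler().fit_transform(X).ravel())    # X' = (X - median) / IQR
print(MaxAbsScaler().fit_transform(X).ravel())    # X' = X / max(|X|)

# Mean normalization has no dedicated scikit-learn class, so it is done by hand:
X_mean_norm = (X - X.mean(axis=0)) / (X.max(axis=0) - X.min(axis=0))
```

Note how the outlier at 100 compresses the Min-Max output toward zero, while Robust Scaling keeps the three "normal" values well spread out.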
2. Normalization for Vectors / Features
| Technique | Description | Formula | Use Case |
|---|---|---|---|
| L2 Normalization (Euclidean) | Most common vector normalization | X' = X / ||X||₂ | Distance-based algorithms, Neural Networks |
| L1 Normalization (Manhattan) | Sum of absolute values = 1 | X' = X / ||X||₁ | Sparse data, Feature importance |
| Max Normalization | Divide by maximum value in vector | X' = X / max(|X|) | Simple scaling of feature vectors |
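These three norms can be computed directly with NumPy (the vector below is just an illustration; scikit-learn's `normalize` does the same thing row-wise):

```python
import numpy as np

v = np.array([3.0, -4.0, 0.0])

l2 = v / np.linalg.norm(v, ord=2)   # unit Euclidean length: ||v'||_2 = 1
l1 = v / np.abs(v).sum()            # absolute values sum to 1
mx = v / np.abs(v).max()            # largest absolute value becomes 1

# Equivalent via scikit-learn (expects 2-D input, normalizes each row):
# from sklearn.preprocessing import normalize
# normalize(v.reshape(1, -1), norm="l2")
```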
3. Deep Learning Normalization Layers
| Layer | Year | Key Idea | Main Advantage | Common Use Cases |
|---|---|---|---|---|
| Batch Normalization (BatchNorm) | 2015 | Normalize across batch dimension | Accelerates training | CNNs (ResNet, etc.) |
| Layer Normalization (LayerNorm) | 2016 | Normalize across features (per sample) | Works with variable batch sizes | Transformers |
| Instance Normalization | 2016 | Normalize per sample per channel | Style transfer | StyleGAN, artistic tasks |
| Group Normalization | 2018 | Normalize within groups of channels | Good for small batch sizes | Object detection |
| RMS Normalization (RMSNorm) | 2019 | Normalize by Root Mean Square (no mean-centering) | Simpler and faster than LayerNorm | Modern LLMs (Llama, etc.) |
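RMSNorm is simple enough to sketch in a few lines of PyTorch. The class below follows common open-source implementations rather than any particular model's code; the other rows of the table are available as built-in layers, noted in the comments:

```python
import torch
import torch.nn as nn

class RMSNorm(nn.Module):
    def __init__(self, dim: int, eps: float = 1e-6):
        super().__init__()
        self.eps = eps
        self.weight = nn.Parameter(torch.ones(dim))  # learnable gain, no bias

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Scale by the root mean square over the last dimension;
        # unlike LayerNorm, the mean is not subtracted.
        rms = torch.rsqrt(x.pow(2).mean(dim=-1, keepdim=True) + self.eps)
        return x * rms * self.weight

# Built-in layers for the other techniques above:
# nn.BatchNorm2d(num_channels), nn.LayerNorm(normalized_shape),
# nn.InstanceNorm2d(num_channels), nn.GroupNorm(num_groups, num_channels)
```

Dropping the mean subtraction and bias is what makes RMSNorm slightly cheaper than LayerNorm, which matters at LLM scale.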
4. Other Specialized Normalization Techniques
- Quantile Normalization — Makes distributions identical across samples (popular in bioinformatics)
- Local Response Normalization (LRN) — Used in early CNNs like AlexNet
- Weight Normalization — Reparameterizes weights instead of activations
- Spectral Normalization — Constrains weight matrices for stable GAN training
- Batch Renormalization — Improved and more stable version of BatchNorm
- Filter Response Normalization (FRN) — Batch-independent normalization
- Power Transform (Yeo-Johnson / Box-Cox) — Makes data more Gaussian-like
- Contrast Normalization — Used in computer vision preprocessing
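Several of these have ready-made implementations. Below is a small sketch using scikit-learn's PowerTransformer and QuantileTransformer; note that QuantileTransformer maps each feature to a reference distribution, which is related to but not identical to the cross-sample quantile normalization used in bioinformatics. The toy data is illustrative.

```python
import numpy as np
from sklearn.preprocessing import PowerTransformer, QuantileTransformer

X = np.random.exponential(size=(100, 1))  # heavily skewed toy data

# Power transform (Yeo-Johnson): output is closer to Gaussian
X_gauss = PowerTransformer(method="yeo-johnson").fit_transform(X)

# Quantile transform to a normal target distribution
X_quant = QuantileTransformer(
    output_distribution="normal", n_quantiles=100
).fit_transform(X)

# Weight and spectral normalization act on a layer's weights, not its activations:
# import torch.nn as nn
# from torch.nn.utils import spectral_norm, weight_norm
# critic_layer = spectral_norm(nn.Linear(64, 64))  # common in GAN discriminators
```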
5. Quick Recommendation Guide
| Scenario | Recommended Technique |
|---|---|
| Classical ML (SVM, KNN, etc.) | Standardization or Robust Scaling |
| Neural Networks (small batch) | LayerNorm / GroupNorm |
| Large batch CNNs | BatchNorm |
| Transformers / Large Language Models | RMSNorm or LayerNorm |
| Data with outliers | Robust Scaling |
| Images (style-related) | Instance Normalization |