Case 1: L2 (Euclidean) normalization of (2, 3) and (3, 2)
Euclidean L2 normalization scales a vector so that its total length (magnitude) equals 1, effectively stripping away the "size" of the data while preserving its direction.
Hence, the normalized values of (2, 3) and (3, 2) are not the same. They point in different directions, and their normalized coordinates reflect that.
1. Calculate vector magnitudes
To normalize a vector, you first find its L2 norm (Euclidean distance from the origin) using the formula:
||v||₂ = √(Σ xᵢ²)
For the example (2,3) and (3,2):
Vector A (2, 3):
√(2² + 3²) = √(4 + 9) = √13 ≈ 3.606
Vector B (3, 2):
√(3² + 2²) = √(9 + 4) = √13 ≈ 3.606
2. Divide by magnitude
You then divide each component of the original vector by this magnitude:
Normalized A:
(2/√13, 3/√13) ≈ (0.555, 0.832)
Normalized B:
(3/√13, 2/√13) ≈ (0.832, 0.555)
As you can see, the values are swapped, not identical. They represent distinct points on a unit circle.
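To verify these numbers, here is a minimal NumPy sketch (the helper name l2_normalize is ours, purely for illustration):

```python
import numpy as np

def l2_normalize(v):
    """Divide a vector by its Euclidean (L2) norm."""
    return v / np.linalg.norm(v)  # np.linalg.norm defaults to the L2 norm for 1-D arrays

a = np.array([2.0, 3.0])
b = np.array([3.0, 2.0])

print(l2_normalize(a))  # ≈ [0.5547 0.8321]
print(l2_normalize(b))  # ≈ [0.8321 0.5547]
```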
What does it represent?
It represents the orientation or relative proportions of the components.
In the vector (2, 3), the second component is 1.5× the first. Normalization keeps this ratio intact while forcing the vector to sit exactly 1 unit away from the origin.
It effectively says:
“I don't care how much of this stuff we have in total; I only care about the mix or the direction.”
What are the uses?
Cosine Similarity: In Machine Learning (like recommendation systems), we often care about the angle between vectors rather than their magnitude. If you normalize two vectors, their dot product becomes their Cosine Similarity (see the sketch after this list).
Stable Training: In Deep Learning, normalizing inputs or gradients prevents "exploding" values and helps the model converge faster because every feature is on the same scale (between -1 and 1).
Pattern Recognition: In image processing or computer vision, it helps recognize a shape or pattern regardless of the overall brightness or contrast of the image.
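Tying back to the cosine-similarity point above, this short NumPy sketch (again using our illustrative l2_normalize helper) checks that the dot product of the two unit vectors equals the cosine similarity of the originals:

```python
import numpy as np

def l2_normalize(v):
    return v / np.linalg.norm(v)

a = np.array([2.0, 3.0])
b = np.array([3.0, 2.0])

# Cosine similarity computed directly from the definition...
cos_direct = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

# ...and as a plain dot product of the unit vectors.
cos_unit = np.dot(l2_normalize(a), l2_normalize(b))

print(cos_direct, cos_unit)  # both ≈ 0.9231 (i.e. 12/13)
```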
✅ Conclusion
The normalized values for (2, 3) and (3, 2) are distinct: approximately (0.55, 0.83) and (0.83, 0.55) respectively.
Euclidean L2 normalization represents the pure direction of a vector on a unit hypersphere, removing the influence of its magnitude.
Later in this post, we'll compare this with L1 (Manhattan) normalization, which is often used for probability distributions.
Case 2: L2 (Euclidean) normalization of (2, 3) and (10, 15)
The L2 normalizations of (2, 3) and (10, 15) are identical.
Since (10, 15) is just a scaled-up version of (2, 3), specifically 5 × (2, 3), they both point in the exact same direction. Normalization strips away that "5x" magnitude, leaving you with the same unit vector.
The Calculation
Vector A (2, 3):
Magnitude:
√(2² + 3²) = √13 ≈ 3.606
Normalized:
(2/3.606, 3/3.606) ≈ (0.555, 0.832)
Vector B (10, 15):
Magnitude:
√(10² + 15²) = √(100 + 225) = √325 ≈ 18.028
Normalized:
(10/18.028, 15/18.028) ≈ (0.555, 0.832)
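A quick sanity check in NumPy, using the same vectors as above:

```python
import numpy as np

a = np.array([2.0, 3.0])
b = np.array([10.0, 15.0])  # = 5 × (2, 3)

unit_a = a / np.linalg.norm(a)
unit_b = b / np.linalg.norm(b)

print(unit_a)                       # ≈ [0.5547 0.8321]
print(unit_b)                       # ≈ [0.5547 0.8321]
print(np.allclose(unit_a, unit_b))  # True — same unit vector
```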
Why this happens
Normalization treats these two vectors as "the same" because their internal ratio is the same (2:3 = 10:15). In many data science contexts, this is exactly what you want.
For example:
Text Analysis: A short document mentions "Apple" 2 times and "Orange" 3 times. A long document mentions them 10 and 15 times. Normalization tells your algorithm that both documents have the same topic balance, regardless of their length.
Image Processing: A dim pixel (2, 3) and a bright pixel (10, 15) have the same chrominance (color), just different intensities. Normalization lets you process the color while ignoring the lighting.
✅ Conclusion
The L2 normalization for both vectors is approximately (0.555, 0.832).
This confirms that normalization captures the proportional relationship between components rather than their absolute scale.
L1 Normalization
L1 normalization (also known as the Manhattan or Taxicab Norm) scales a vector so that the sum of the absolute values of its components equals 1.
Unlike L2 normalization, which focuses on the straight-line "Euclidean" distance, L1 normalization focuses on the "Taxicab" distance—measuring how far you'd travel if you could only move along grid lines.
1. The Formula
To L1 normalize a vector, you divide each component by the L1 norm (the sum of absolute values):
||v||₁ = Σ |xᵢ|
For your examples:
Vector A (2, 3):
|2| + |3| = 5
Normalized: (2/5, 3/5) = (0.4, 0.6)
Vector B (10, 15):
|10| + |15| = 25
Normalized: (10/25, 15/25) = (0.4, 0.6)
As with L2, vectors with the same internal proportions (2:3) result in the same normalized vector.
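A minimal NumPy sketch of the same arithmetic (the l1_normalize helper name is just for illustration):

```python
import numpy as np

def l1_normalize(v):
    """Divide a vector by the sum of the absolute values of its components."""
    return v / np.sum(np.abs(v))

print(l1_normalize(np.array([2.0, 3.0])))    # [0.4 0.6]
print(l1_normalize(np.array([10.0, 15.0])))  # [0.4 0.6]
```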
2. What it represents
L1 normalization represents the contribution share of each element.
If a vector represents counts of different items, the L1 normalized version tells you the percentage or probability of each item occurring relative to the total.
In (0.4, 0.6), the first element represents 40% of the total magnitude and the second represents 60%.
3. Key Uses
Creating Probabilities: It is commonly used to transform a vector of raw scores into a Probability Mass Function (PMF) where all parts sum to 1 (see the sketch after this list).
Sparse Modeling (Lasso): In machine learning, L1 regularization (as used in Lasso Regression) is famous for its ability to zero out less important features, performing automatic feature selection.
Robustness to Outliers: Because it doesn't square the values (unlike L2), L1 is much less sensitive to extreme outliers. An outlier value of 100 contributes 100 to the L1 norm, but 10,000 to the sum of squares inside the L2 norm.
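As a small illustration of the probability use case, here is a sketch that turns hypothetical raw counts into a PMF via L1 normalization:

```python
import numpy as np

counts = np.array([2.0, 3.0, 5.0])     # hypothetical raw item counts
pmf = counts / np.sum(np.abs(counts))  # L1 normalization

print(pmf)        # [0.2 0.3 0.5] — each entry is that item's share
print(pmf.sum())  # 1.0 — a valid probability mass function
```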
✅ Conclusion
L1 normalization scales a vector so that the sum of its absolute components is 1.
For both (2, 3) and (10, 15), the L1 normalized vector is (0.4, 0.6).
It is primarily used to represent proportions or to create sparse models that ignore irrelevant data.
L-Inf Normalization
L-Infinity Normalization (also called the Max Norm or Supremum Norm) scales a vector so that its largest component has a magnitude of exactly 1.
Instead of looking at the sum (L1) or the square root of squares (L2), it simply looks for the "peak" value in the set.
1. The Formula
To normalize a vector using L-Infinity, you divide every component by the largest absolute value among its components:
||v||∞ = max(|xᵢ|)
For your examples:
Vector A (2, 3): The maximum value is 3.
Normalized: (2/3, 3/3) ≈ (0.667, 1.0)
Vector B (10, 15): The maximum value is 15.
Normalized: (10/15, 15/15) ≈ (0.667, 1.0)
Again, because the proportions are the same, the result is identical.
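The same check in NumPy (the linf_normalize helper name is ours):

```python
import numpy as np

def linf_normalize(v):
    """Divide a vector by the largest absolute value among its components."""
    return v / np.max(np.abs(v))

print(linf_normalize(np.array([2.0, 3.0])))    # ≈ [0.6667 1.0]
print(linf_normalize(np.array([10.0, 15.0])))  # ≈ [0.6667 1.0]
```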
2. What it represents
L-Infinity normalization represents relative importance compared to the maximum.
It forces the most dominant feature to be the "baseline" (1.0) and scales everything else relative to that peak.
In geometry, if you plot all possible L-Infinity normalized vectors, they form a square (or a hypercube in higher dimensions) rather than a circle or a diamond.
3. Key Uses
Image Processing: It is used to normalize pixel intensities. If you have a dark image, L-infinity normalization scales the brightest pixel to 1.0 (pure white) and stretches all other pixels proportionally, effectively "auto-leveling" the brightness.
Adversarial Machine Learning: In cybersecurity, "L-infinity attacks" are used to test models. An attacker might change every pixel in an image by a tiny, equal amount. L-infinity captures the maximum change made to any single pixel.
Control Systems: It's used when there is a strict limit on a system—for example, if a motor can only handle a maximum of 5 volts, you normalize your control signals so no single output ever exceeds that physical "cap."
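As a sketch of the budget/cap idea in the last two bullets: projecting a perturbation back inside an L-infinity ball of radius eps is just element-wise clipping (the names delta and eps, and their values, are hypothetical, for illustration only):

```python
import numpy as np

eps = 0.1                             # hypothetical per-element budget
delta = np.array([0.25, -0.05, 0.3])  # hypothetical perturbation

clipped = np.clip(delta, -eps, eps)   # project onto the L-infinity ball
print(clipped)                        # [ 0.1  -0.05  0.1 ]
print(np.max(np.abs(clipped)) <= eps) # True — no element exceeds the cap
```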
✅ Summary Table
| Norm | Result for (2, 3) | Key Logic | Best For |
|---|---|---|---|
| L1 (Plots a Diamond) | (0.4, 0.6) | Components sum to 1 | Proportions & Probabilities |
| L2 (Plots a Circle) | (0.55, 0.83) | Distance to origin is 1 | Directions & Angles |
| L-inf (Plots a Square) | (0.67, 1.0) | Max component is 1 | Peak values & Constraints |
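If you prefer a library call, scikit-learn's normalize function supports all three of these norms; this sketch reproduces the table's rows:

```python
import numpy as np
from sklearn.preprocessing import normalize

x = np.array([[2.0, 3.0]])  # normalize() expects a 2-D array

print(normalize(x, norm="l1"))   # [[0.4 0.6]]
print(normalize(x, norm="l2"))   # ≈ [[0.5547 0.8321]]
print(normalize(x, norm="max"))  # ≈ [[0.6667 1.0]]
```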