Thursday, May 7, 2026

Relevance Scores

 

Term                     | Meaning
-------------------------+----------------------------------------------------
Similarity Score         | General relevance score between vectors
Cosine Similarity Score  | Specifically cosine-based similarity
Relevance Score          | More general IR/RAG terminology
Retrieval Score          | Score used for ranking retrieved items
Distance Score           | Used when lower = better (e.g. Euclidean distance)
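
A minimal sketch (plain NumPy, made-up vectors) of why the distinction matters: cosine similarity ranks higher-is-better, while a distance score ranks lower-is-better.

import numpy as np

query = np.array([1.0, 0.0])
doc_a = np.array([0.9, 0.1])   # almost aligned with the query
doc_b = np.array([0.1, 0.9])   # almost orthogonal to the query

def cosine(a, b):
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

for name, doc in [("doc_a", doc_a), ("doc_b", doc_b)]:
    print(name,
          "cosine:", round(cosine(query, doc), 3),                    # higher = more relevant
          "distance:", round(float(np.linalg.norm(query - doc)), 3))  # lower = more relevant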

JOURNEY-005 - REPLACED K-MEANS WITH SEMANTIC SLIDING WINDOW

It is working.

Colab

# ============================================================
# STEP 1 — INSTALL REQUIRED LIBRARIES
# ============================================================

# Run this in Google Colab
!pip install -q google-generativeai numpy

# ============================================================
# STEP 2 — IMPORT LIBRARIES
# ============================================================

import os
import numpy as np
import google.generativeai as genai

from google.colab import userdata


# ============================================================
# STEP 3 — CONFIGURE GEMINI API
# ============================================================

# Store your API key in:
# Colab Secrets → GEMINI_API_KEY

GEMINI_API_KEY = userdata.get("GEMINI_API_KEY")

genai.configure(api_key=GEMINI_API_KEY)


# ============================================================
# STEP 4 — CREATE DATASET
# ============================================================

documents = [

    # --------------------------------------------------------
    # POD FAILURES / DEBUGGING
    # --------------------------------------------------------

    "CrashLoopBackOff occurs when a container repeatedly crashes after starting.",
    "OOMKilled happens when a container exceeds its memory limit.",
    "A container may crash due to missing environment variables.",
    "Incorrect command or entrypoint can cause container startup failure.",
    "Application errors inside the container often lead to restarts.",
    "kubectl logs retrieves logs from a running container.",
    "kubectl describe pod shows events and state transitions.",
    "Liveness probes determine if a container should be restarted.",
    "Readiness probes determine if a pod can receive traffic.",

    # --------------------------------------------------------
    # SCHEDULING
    # --------------------------------------------------------

    "Pods remain pending if no node satisfies resource requests.",
    "Node affinity restricts pods to specific nodes.",
    "Taints prevent pods from being scheduled on certain nodes.",
    "Tolerations allow pods to be scheduled on tainted nodes.",

    # --------------------------------------------------------
    # SERVICES
    # --------------------------------------------------------

    "ClusterIP services expose applications within the cluster.",
    "NodePort services expose applications on node IPs.",
    "LoadBalancer services expose applications externally.",
    "Ingress routes HTTP and HTTPS traffic to services.",

    # --------------------------------------------------------
    # STORAGE
    # --------------------------------------------------------

    "PersistentVolumes provide storage independent of pods.",
    "PersistentVolumeClaims request storage resources.",
    "StorageClasses define dynamic provisioning behavior.",

    # --------------------------------------------------------
    # DEPLOYMENTS
    # --------------------------------------------------------

    "Deployments manage replica sets and pod updates.",
    "Rolling updates gradually replace old pods with new ones.",
    "ReplicaSets maintain a stable number of pod replicas.",

    # --------------------------------------------------------
    # CONFIGURATION
    # --------------------------------------------------------

    "ConfigMaps store non-sensitive configuration data.",
    "Secrets store sensitive data like passwords and tokens.",
    "Environment variables can be injected from ConfigMaps and Secrets.",

    # --------------------------------------------------------
    # IMAGES / REGISTRY
    # --------------------------------------------------------

    "ImagePullBackOff occurs when Kubernetes cannot pull the container image.",
    "Incorrect image name or tag can cause image pull failures.",
    "Private registries require imagePullSecrets for authentication.",

    # --------------------------------------------------------
    # AUTOSCALING
    # --------------------------------------------------------

    "Horizontal Pod Autoscaler scales based on CPU or metrics.",

    # --------------------------------------------------------
    # SECURITY
    # --------------------------------------------------------

    "RBAC controls access permissions inside Kubernetes.",
    "RBAC misconfiguration can block access to resources.",

    # --------------------------------------------------------
    # NETWORKING
    # --------------------------------------------------------

    "NetworkPolicies control communication between pods.",

    # --------------------------------------------------------
    # CLEANUP
    # --------------------------------------------------------

    "Pods stuck in Terminating state may have finalizers blocking deletion."
]


print(f"Total documents: {len(documents)}")


# ============================================================
# STEP 5 — CREATE SLIDING WINDOW CHUNKS
# ============================================================

# WHY?
# ----
# Instead of KMeans clustering, we preserve
# natural neighboring context.
#
# Example (window = 3, stride = 1):
# sentence1 + sentence2 + sentence3
# sentence2 + sentence3 + sentence4
#
# This is:
# - automatic
# - scalable
# - semantically safer


WINDOW_SIZE = 3
STRIDE = 1

smart_chunks = []

for i in range(0, len(documents) - WINDOW_SIZE + 1, STRIDE):

    chunk = documents[i:i + WINDOW_SIZE]

    chunk_text = "\n".join(chunk)

    smart_chunks.append(chunk_text)


print(f"Total chunks created: {len(smart_chunks)}")


# ============================================================
# STEP 6 — PREPARE STRUCTURED CHUNK DATA
# ============================================================

prepared_data = []

for i, chunk in enumerate(smart_chunks):

    prepared_data.append({
        "id": f"chunk_{i}",
        "text": chunk
    })


print(f"Prepared chunks: {len(prepared_data)}")


# ============================================================
# STEP 7 — CREATE EMBEDDING FUNCTION
# ============================================================

def get_embedding(text):

    response = genai.embed_content(
        model="models/gemini-embedding-001",
        content=text
    )

    return response["embedding"]


# ============================================================
# STEP 8 — GENERATE CHUNK EMBEDDINGS
# ============================================================

print("Generating embeddings...")

for item in prepared_data:

    embedding = get_embedding(item["text"])

    item["embedding"] = embedding


print("Embeddings added successfully.")


# ============================================================
# STEP 9 — NORMALIZATION FUNCTION
# ============================================================

# WHY?
# ----
# Scaling vectors to unit length makes cosine similarity
# reduce to a plain dot product and keeps scores comparable.


def normalize(vec):

    vec = np.array(vec)

    return vec / np.linalg.norm(vec)


# ============================================================
# STEP 10 — COSINE SIMILARITY FUNCTION
# ============================================================

def cosine_similarity(a, b):

    a = normalize(a)
    b = normalize(b)

    return np.dot(a, b)


# ============================================================
# STEP 11 — RETRIEVAL FUNCTION
# ============================================================

# FEATURES:
# ----------
# 1. Query embedding
# 2. Cosine similarity
# 3. Threshold filtering
# 4. Re-ranking using keyword bonus


def retrieve(query, top_k=3, min_score=0.55):

    # --------------------------------------------------------
    # EMBED QUERY
    # --------------------------------------------------------

    query_embedding = get_embedding(query)

    scores = []

    # --------------------------------------------------------
    # COMPARE AGAINST ALL CHUNKS
    # --------------------------------------------------------

    for item in prepared_data:

        similarity = cosine_similarity(
            query_embedding,
            item["embedding"]
        )

        scores.append((similarity, item))

    # --------------------------------------------------------
    # SORT BY SIMILARITY
    # --------------------------------------------------------

    scores = sorted(
        scores,
        key=lambda x: x[0],
        reverse=True
    )

    # --------------------------------------------------------
    # RE-RANK USING KEYWORD BONUS
    # --------------------------------------------------------

    reranked = []

    query_words = query.lower().split()

    for sim, item in scores:

        text = item["text"].lower()

        keyword_bonus = sum(
            word in text for word in query_words
        )

        final_score = sim + (0.03 * keyword_bonus)

        reranked.append((final_score, item))

    # --------------------------------------------------------
    # SORT AGAIN AFTER RE-RANKING
    # --------------------------------------------------------

    reranked = sorted(
        reranked,
        key=lambda x: x[0],
        reverse=True
    )

    # --------------------------------------------------------
    # FILTER LOW SCORES
    # --------------------------------------------------------

    filtered = [
        x for x in reranked
        if x[0] >= min_score
    ]

    return filtered[:top_k]


# ============================================================
# STEP 12 — TEST RETRIEVAL
# ============================================================

test_queries = [

    "Why is my pod crashing repeatedly?",
    "How to debug Kubernetes logs?",
    "What causes OOMKilled?",
    "How do services work in Kubernetes?",
    "Why is my container restarting repeatedly?"
]


for query in test_queries:

    print("\n" + "=" * 80)
    print(f"QUERY: {query}\n")

    results = retrieve(query)

    for score, item in results:

        print(f"Score: {score:.4f}")
        print(item["text"])

        print("-" * 40)


# ============================================================
# STEP 13 — BUILD PROMPT
# ============================================================

def build_prompt(query, retrieved_chunks):

    context = "\n\n".join(
        [item["text"] for score, item in retrieved_chunks]
    )

    prompt = f"""
You are a Kubernetes expert.

Answer ONLY using the provided context.

If the answer is not present in the context,
say "I don't know".

Context:
{context}

Question:
{query}

Answer:
"""

    return prompt


# ============================================================
# STEP 14 — GENERATE ANSWER USING GEMINI
# ============================================================

def generate_answer(prompt):

    model = genai.GenerativeModel("gemini-3-flash-preview")

    #model = genai.GenerativeModel("gemini-1.5-flash")

    response = model.generate_content(prompt)

    return response.text


# ============================================================
# STEP 15 — COMPLETE RAG PIPELINE
# ============================================================

def rag_pipeline(query, top_k=3):

    # --------------------------------------------------------
    # RETRIEVE RELEVANT CHUNKS
    # --------------------------------------------------------

    retrieved_chunks = retrieve(
        query=query,
        top_k=top_k
    )

    # --------------------------------------------------------
    # BUILD PROMPT
    # --------------------------------------------------------

    prompt = build_prompt(
        query,
        retrieved_chunks
    )

    # --------------------------------------------------------
    # GENERATE FINAL ANSWER
    # --------------------------------------------------------

    answer = generate_answer(prompt)

    return answer, retrieved_chunks


# ============================================================
# STEP 16 — FINAL TEST
# ============================================================

query = "Why is my pod crashing?"

answer, sources = rag_pipeline(query)

print("\n" + "=" * 80)
print("\n" + query + "\n")
print("FINAL ANSWER:\n")

print(answer)

print("\n" + "=" * 80)
print("RETRIEVED SOURCES:\n")

for score, item in sources:

    print(f"Score: {score:.4f}")

    print(item["text"])

    print("-" * 40)


for query in test_queries:

    answer, sources = rag_pipeline(query)

    print("\n" + "=" * 80)
    print("\n" + query + "\n")
    print("FINAL ANSWER:\n")

    print(answer)

    print("\n" + "=" * 80)


=====================================================================================
OUTPUT (note at the end how changing the query from "Why is my pod crashing?"
to "Why is my pod crashing repeatedly?" changes the scores)
=====================================================================================
Total documents: 34
Total chunks created: 32
Prepared chunks: 32
Generating embeddings...
Embeddings added successfully.

================================================================================
QUERY: Why is my pod crashing repeatedly?

Score: 0.7553
CrashLoopBackOff occurs when a container repeatedly crashes after starting.
OOMKilled happens when a container exceeds its memory limit.
A container may crash due to missing environment variables.
----------------------------------------
Score: 0.7235
Application errors inside the container often lead to restarts.
kubectl logs retrieves logs from a running container.
kubectl describe pod shows events and state transitions.
----------------------------------------
Score: 0.7164
A container may crash due to missing environment variables.
Incorrect command or entrypoint can cause container startup failure.
Application errors inside the container often lead to restarts.
----------------------------------------

================================================================================
QUERY: How to debug Kubernetes logs?

Score: 0.8163
Application errors inside the container often lead to restarts.
kubectl logs retrieves logs from a running container.
kubectl describe pod shows events and state transitions.
----------------------------------------
Score: 0.7739
Incorrect command or entrypoint can cause container startup failure.
Application errors inside the container often lead to restarts.
kubectl logs retrieves logs from a running container.
----------------------------------------
Score: 0.7589
kubectl logs retrieves logs from a running container.
kubectl describe pod shows events and state transitions.
Liveness probes determine if a container should be restarted.
----------------------------------------

================================================================================
QUERY: What causes OOMKilled?

Score: 0.7658
OOMKilled happens when a container exceeds its memory limit.
A container may crash due to missing environment variables.
Incorrect command or entrypoint can cause container startup failure.
----------------------------------------
Score: 0.7314
CrashLoopBackOff occurs when a container repeatedly crashes after starting.
OOMKilled happens when a container exceeds its memory limit.
A container may crash due to missing environment variables.
----------------------------------------
Score: 0.6270
A container may crash due to missing environment variables.
Incorrect command or entrypoint can cause container startup failure.
Application errors inside the container often lead to restarts.
----------------------------------------

================================================================================
QUERY: How do services work in Kubernetes?

Score: 0.7989
ClusterIP services expose applications within the cluster.
NodePort services expose applications on node IPs.
LoadBalancer services expose applications externally.
----------------------------------------
Score: 0.7814
Tolerations allow pods to be scheduled on tainted nodes.
ClusterIP services expose applications within the cluster.
NodePort services expose applications on node IPs.
----------------------------------------
Score: 0.7759
NodePort services expose applications on node IPs.
LoadBalancer services expose applications externally.
Ingress routes HTTP and HTTPS traffic to services.
----------------------------------------

================================================================================
QUERY: Why is my container restarting repeatedly?

Score: 0.8290
A container may crash due to missing environment variables.
Incorrect command or entrypoint can cause container startup failure.
Application errors inside the container often lead to restarts.
----------------------------------------
Score: 0.7850
CrashLoopBackOff occurs when a container repeatedly crashes after starting.
OOMKilled happens when a container exceeds its memory limit.
A container may crash due to missing environment variables.
----------------------------------------
Score: 0.7691
Incorrect command or entrypoint can cause container startup failure.
Application errors inside the container often lead to restarts.
kubectl logs retrieves logs from a running container.
----------------------------------------

================================================================================

Why is my pod crashing?

FINAL ANSWER:

Based on the context provided, your pod may be crashing for the following reasons:

*   Exceeding its memory limit (OOMKilled).
*   Missing environment variables.
*   Application errors inside the container.
*   Incorrect command or entrypoint causing startup failure.

================================================================================
RETRIEVED SOURCES:

Score: 0.7378
CrashLoopBackOff occurs when a container repeatedly crashes after starting.
OOMKilled happens when a container exceeds its memory limit.
A container may crash due to missing environment variables.
----------------------------------------
Score: 0.7156
Application errors inside the container often lead to restarts.
kubectl logs retrieves logs from a running container.
kubectl describe pod shows events and state transitions.
----------------------------------------
Score: 0.7082
A container may crash due to missing environment variables.
Incorrect command or entrypoint can cause container startup failure.
Application errors inside the container often lead to restarts.
----------------------------------------

================================================================================

Why is my pod crashing repeatedly?

FINAL ANSWER:

Based on the context provided, a pod may crash repeatedly due to the following reasons:

*   Missing environment variables.
*   Application errors inside the container.
*   An incorrect command or entrypoint.
*   Exceeding its memory limit (OOMKilled).

================================================================================

================================================================================

How to debug Kubernetes logs?

FINAL ANSWER:

kubectl logs retrieves logs from a running container.

================================================================================

================================================================================

What causes OOMKilled?

FINAL ANSWER:

OOMKilled happens when a container exceeds its memory limit.

================================================================================

================================================================================

How do services work in Kubernetes?

FINAL ANSWER:

ClusterIP services expose applications within the cluster. NodePort services expose applications on node IPs. LoadBalancer services expose applications externally. Ingress routes HTTP and HTTPS traffic to services.

================================================================================
ERROR:tornado.access:503 POST /v1beta/models/gemini-3-flash-preview:generateContent?%24alt=json%3Benum-encoding%3Dint (::1) 28782.23ms
================================================================================

Why is my container restarting repeatedly?

FINAL ANSWER:

Your container may be restarting repeatedly due to the following reasons:
*   Missing environment variables.
*   Incorrect command or entrypoint.
*   Application errors inside the container.
*   Exceeding its memory limit (OOMKilled).

This state is known as CrashLoopBackOff, which occurs when a container repeatedly crashes after starting.

================================================================================

Normalization Algorithms in Machine Learning





1. Feature Scaling (Traditional ML)

Technique                   | Description                             | Formula / Key Point            | Best Used When
----------------------------+-----------------------------------------+--------------------------------+----------------------------------
Min-Max Normalization       | Scales data to [0, 1] or [a, b]         | X' = (X - min) / (max - min)   | Bounded data, Neural Networks
Standardization (Z-score)   | Mean = 0, Std = 1                       | X' = (X - μ) / σ               | Gaussian-like data, Linear models
Robust Scaling              | Uses median & IQR (robust to outliers)  | X' = (X - median) / IQR        | Data with outliers
MaxAbs Scaling              | Scales by maximum absolute value        | X' = X / max(|X|)              | Sparse data
Mean Normalization          | Centers around zero                     | X' = (X - mean) / (max - min)  | Less common
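
A quick self-contained sketch (pure NumPy, toy data) of the two most common scalers from the table above:

import numpy as np

X = np.array([2.0, 4.0, 6.0, 100.0])   # toy data with one large value

# Min-Max: squeeze into [0, 1]
minmax = (X - X.min()) / (X.max() - X.min())

# Z-score: mean 0, std 1
zscore = (X - X.mean()) / X.std()

print("min-max:", np.round(minmax, 3))
print("z-score:", np.round(zscore, 3))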

2. Normalization for Vectors / Features

Technique                     | Description                        | Formula           | Use Case
------------------------------+------------------------------------+-------------------+--------------------------------------------
L2 Normalization (Euclidean)  | Most common vector normalization   | X' = X / ||X||₂   | Distance-based algorithms, Neural Networks
L1 Normalization (Manhattan)  | Sum of absolute values = 1         | X' = X / ||X||₁   | Sparse data, Feature importance
Max Normalization             | Divide by maximum value in vector  | X' = X / max(|X|) | Simple scaling of feature vectors

3. Deep Learning Normalization Layers

Layer                            | Year | Key Idea                                | Main Advantage                   | Common Use Cases
---------------------------------+------+-----------------------------------------+----------------------------------+--------------------------
Batch Normalization (BatchNorm)  | 2015 | Normalize across batch dimension        | Accelerates training             | CNNs (ResNet, etc.)
Layer Normalization (LayerNorm)  | 2016 | Normalize across features (per sample)  | Works with variable batch sizes  | Transformers
Instance Normalization           | 2017 | Normalize per sample per channel        | Style transfer                   | StyleGAN, artistic tasks
Group Normalization              | 2018 | Normalize within groups of channels     | Good for small batch sizes       | Object detection
RMS Normalization (RMSNorm)      | -    | Normalize by Root Mean Square           | Simpler & faster                 | Modern LLMs (Llama, etc.)
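
A minimal NumPy sketch of the per-sample LayerNorm and RMSNorm formulas from the table (illustrative only; real layers also carry learnable scale/shift parameters):

import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])   # one sample's feature vector
eps = 1e-6

# LayerNorm: subtract the mean, divide by the std (across features)
layer_norm = (x - x.mean()) / np.sqrt(x.var() + eps)

# RMSNorm: divide by the root mean square (no mean subtraction)
rms_norm = x / np.sqrt(np.mean(x ** 2) + eps)

print("LayerNorm:", np.round(layer_norm, 3))
print("RMSNorm:  ", np.round(rms_norm, 3))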

4. Other Specialized Normalization Techniques

  • Quantile Normalization — Makes distributions identical across samples (popular in bioinformatics)
  • Local Response Normalization (LRN) — Used in early CNNs like AlexNet
  • Weight Normalization — Reparameterizes weights instead of activations
  • Spectral Normalization — Constrains weight matrices for stable GAN training
  • Batch Renormalization — Improved and more stable version of BatchNorm
  • Filter Response Normalization (FRN) — Batch-independent normalization
  • Power Transform (Yeo-Johnson / Box-Cox) — Makes data more Gaussian-like
  • Contrast Normalization — Used in computer vision preprocessing

5. Quick Recommendation Guide

Scenario                               | Recommended Technique
---------------------------------------+-----------------------------------
Classical ML (SVM, KNN, etc.)          | Standardization or Robust Scaling
Neural Networks (small batch)          | LayerNorm / GroupNorm
Large batch CNNs                       | BatchNorm
Transformers / Large Language Models   | RMSNorm or LayerNorm
Data with outliers                     | Robust Scaling
Images (style-related)                 | Instance Normalization

L1, L2 and L-Inf Normalizations

Case 1: L2 (Euclidean) normalization of (2, 3) and (3, 2)

Euclidean L2 normalization scales a vector so that its total length (magnitude) equals 1, effectively stripping away the "size" of the data while preserving its direction.

Hence, the normalized values of (2, 3) and (3, 2) are not the same. They point in different directions, and their normalized coordinates reflect that.

1. Calculate vector magnitudes

To normalize a vector, you first find its L2 norm (Euclidean distance from the origin) using the formula:

||v||₂ = √(Σ xi²)

For the example (2,3) and (3,2):

Vector A (2, 3):
√(2² + 3²) = √(4 + 9) = √13 ≈ 3.606

Vector B (3, 2):
√(3² + 2²) = √(9 + 4) = √13 ≈ 3.606

2. Divide by magnitude

You then divide each component of the original vector by this magnitude:

Normalized A:
(2/√13, 3/√13) ≈ (0.555, 0.832)

Normalized B:
(3/√13, 2/√13) ≈ (0.832, 0.555)

As you can see, the values are swapped, not identical. They represent distinct points on a unit circle.
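
The same arithmetic in NumPy, as a quick check of the hand calculation:

import numpy as np

a = np.array([2.0, 3.0])
b = np.array([3.0, 2.0])

print(a / np.linalg.norm(a))   # ≈ [0.555 0.832]
print(b / np.linalg.norm(b))   # ≈ [0.832 0.555]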


What does it represent?

It represents the orientation or relative proportions of the components.

In a vector of (2, 3), the second feature is 1.5× larger than the first. Normalization keeps this ratio intact while forcing the vector to sit exactly 1 unit away from the origin.

It effectively says:

“I don't care how much of this stuff we have in total; I only care about the mix or the direction.”

What are the uses?

Cosine Similarity: In Machine Learning (like recommendation systems), we often care about the angle between vectors rather than their magnitude. If you normalize two vectors, their dot product becomes their Cosine Similarity.

Stable Training: In Deep Learning, normalizing inputs or gradients prevents "exploding" values and helps the model converge faster because every feature is on the same scale (between -1 and 1).

Pattern Recognition: In image processing or computer vision, it helps recognize a shape or pattern regardless of the overall brightness or contrast of the image.

✅ Conclusion

The normalized values for (2, 3) and (3, 2) are distinct: approximately (0.55, 0.83) and (0.83, 0.55) respectively.

Euclidean L2 normalization represents the pure direction of a vector on a unit hypersphere, removing the influence of its magnitude.



Case 2: L2 (Euclidean) normalization of (2, 3) and (10, 15)

The L2 normalisations of (2, 3) and (10, 15) are identical.

Since (10, 15) is just a scaled-up version of (2, 3)—specifically 5 × (2, 3)—they both point in the exact same direction. Normalisation strips away that "5x" magnitude, leaving you with the same unit vector.

The Calculation

Vector A (2, 3):

Magnitude:
√(2² + 3²) = √13 ≈ 3.606

Normalised:
(2/3.606, 3/3.606) ≈ (0.555, 0.832)

Vector B (10, 15):

Magnitude:
√(10² + 15²) = √(100 + 225) = √325 ≈ 18.028

Normalised:
(10/18.028, 15/18.028) ≈ (0.555, 0.832)

Why this happens

Normalization treats these two vectors as "the same" because their internal ratio is the same (2:3 = 10:15). In many data science contexts, this is exactly what you want.

For example:

Text Analysis: A short document mentioned "Apple" 2 times and "Orange" 3 times. A long document mentions them 10 and 15 times. Normalization tells your algorithm that both documents have the same topic balance, regardless of their length.

Image Processing: A dim pixel (2, 3) and a bright pixel (10, 15) have the same chrominance (color), just different intensities. Normalization lets you process the color while ignoring the lighting.



✅ Conclusion

The L2 normalisation for both vectors is approximately (0.555, 0.832).

This confirms that normalization captures the proportional relationship between components rather than their absolute scale.
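
A one-check verification of this scale invariance in NumPy:

import numpy as np

a = np.array([2.0, 3.0])
b = np.array([10.0, 15.0])   # 5 x (2, 3)

print(np.allclose(a / np.linalg.norm(a), b / np.linalg.norm(b)))   # True: same direction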



 L1 Normalization

L1 normalization (also known as the Manhattan Norm or Least Absolute Deviations) scales a vector so that the sum of the absolute values of its components equals 1.

Unlike L2 normalization, which focuses on the straight-line "Euclidean" distance, L1 normalization focuses on the "Taxicab" distance—measuring how far you'd travel if you could only move along grid lines.

1. The Formula

To L1 normalize a vector, you divide each component by the L1 norm (the sum of absolute values):

||v||₁ = Σ |xi|

For your examples:

Vector A (2, 3):
|2| + |3| = 5
Normalised: (2/5, 3/5) = (0.4, 0.6)

Vector B (10, 15):
|10| + |15| = 25
Normalised: (10/25, 15/25) = (0.4, 0.6)

As with L2, vectors with the same internal proportions (2:3) result in the same normalized vector.
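
In NumPy the L1 norm is np.linalg.norm(v, ord=1), so the calculation above becomes:

import numpy as np

for v in (np.array([2.0, 3.0]), np.array([10.0, 15.0])):
    print(v / np.linalg.norm(v, ord=1))   # both print [0.4 0.6]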

2. What it represents

L1 normalization represents the contribution share of each element.

If a vector represents counts of different items, the L1 normalized version tells you the percentage or probability of each item occurring relative to the total.

In (0.4, 0.6), the first element represents 40% of the total magnitude and the second represents 60%.

3. Key Uses

Creating Probabilities: It is commonly used to transform a vector of raw scores into a Probability Mass Function (PMF) where all parts sum to 1.

Sparse Modeling (Lasso): In machine learning, L1 regularization (Lasso Regression) is famous for its ability to zero out less important features, performing automatic feature selection.

Robustness to Outliers: Because it doesn't square the values (unlike L2), L1 is much less sensitive to extreme outliers. An outlier value of 100 is treated as 100 in L1, but as 10,000 in L2.


✅ Conclusion

L1 normalization scales a vector so that the sum of its absolute components is 1.

For both (2, 3) and (10, 15), the L1 normalized vector is (0.4, 0.6).

It is primarily used to represent proportions or to create sparse models that ignore irrelevant data.


L-Inf Normalization

L-Infinity Normalization (also called the Max Norm or Supremum Norm) scales a vector so that its largest component has a magnitude of exactly 1.

Instead of looking at the sum (L1) or the square root of squares (L2), it simply looks for the "peak" value in the set.

1. The Formula

To normalize a vector using L-Infinity, you divide every component by the absolute value of the largest component:

||v||∞ = max(|xi|)

For your examples:

Vector A (2, 3): The maximum value is 3.

Normalized: (2/3, 3/3) ≈ (0.67, 1.0)

Vector B (10, 15): The maximum value is 15.

Normalized: (10/15, 15/15) ≈ (0.67, 1.0)

Again, because the proportions are the same, the result is identical.
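
Likewise in NumPy, ord=np.inf gives the max norm:

import numpy as np

for v in (np.array([2.0, 3.0]), np.array([10.0, 15.0])):
    print(v / np.linalg.norm(v, ord=np.inf))   # both print ≈ [0.667 1.0]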

2. What it represents

L-Infinity normalization represents relative importance compared to the maximum.

It forces the most dominant feature to be the "baseline" (1.0) and scales everything else relative to that peak.

In geometry, if you plot all possible L-Infinity normalized vectors, they form a square (or a hypercube in higher dimensions) rather than a circle or a diamond.

3. Key Uses

Image Processing: It is used to normalize pixel intensities. If you have a dark image, L-infinity normalization scales the brightest pixel to 1.0 (pure white) and stretches all other pixels proportionally, effectively "auto-leveling" the brightness.

Adversarial Machine Learning: In cybersecurity, "L-infinity attacks" are used to test models. An attacker might change every pixel in an image by a tiny, equal amount. L-infinity captures the maximum change made to any single pixel.

Control Systems: It's used when there is a strict limit on a system—for example, if a motor can only handle a maximum of 5 volts, you normalize your control signals so no single output ever exceeds that physical "cap."




✅ Summary Table

Norm                    | Result for (2, 3) | Key Logic                | Best For
------------------------+-------------------+--------------------------+------------------------------
L1 (Plots a Diamond)    | (0.4, 0.6)        | Components sum to 1      | Proportions & Probabilities
L2 (Plots a Circle)     | (0.55, 0.83)      | Distance to origin is 1  | Directions & Angles
L-inf (Plots a Square)  | (0.67, 1.0)       | Max component is 1       | Peak values & Constraints
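
The whole table can be reproduced with one loop over the ord parameter of np.linalg.norm:

import numpy as np

v = np.array([2.0, 3.0])

for name, ord_ in [("L1   ", 1), ("L2   ", 2), ("L-inf", np.inf)]:
    print(name, np.round(v / np.linalg.norm(v, ord=ord_), 3))

# L1    [0.4 0.6]
# L2    [0.555 0.832]
# L-inf [0.667 1.   ]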

NumPy functions for dot product and cosine similarity

 To calculate dot products and cosine similarity in NumPy, you primarily use np.dot() and np.linalg.norm(). While NumPy has a direct function for the dot product, cosine similarity is typically calculated by combining several operations. 

1. Dot Product
The dot product of two vectors is the sum of the products of their corresponding elements. 
  • np.dot(a, b): The standard function for computing the dot product.
  • a @ b: A more modern and readable operator for matrix multiplication and dot products introduced in Python 3.5.
  • np.inner(a, b): Computes the inner product, which for 1D arrays is identical to the dot product. 
2. Cosine Similarity
NumPy does not have a single "cosine_similarity" function, so you implement the formula yourself:

    similarity = (A · B) / (||A||₂ × ||B||₂)

Example implementation:
import numpy as np
from numpy.linalg import norm

def cosine_similarity(a, b):
    return np.dot(a, b) / (norm(a) * norm(b))
Quick Comparison

Metric             | NumPy Function(s)                    | Result Range
-------------------+--------------------------------------+--------------
Dot Product        | np.dot(a, b) or a @ b                | -∞ to +∞
Cosine Similarity  | np.dot(a, b) / (norm(a) * norm(b))   | -1 to 1
Note: For a direct, single-function implementation, many developers use scikit-learn's cosine_similarity function or SciPy's spatial.distance.cosine (which returns cosine distance, i.e. 1 - similarity).
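
For completeness, a small sketch of those library equivalents (assumes scikit-learn and SciPy are installed; scikit-learn expects 2-D arrays, hence the reshape):

import numpy as np
from sklearn.metrics.pairwise import cosine_similarity as sk_cosine
from scipy.spatial.distance import cosine as cosine_distance

a = np.array([2.0, 3.0])
b = np.array([3.0, 2.0])

print(sk_cosine(a.reshape(1, -1), b.reshape(1, -1))[0, 0])   # similarity, ≈ 0.923
print(1 - cosine_distance(a, b))                             # same value via SciPy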

  Term Meaning Similarity Score General relevance score between vectors Cosine Similarity Score Specifically cosine-based similarity Relevan...