Saturday, May 30, 2026

How a Neural Network Calculates Loss During Supervised Training

Consider training a neural network on a labelled dataset of cat and dog images. During training, the network calculates loss by mathematically comparing its predicted output against the ground-truth label supplied with each training example. The network cannot detect an error by looking at an image alone; it relies entirely on human-provided answers (labels) to measure its mistakes.

Step 1: The Forward Pass

For each training image, the network first performs a forward pass:

Input: The raw pixel values of the image are fed into the input layer.
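Calculation: The pixels pass through hidden layers, where they are multiplied by weights (randomly initialized before training begins), summed with biases, and passed through non-linear activation functions.
Prediction: The output layer produces a guess, usually as a set of probabilities (for example, produced by a softmax function) that sum to 1.
For example, if you feed the network a new image of a cat, it might output: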
[Cat: 0.20, Dog: 0.80] (It guessed a dog).
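
The forward pass above can be sketched in plain Python. This is a minimal illustration, not a real image classifier: the 4-pixel input, the layer sizes (4 inputs, 3 hidden units, 2 outputs), the ReLU activation, the random seed, and the helper names (`matvec`, `init_layer`, `forward`) are all hypothetical choices made for the example. Because the weights are random, the printed probabilities are essentially arbitrary; the point is that softmax always turns the output layer's raw scores into values between 0 and 1 that sum to 1.

```python
import math
import random

def relu(v):
    # Non-linear activation: negative values become 0.
    return [max(0.0, x) for x in v]

def softmax(logits):
    # Subtract the largest score for numerical stability; the result is unchanged.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def matvec(W, x, b):
    # One dense layer: each output is a weighted sum of the inputs plus a bias.
    return [sum(w * xi for w, xi in zip(row, x)) + bi for row, bi in zip(W, b)]

def init_layer(n_in, n_out, rng):
    # Hypothetical initialization: small random weights, zero biases.
    W = [[rng.gauss(0.0, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    b = [0.0] * n_out
    return W, b

def forward(pixels, params):
    (W1, b1), (W2, b2) = params
    hidden = relu(matvec(W1, pixels, b1))  # hidden layer
    logits = matvec(W2, hidden, b2)        # raw output scores
    return softmax(logits)                 # probabilities for [Cat, Dog]

rng = random.Random(0)
pixels = [0.1, 0.9, 0.4, 0.7]  # hypothetical 4-pixel "image", scaled to [0, 1]
params = [init_layer(4, 3, rng), init_layer(3, 2, rng)]
probs = forward(pixels, params)
print(dict(zip(["Cat", "Dog"], (round(p, 2) for p in probs))))
```

As a check on the softmax step alone, raw scores of [0, ln 4] map to exactly [0.2, 0.8], the example prediction used in this post.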

Step 2: The Ground-Truth Comparison

The network "knows" it is wrong because supervised training data pairs every image with an exact answer key called a ground-truth label. This label is converted into a vector using a process called one-hot encoding:

True Label for Cat:
[Cat: 1.0, Dog: 0.0]
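
One-hot encoding is simple enough to write out directly. In this sketch the class order `["Cat", "Dog"]` and the helper name `one_hot` are assumptions; whatever order is used must match the order of the network's output nodes.

```python
CLASSES = ["Cat", "Dog"]  # assumed class order; must match the output layer

def one_hot(label, classes=CLASSES):
    # 1.0 at the position of the true class, 0.0 everywhere else.
    if label not in classes:
        raise ValueError(f"unknown label: {label!r}")
    return [1.0 if c == label else 0.0 for c in classes]

print(one_hot("Cat"))  # [1.0, 0.0]
print(one_hot("Dog"))  # [0.0, 1.0]
```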

Step 3: Calculating the Loss Value

The loss function acts as a mathematical evaluator that compares the prediction vector to the true label vector.

Cross-Entropy Loss is the most common loss function for classification. It uses logarithms to heavily penalize confident but incorrect guesses. A simpler alternative is Mean Squared Error (MSE), which starts from the per-node error:

Error = Prediction - True Label

Cat Node Error: 0.20 - 1.0 = -0.80

Dog Node Error: 0.80 - 0.0 = 0.80
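
The loss function then combines these individual errors into a single number, the loss score. MSE squares each error and averages them: ((-0.80)² + 0.80²) / 2 = 0.64. Cross-entropy instead takes the negative natural logarithm of the probability assigned to the true class: -ln(0.20) ≈ 1.61. A high loss score means a poor guess; a loss score close to zero means a near-perfect guess.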
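
The calculation above can be reproduced in a few lines using the example numbers from this post. The formulas are MSE = (1/n) Σ (p_i - y_i)² and cross-entropy = -Σ y_i ln(p_i), where p is the prediction and y the one-hot label. The `eps` clamp in `cross_entropy` is an added safeguard against `log(0)`, an assumption of this sketch rather than part of the textbook definition.

```python
import math

prediction = [0.20, 0.80]  # [Cat, Dog] probabilities from the forward pass
true_label = [1.0, 0.0]    # one-hot ground truth for "Cat"

# Per-node error: Prediction - True Label
errors = [p - y for p, y in zip(prediction, true_label)]

def mse(pred, target):
    # Mean Squared Error: average of the squared per-node errors.
    return sum((p - y) ** 2 for p, y in zip(pred, target)) / len(pred)

def cross_entropy(pred, target, eps=1e-12):
    # Cross-entropy: -sum(y * ln p). With a one-hot target only the true-class term survives.
    # Clamping p to eps avoids log(0); this safeguard is an added assumption.
    return -sum(y * math.log(max(p, eps)) for p, y in zip(pred, target))

print([round(e, 2) for e in errors])                    # [-0.8, 0.8]
print(round(mse(prediction, true_label), 4))            # 0.64
print(round(cross_entropy(prediction, true_label), 4))  # 1.6094

# A near-perfect guess versus a confident wrong guess for the same cat image:
print(round(cross_entropy([0.99, 0.01], true_label), 4))  # 0.0101
print(round(cross_entropy([0.01, 0.99], true_label), 4))  # 4.6052
```

The last two lines show why cross-entropy is preferred for classification: the loss grows sharply as the probability assigned to the correct class approaches zero.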

Step 4: Backpropagation and Readjusting Weights

Once the loss score is known, the network uses calculus (a procedure called backpropagation) to work out how much each internal weight contributed to that score.

The Chain Rule

The network computes the gradient of the loss with respect to every weight. It works backward from the output layer through the hidden layers, applying the chain rule of calculus to pass the gradient from each layer to the one before it.
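
To make the chain rule concrete, here is a hand-written backward pass for a tiny network with one ReLU hidden layer, a softmax output and cross-entropy loss. The weights, inputs and layer sizes are hypothetical, and real frameworks compute these gradients automatically; the sketch only shows the order of the computation. It relies on the standard result that, for softmax followed by cross-entropy with a one-hot target, the gradient with respect to the output layer's raw scores is simply p - y.

```python
import math

def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def forward(x, W1, b1, W2, b2):
    # Hidden layer: pre-activation a, ReLU output h; output layer: raw scores z.
    a = [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(W1, b1)]
    h = [max(0.0, v) for v in a]
    z = [sum(w * hi for w, hi in zip(row, h)) + b for row, b in zip(W2, b2)]
    return a, h, softmax(z)

def backward(x, y, W1, b1, W2, b2):
    a, h, p = forward(x, W1, b1, W2, b2)
    # Output layer: for softmax + cross-entropy, dL/dz simplifies to p - y.
    dz = [pj - yj for pj, yj in zip(p, y)]
    dW2 = [[dzj * hk for hk in h] for dzj in dz]  # dL/dW2[j][k] = dz_j * h_k
    db2 = dz[:]
    # Chain rule into the hidden layer: dL/dh_k = sum_j dz_j * W2[j][k].
    dh = [sum(dz[j] * W2[j][k] for j in range(len(dz))) for k in range(len(h))]
    # ReLU passes the gradient only where its input was positive.
    da = [dhk if ak > 0 else 0.0 for dhk, ak in zip(dh, a)]
    dW1 = [[dak * xi for xi in x] for dak in da]  # dL/dW1[k][i] = da_k * x_i
    db1 = da[:]
    return dW1, db1, dW2, db2

x = [0.1, 0.9, 0.4, 0.7]  # hypothetical pixel values
y = [1.0, 0.0]            # one-hot "Cat"
W1 = [[0.2, -0.1, 0.4, 0.3], [-0.5, 0.6, 0.1, -0.2], [0.3, 0.2, -0.4, 0.5]]
b1 = [0.0, 0.0, 0.0]
W2 = [[0.1, -0.3, 0.2], [-0.2, 0.4, 0.3]]
b2 = [0.0, 0.0]

dW1, db1, dW2, db2 = backward(x, y, W1, b1, W2, b2)
print("dL/dW2:", [[round(g, 4) for g in row] for row in dW2])
print("dL/dW1:", [[round(g, 4) for g in row] for row in dW1])
```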

Attributing Blame

Each weight's gradient measures how much the loss would change if that weight were nudged slightly, which quantifies how much that specific weight contributed to the overall error.
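
One way to see what a weight's "blame" means is to nudge that weight by a tiny amount and measure how much the loss moves. The sketch below does this with central finite differences for a hypothetical single-layer classifier (no hidden layer, to keep it short); the weights, inputs and the name `blame_by_nudging` are illustrative assumptions. Nudging every weight this way is far too slow to train a real network, which is why backpropagation is used, but it is a common way to check that backpropagated gradients are correct. For this model the result should match the analytic gradient (p_j - y_j) * x_k.

```python
import math

def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def loss(W, x, y):
    # Hypothetical single-layer classifier: logits = W x, then softmax and cross-entropy.
    z = [sum(w * xi for w, xi in zip(row, x)) for row in W]
    p = softmax(z)
    return -sum(yj * math.log(pj) for yj, pj in zip(y, p))

def blame_by_nudging(W, x, y, eps=1e-6):
    # Central finite difference: change in loss per unit change of each weight.
    grads = []
    for j, row in enumerate(W):
        g_row = []
        for k in range(len(row)):
            up = [r[:] for r in W]
            up[j][k] += eps
            down = [r[:] for r in W]
            down[j][k] -= eps
            g_row.append((loss(up, x, y) - loss(down, x, y)) / (2 * eps))
        grads.append(g_row)
    return grads

x = [0.1, 0.9, 0.4, 0.7]  # hypothetical pixel values
y = [1.0, 0.0]            # one-hot "Cat"
W = [[0.1, -0.3, 0.2, 0.0], [-0.2, 0.4, 0.3, 0.1]]

for name, g_row in zip(["Cat", "Dog"], blame_by_nudging(W, x, y)):
    # Negative: increasing that weight would lower the loss. Positive: it would raise it.
    print(name, [round(g, 3) for g in g_row])
```

In this linear model, inputs with larger values (here the second pixel, 0.9) receive proportionally more blame, because they had more influence on the output scores.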

Gradient Descent

An optimizer algorithm, such as gradient descent, updates each weight by nudging it in the opposite direction of its gradient, with the step size controlled by a learning rate.
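
The update rule can be shown on a stand-in loss. This sketch assumes a hypothetical two-weight loss shaped like a bowl with its minimum at [3.0, -1.0], and a learning rate of 0.1 chosen for illustration; in a real network the gradients would come from backpropagation rather than a hand-written formula.

```python
def loss(w):
    # Hypothetical stand-in for a network's loss: a bowl with its minimum at [3.0, -1.0].
    return (w[0] - 3.0) ** 2 + (w[1] + 1.0) ** 2

def gradient(w):
    # Partial derivatives of the loss above, one per weight.
    return [2.0 * (w[0] - 3.0), 2.0 * (w[1] + 1.0)]

def gradient_descent_step(w, grad, learning_rate):
    # Nudge every weight opposite to its gradient, scaled by the learning rate.
    return [wi - learning_rate * gi for wi, gi in zip(w, grad)]

w = [0.0, 0.0]
losses = [loss(w)]
for _ in range(50):
    w = gradient_descent_step(w, gradient(w), learning_rate=0.1)
    losses.append(loss(w))

print([round(v, 3) for v in w])  # [3.0, -1.0]
print(round(losses[0], 3), round(losses[-1], 6))
```

Each step moves both weights toward the minimum, and the loss shrinks on every iteration. A learning rate that is too large can overshoot the minimum and make the loss grow instead.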

Training Outcome

Repeated over many iterations (often millions of weight updates) across a large, diverse training dataset, this cycle gradually reduces the loss score until the network reliably picks out the features that distinguish a cat from a dog.
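
The whole cycle (forward pass, loss, backpropagation, weight update) fits in a short training loop. This sketch is heavily simplified: it uses a single-layer softmax classifier instead of a deep network, and two hypothetical numeric features (labelled "ear pointiness" and "snout length") in place of real pixels. The six examples, learning rate, epoch count and seed are all illustrative assumptions. On such a small, cleanly separated toy dataset a few hundred passes are enough; real image classifiers need far more data and updates.

```python
import math
import random

CLASSES = ["Cat", "Dog"]

def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def predict(W, b, x):
    # Forward pass of a single-layer classifier.
    z = [sum(w * xi for w, xi in zip(row, x)) + bj for row, bj in zip(W, b)]
    return softmax(z)

def one_hot(label):
    return [1.0 if c == label else 0.0 for c in CLASSES]

def train(data, epochs=200, lr=0.5, seed=0):
    rng = random.Random(seed)
    n_features = len(data[0][0])
    W = [[rng.gauss(0.0, 0.1) for _ in range(n_features)] for _ in CLASSES]
    b = [0.0 for _ in CLASSES]
    history = []  # mean cross-entropy loss per epoch
    for _ in range(epochs):
        total = 0.0
        for x, label in data:
            y = one_hot(label)
            p = predict(W, b, x)                         # 1. forward pass
            total += -math.log(p[CLASSES.index(label)])  # 2. cross-entropy loss
            dz = [pj - yj for pj, yj in zip(p, y)]       # 3. backprop: dL/dz = p - y
            for j in range(len(CLASSES)):                # 4. gradient descent update
                for k in range(n_features):
                    W[j][k] -= lr * dz[j] * x[k]
                b[j] -= lr * dz[j]
        history.append(total / len(data))
    return W, b, history

# Hypothetical examples: (ear pointiness, snout length), both scaled to [0, 1].
data = [
    ([0.9, 0.2], "Cat"), ([0.8, 0.3], "Cat"), ([0.7, 0.1], "Cat"),
    ([0.2, 0.9], "Dog"), ([0.3, 0.8], "Dog"), ([0.1, 0.7], "Dog"),
]
W, b, history = train(data)
print(f"mean loss: epoch 1 = {history[0]:.3f}, epoch {len(history)} = {history[-1]:.3f}")
for x, label in data:
    p = predict(W, b, x)
    print(label, "->", CLASSES[p.index(max(p))])
```

This version updates the weights after every single example (stochastic gradient descent); practical training loops usually average gradients over small batches instead.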

