Sunday, 28 September 2025

Neural Networks Explained Mathematically (with Example)

Neural networks are the backbone of modern AI — from recognizing images to powering chatbots. Let’s break them down step by step, with math, an example, and beginner-friendly explanations.



1. The Structure of a Neural Network

A neural network consists of:

  • Input layer: where features (data values) are fed in.

  • Hidden layers: where transformations happen.

  • Output layer: where predictions are generated.

Each connection has a weight (a number that determines how much that input matters) and each neuron has a bias (an offset that shifts the neuron's output, giving it extra flexibility).
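To make this concrete, here is a minimal Python/NumPy sketch (the sizes and the random initialization are purely illustrative) showing that such a network is really just a few weight matrices and bias vectors:

```python
import numpy as np

# A tiny 2-2-1 network: 2 inputs, 2 hidden neurons, 1 output neuron.
# Each layer is just a weight matrix plus a bias vector.
W1 = np.random.randn(2, 2)   # input layer (2 features) -> hidden layer (2 neurons)
b1 = np.zeros(2)             # one bias per hidden neuron
W2 = np.random.randn(1, 2)   # hidden layer (2 neurons) -> output layer (1 neuron)
b2 = np.zeros(1)             # one bias for the output neuron
```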




2. Forward Propagation (Prediction Step)

The math looks like this:

z = w \cdot x + b
a = f(z)

  • w: weight

  • x: input

  • b: bias

  • f(z): activation function (a rule that decides whether the neuron should “fire” or not).

Activation functions add non-linearity:

  • Sigmoid: squashes output between 0 and 1.

  • ReLU: passes positive values, zeros out negatives.
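Here is a minimal sketch of this forward step for a single neuron (NumPy, with illustrative values for the input, weights, and bias):

```python
import numpy as np

# Forward propagation for one neuron: z = w . x + b, then a = f(z).
def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def relu(z):
    return np.maximum(0.0, z)

x = np.array([1.0, 0.0])    # input features
w = np.array([0.5, -0.3])   # weights
b = 0.1                     # bias

z = np.dot(w, x) + b        # weighted sum = 0.6
print(sigmoid(z))           # ~0.646  (squashed between 0 and 1)
print(relu(z))              # 0.6     (positive values pass through unchanged)
```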

🔹 Example: Predicting XOR (exclusive OR):

  • Input pairs: (0,0), (0,1), (1,0), (1,1)

  • Output: 0,1,1,0
    These outputs can’t be separated by a single straight line → hence the need for hidden layers (the sketch below checks this by brute force).
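As a quick sanity check, this sketch tries a grid of candidate lines w1·x1 + w2·x2 + b = 0 and confirms that none of them separates XOR's 0s from its 1s (an illustration, not a proof):

```python
import itertools
import numpy as np

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 1, 1, 0])

# Try every line on a coarse grid of weights and biases.
separable = False
for w1, w2, b in itertools.product(np.linspace(-2, 2, 21), repeat=3):
    preds = (X @ np.array([w1, w2]) + b > 0).astype(int)
    if np.array_equal(preds, y):
        separable = True
        break

print("XOR separable by a single line?", separable)   # False
```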




3. Loss Function (How Wrong Were We?)

The loss function measures how far predictions are from actual results.

For classification:

L = -\sum y \log(\hat{y})

This is called cross-entropy loss: a way to measure error when predicting probabilities.
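The formula above sums over classes; for a single binary prediction it reduces to the sketch below (NumPy, with a tiny epsilon only to avoid log(0)):

```python
import numpy as np

def cross_entropy(y, y_hat, eps=1e-12):
    """Binary cross-entropy for one true label y and predicted probability y_hat."""
    y_hat = np.clip(y_hat, eps, 1 - eps)   # avoid log(0)
    return -(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))

print(cross_entropy(1, 0.9))   # ~0.105  -> confident and correct: small loss
print(cross_entropy(1, 0.1))   # ~2.303  -> confident and wrong: large loss
```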




4. Backpropagation (Learning from Mistakes)

Once we calculate the loss, we send this information backward to adjust weights.

  1. Compute the gradient of the loss w.r.t. each weight.

  2. Update weights in the opposite direction of the gradient.

This uses gradient descent: a method of learning by taking small steps to minimize error.

Update rule:

w = w - \eta \frac{\partial L}{\partial w}

  • \eta: learning rate (how big the steps are).
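One concrete update step looks like this (the weight and gradient values are illustrative; the point is the subtraction and the learning rate):

```python
eta = 0.1      # learning rate: how big each corrective step is
w = 0.5        # current weight
dL_dw = 0.8    # gradient of the loss w.r.t. this weight (assumed already computed)

w = w - eta * dL_dw   # step in the opposite direction of the gradient
print(w)              # 0.42
```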




5. Example Walkthrough: XOR Problem

Let’s solve the XOR problem step by step with a small 2-layer network.

  • Input layer: 2 neurons (x1, x2).

  • Hidden layer: 2 neurons (h1, h2).

  • Output layer: 1 neuron.

Step 1: Forward pass

  • Each hidden neuron computes h = f(w \cdot x + b).

  • The output neuron combines h1 and h2 in the same way to produce the prediction.

Step 2: Compute loss
Compare prediction with actual XOR output using cross-entropy.

Step 3: Backpropagation
Adjust weights using gradient descent until predictions match XOR truth table.

Eventually, the network learns the XOR function — something impossible for a simple linear model.
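Putting the whole cycle together, here is a minimal NumPy sketch of the 2-2-1 network above, trained with sigmoid activations, cross-entropy loss, and plain gradient descent (the learning rate, iteration count, and initialization are illustrative choices, not the only ones that work):

```python
import numpy as np

np.random.seed(0)

# XOR data
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

W1 = np.random.randn(2, 2)   # input -> hidden weights
b1 = np.zeros((1, 2))
W2 = np.random.randn(2, 1)   # hidden -> output weights
b2 = np.zeros((1, 1))
eta = 1.0                    # learning rate

for step in range(10000):
    # Step 1: forward pass
    h = sigmoid(X @ W1 + b1)          # hidden activations
    y_hat = sigmoid(h @ W2 + b2)      # output prediction

    # Steps 2 + 3: gradients of mean cross-entropy, sent backward layer by layer
    d_out = (y_hat - y) / len(X)              # dL/dz at the output neuron
    dW2 = h.T @ d_out
    db2 = d_out.sum(axis=0, keepdims=True)
    d_hidden = (d_out @ W2.T) * h * (1 - h)   # chain rule through the hidden layer
    dW1 = X.T @ d_hidden
    db1 = d_hidden.sum(axis=0, keepdims=True)

    # Gradient-descent updates
    W1 -= eta * dW1; b1 -= eta * db1
    W2 -= eta * dW2; b2 -= eta * db2

print(np.round(y_hat, 2))   # should approach [[0], [1], [1], [0]]
```

With only two hidden neurons, training occasionally settles in a poor local minimum; if the printed predictions are not close to the XOR outputs, re-running with a different random seed usually fixes it.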






🧠 6. Key Terms (One-Liner Explanations)

  • Loss function: a score of how wrong the network is.

  • Cross-entropy loss: measures difference between predicted probability and actual label.

  • Gradient descent: learning by small corrective steps.

  • Backpropagation: sending error backward to update weights.

  • Activation function: rule that adds flexibility (non-linearity).



Final Thoughts

Neural networks may look intimidating with math, but they follow a simple cycle:
Predict → Compare (loss) → Correct (backpropagation) → Repeat.

Even complex AI models like GPT build upon these same foundations — just with millions (or billions!) of neurons.


