Neural Nets? What are those?

1st November 2024

Index

  1. Neural Networks (You are here)
  2. Backpropagation
  3. Machine Learning
  4. MNIST Number Classification

Alright, it's 2024, and AI and ML have been the biggest things in tech for the last few years. But what are they, really? How do computers actually learn?

This is a multi-part series which will cover the basics of Neural Networks, AI and ML, along with code samples. Heavy credit to 3b1b and this book, from which a lot of this information is gathered.

Introduction

Neural networks are essentially the brain's digital twin. Just like neurons in our brain connect and communicate, these artificial networks use interconnected nodes that process information. When we feed data into these networks, they learn by adjusting their internal connections – recognizing patterns and making predictions.

Think of it like teaching a computer to see the world the way we do, one connection at a time. Let's break this down and understand some components related to forming a neural network.

  1. Neuron: A unit which holds a number. This number is called its Activation, denoted by $a_n$.
  2. Weight: An indication of how strongly two neurons are connected, denoted by $w_n$.
    • A positive weight indicates that the next neuron should turn on.
    • A negative weight indicates that the next neuron should turn off.

Now, the weighted sum feeding into a neuron might be any number, but since we want an activation between 0 and 1, we pass it through a special "squishing" function. An example of which:

Sigmoid Function

$$\sigma(x) = \frac{1}{1+e^{-x}}$$

This function has some drawbacks:

  1. For inputs far from zero, the curve flattens out, so the gradient tends to zero. This "saturation" can stall the network's learning.
  2. Its output range (0 to 1) is not centred on zero, which biases gradient updates in one direction, and the exponential makes it comparatively expensive to compute.
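The saturation problem is easy to see in code. A minimal sketch (function names are my own, not from any library):

```python
import math

def sigmoid(x: float) -> float:
    """Squish any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_derivative(x: float) -> float:
    """Gradient of the sigmoid; shrinks toward zero for large |x|."""
    s = sigmoid(x)
    return s * (1.0 - s)

print(sigmoid(0))              # 0.5
print(sigmoid_derivative(0))   # 0.25, the largest the gradient ever gets
print(sigmoid_derivative(10))  # ~4.5e-05: saturated, almost no learning signal
```

Notice how at an input of just 10 the gradient has all but vanished, which is exactly drawback 1 above.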

To fix this, we usually use ReLU.

Rectified Linear Unit

Rectified Linear Unit (ReLU for short) is a non-linear activation function. It is cheap to compute and does not saturate for positive inputs, which is why it has largely replaced the sigmoid in practice. Represented as

$$f(x) = \max(0, x)$$
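In code, ReLU is a one-liner (a sketch, using my own function name):

```python
def relu(x: float) -> float:
    """Return x for positive inputs, clamp negatives to zero."""
    return max(0.0, x)

print(relu(2.5))   # 2.5
print(relu(-3.0))  # 0.0
```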

Alright, that's a lot of information to take in at once, but now let's look at how a model actually learns given the data.

How a model learns

Before that, we have to define how a neuron computes its activation from the previous layer. For the sake of simplicity, let's stick to the Sigmoid function to squish our values.

Mathematically, we can define the activation of a neuron as

$$a = \sigma\left(\sum_{n=1}^{N} w_n a_n - b\right)$$

Where:

  • $a_n$ is the activation of the $n^{th}$ neuron in the previous layer,
  • $w_n$ is the weight of its connection to our neuron,
  • $b$ is the bias, a threshold the weighted sum must cross before the neuron turns meaningfully on,
  • $N$ is the number of neurons in the previous layer.
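The formula above translates almost line for line into Python. A minimal sketch with toy numbers of my own choosing:

```python
import math

def sigmoid(x: float) -> float:
    """Squish any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def neuron_activation(weights, activations, bias):
    """Weighted sum of incoming activations, minus the bias, squished by sigmoid."""
    z = sum(w * a for w, a in zip(weights, activations)) - bias
    return sigmoid(z)

# Three neurons from the previous layer feeding one neuron.
a = neuron_activation([0.5, -1.0, 2.0], [0.9, 0.3, 0.6], bias=0.2)
print(a)  # always lands between 0 and 1
```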

Now, the model learns in basically two steps:

  1. Your data is first split into a training set and a testing set.
  2. Then a specialized algorithm decides how to adjust your weights and biases so the model's outputs match your data's labels.
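Step 1 can be sketched in plain Python without any library (the function name and split ratio are my own, not from the post):

```python
import random

def train_test_split(data, labels, test_fraction=0.2, seed=42):
    """Shuffle the dataset, then carve off a held-out testing portion."""
    rng = random.Random(seed)
    indices = list(range(len(data)))
    rng.shuffle(indices)
    cut = int(len(data) * (1 - test_fraction))
    train_idx, test_idx = indices[:cut], indices[cut:]
    return ([data[i] for i in train_idx], [labels[i] for i in train_idx],
            [data[i] for i in test_idx], [labels[i] for i in test_idx])

xs = list(range(10))
ys = [x % 2 for x in xs]
x_train, y_train, x_test, y_test = train_test_split(xs, ys)
print(len(x_train), len(x_test))  # 8 2
```

The testing set stays untouched during training, so it gives an honest measure of how well the model generalizes.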

A model's accuracy is measured using a cost function.

Cost Function

This function returns a mathematical value quantifying the error, i.e. the difference between the model's output and the expected output. The lower it is, the better the model.

Let us assume a target function which is 1 for the correct label $n$ and 0 for everything else:

$$f(x) = \begin{cases} 1, & x = n \\ 0, & x \neq n \end{cases}$$

Our cost function CC is defined as

$$C = \sum_{k=1}^{n}(a_k - f(k))^2$$

Where $a_k$ is the activation of the $k^{th}$ output neuron.
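The cost function above is just a sum of squared differences against a one-hot target. A minimal sketch with made-up activations:

```python
def cost(activations, target_index):
    """Sum of squared differences between outputs and the ideal one-hot target."""
    return sum((a - (1.0 if k == target_index else 0.0)) ** 2
               for k, a in enumerate(activations))

# A confident, correct output is cheap; a confused one is expensive.
print(cost([0.0, 0.95, 0.05], target_index=1))  # 0.005
print(cost([0.4, 0.3, 0.3], target_index=1))    # 0.74
```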

Now, our target should be to minimize this value, which we do by finding a local minimum of the cost function. In practice, gradient descent is applied: each step of our descent looks like $-\eta \nabla C$, where $\eta$ is our learning rate. The bigger it is, the larger each step, though a learning rate that is too large can overshoot the minimum entirely.
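A single gradient descent step is just "nudge each parameter opposite its gradient". Here is a minimal sketch on a toy one-parameter cost $C(w) = w^2$ (the function and example are my own, not from the post):

```python
def gradient_descent_step(params, gradient, learning_rate=0.1):
    """Apply one update: p <- p - eta * dC/dp for every parameter."""
    return [p - learning_rate * g for p, g in zip(params, gradient)]

# Minimising C(w) = w^2, whose gradient is 2w: repeated steps shrink w toward 0.
w = [3.0]
for _ in range(50):
    w = gradient_descent_step(w, [2 * w[0]])
print(w[0])  # very close to 0, the minimum of w^2
```

In a real network the gradient vector has one entry per weight and bias; computing it efficiently is exactly what backpropagation (next post) is for.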

Alright, this is a lot of information for one blog. In the next part of this series, we'll cover Backpropagation along with the math behind it. Backpropagation is the algorithm used to calculate the negative gradient, telling us how to nudge the weights and biases to improve the network's accuracy.
