Every time an app recognizes your face, a voice assistant understands your words, or an AI generates an image, a neural network is most likely doing the work. Neural networks are the engine behind modern AI, and despite the biological-sounding name, they are fundamentally just math.

Inspired by the Brain, Built with Math

Neural networks are loosely inspired by biological neurons. In the brain, neurons are cells that receive signals, process them, and fire signals to other neurons. Neural networks model this with layers of simple mathematical units — artificial neurons — that do something similar: receive numbers, multiply them by learned weights, apply a function, and pass the result forward.

The "inspired by the brain" framing is useful for building intuition, but don't take it too literally. Artificial neural networks are much simpler than the brain and work quite differently in the details. What makes them powerful isn't the biological metaphor; it's what you can do when you layer these simple units together and train them on data.

The Basic Building Block: A Neuron

An artificial neuron takes a set of inputs, multiplies each one by a weight, sums the results, and passes the sum through an activation function that determines what to output.

output = activation(w₁x₁ + w₂x₂ + w₃x₃ + bias)

Where:

x₁, x₂, x₃ are inputs
w₁, w₂, w₃ are learned weights (how much to trust each input)
bias is a learned offset
activation is a function like ReLU or sigmoid that adds non-linearity
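The formula above can be sketched in a few lines of Python with NumPy. The input values, weights, and bias here are made up for illustration, and ReLU is just one possible choice of activation.

```python
import numpy as np

def relu(z):
    # ReLU activation: keeps positive values, replaces negatives with 0
    return np.maximum(0.0, z)

def neuron(x, w, bias, activation=relu):
    # Weighted sum of the inputs plus the bias, passed through the activation
    return activation(np.dot(w, x) + bias)

x = np.array([1.0, 2.0, 3.0])     # inputs x1, x2, x3 (made-up values)
w = np.array([0.5, -1.0, 0.25])   # weights w1, w2, w3 (made-up values)
bias = 1.0

output = neuron(x, w, bias)
print(output)  # 0.5 - 2.0 + 0.75 + 1.0 = 0.25
```

With a bias of -1.0 instead, the weighted sum would be negative and ReLU would output 0, which is the non-linearity at work.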

A single neuron can't do much. But combine hundreds of thousands of them in layers, and the network as a whole can represent very complex patterns.

Layers: The Architecture of a Network

Neural networks are organized into layers:

Input Layer

The raw data enters here, always as numbers. For an image classifier, each pixel's value is a separate input. For a text model, each word (or token) is first converted to numbers and then fed in as an input.

Hidden Layers

One or more layers between input and output. Each layer transforms the representation, extracting higher-level features:

Early layers in an image network might detect edges and textures
Middle layers might detect shapes and parts
Later layers might detect objects ("this looks like a cat's face")

The word "hidden" just means these layers aren't the input or output — they're doing intermediate work.

Output Layer

Produces the final result. For a classifier with 10 categories, the output layer has 10 neurons — each outputting a score for one category.
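To make the three layer types concrete, here is a minimal NumPy sketch of data flowing from input to output. The sizes (784 inputs, 32 hidden units, 10 categories), the random untrained weights, and the softmax step that turns scores into probabilities are all assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(z):
    return np.maximum(0.0, z)

def softmax(z):
    # Turn raw scores into probabilities that sum to 1
    e = np.exp(z - np.max(z))
    return e / e.sum()

# Assumed sizes: a 28x28 grayscale image, one hidden layer, 10 categories
n_in, n_hidden, n_out = 784, 32, 10

# Untrained, randomly initialized weights and zero biases
W1 = rng.normal(0, 0.01, size=(n_hidden, n_in))
b1 = np.zeros(n_hidden)
W2 = rng.normal(0, 0.01, size=(n_out, n_hidden))
b2 = np.zeros(n_out)

def forward(x):
    h = relu(W1 @ x + b1)    # hidden layer: intermediate representation
    scores = W2 @ h + b2     # output layer: one score per category
    return softmax(scores)

x = rng.random(n_in)         # stand-in for pixel values in [0, 1)
probs = forward(x)
print(probs.shape)  # (10,)
```

Because the weights are random, the probabilities are meaningless at this point; training is what makes them useful.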

How Training Works

A neural network starts with random weights. Training adjusts those weights so the network's outputs get closer to the correct answers.

The process:

Forward pass — run the input through the network to get a prediction
Calculate loss — measure how far off the prediction is from the correct answer using a loss function
Backward pass (backpropagation) — calculate how much each weight contributed to the error
Update weights — adjust weights slightly in the direction that reduces the error (gradient descent)
Repeat — across millions or billions of examples until the loss plateaus

Backpropagation is the algorithm that makes this efficient. Instead of adjusting weights randomly and hoping things improve, it calculates the exact gradient — the direction and magnitude of adjustment for every weight — by propagating error information backward through the layers using the chain rule of calculus.
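The loop described above can be sketched end to end in NumPy. Everything specific here is an assumption for illustration: the toy XOR dataset, the 2-8-1 network, tanh and sigmoid activations, mean squared error as the loss, and a learning rate of 0.5. Real training uses far more data and usually a library that computes the gradients automatically.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dataset (XOR): 4 examples, 2 inputs each, 1 target each
X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
y = np.array([[0.], [1.], [1.], [0.]])

# Random starting weights for a 2 -> 8 -> 1 network (sizes are arbitrary)
W1 = rng.normal(0, 1, size=(2, 8))
b1 = np.zeros(8)
W2 = rng.normal(0, 1, size=(8, 1))
b2 = np.zeros(1)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

learning_rate = 0.5
losses = []

for step in range(5000):
    # 1. Forward pass
    h = np.tanh(X @ W1 + b1)
    p = sigmoid(h @ W2 + b2)

    # 2. Loss: mean squared error between predictions and targets
    loss = np.mean((p - y) ** 2)
    losses.append(loss)

    # 3. Backward pass: chain rule, from the output back to the first layer
    d_p = 2 * (p - y) / len(X)     # dLoss/dp
    d_z2 = d_p * p * (1 - p)       # back through the sigmoid
    d_W2 = h.T @ d_z2
    d_b2 = d_z2.sum(axis=0)
    d_h = d_z2 @ W2.T
    d_z1 = d_h * (1 - h ** 2)      # back through tanh
    d_W1 = X.T @ d_z1
    d_b1 = d_z1.sum(axis=0)

    # 4. Update: move each weight a small step against its gradient
    W1 -= learning_rate * d_W1
    b1 -= learning_rate * d_b1
    W2 -= learning_rate * d_W2
    b2 -= learning_rate * d_b2

print(f"loss at start: {losses[0]:.4f}, loss at end: {losses[-1]:.4f}")
```

The backward pass reuses values computed in the forward pass (h and p), which is a large part of why backpropagation is efficient.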

Types of Neural Networks

Different architectures are designed for different problems:

Feedforward Networks (MLPs)

The simplest kind — data flows in one direction, from input to output. Good for tabular data, classification, and regression tasks.

Convolutional Neural Networks (CNNs)

Designed for grid-structured data like images. Instead of connecting every neuron to every neuron in the previous layer, convolutional layers slide small filters across the input, reusing the same weights at every position to detect local patterns anywhere in the image. This weight sharing is why CNNs need far fewer parameters than plain feedforward networks on images.

Used in: image classification, object detection, medical imaging, face recognition.
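As a rough illustration of how a small filter detects a local pattern, here is a plain NumPy sketch. The 6x6 image and the hand-written vertical-edge filter are made up; in a real CNN the filter values are learned, and libraries use much faster implementations than this double loop.

```python
import numpy as np

def conv2d(image, kernel):
    # Slide the kernel over the image (no padding, stride 1) and take a
    # weighted sum at each position. As in most deep learning libraries,
    # this is technically cross-correlation (the kernel is not flipped).
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# Made-up 6x6 image: dark left half (0), bright right half (1)
image = np.zeros((6, 6))
image[:, 3:] = 1.0

# Hand-written vertical-edge filter with only 9 weights
kernel = np.array([[-1., 0., 1.],
                   [-1., 0., 1.],
                   [-1., 0., 1.]])

feature_map = conv2d(image, kernel)
print(feature_map[0])  # [0. 3. 3. 0.]
```

The output is large only where the dark-to-bright edge sits. The filter uses 9 weights regardless of image size, whereas a fully connected layer mapping the same 36 pixels to 16 outputs would need 36 x 16 = 576 weights.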

Recurrent Neural Networks (RNNs)

Process sequential data by maintaining a "hidden state" that carries information from earlier timesteps. Once dominant for language tasks, largely replaced by transformers.
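A minimal sketch of the hidden-state idea, assuming a plain (vanilla) RNN cell with a tanh activation. The sizes, random weights, and input sequence are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

input_size, hidden_size = 4, 3   # arbitrary sizes for illustration

W_xh = rng.normal(0, 0.5, size=(hidden_size, input_size))   # input -> hidden
W_hh = rng.normal(0, 0.5, size=(hidden_size, hidden_size))  # hidden -> hidden
b_h = np.zeros(hidden_size)

def rnn_step(x_t, h_prev):
    # The new hidden state mixes the current input with the previous state
    return np.tanh(W_xh @ x_t + W_hh @ h_prev + b_h)

sequence = rng.normal(size=(5, input_size))  # 5 timesteps of made-up data
h = np.zeros(hidden_size)                    # initial hidden state
for x_t in sequence:
    h = rnn_step(x_t, h)   # the same weights are reused at every timestep

print(h.shape)  # (3,)
```

Each step depends on the previous one, so the loop cannot be run in parallel across timesteps.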

Transformers

The architecture behind modern large language models. Instead of processing sequences step by step, transformers use attention mechanisms to relate every part of the input to every other part simultaneously. This makes them parallelizable (fast to train) and capable of capturing long-range dependencies.

Used in: GPT, Claude, Gemini, BERT, and many state-of-the-art NLP and vision models.
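The attention idea can be sketched as scaled dot-product self-attention in NumPy. This is a simplified, single-head version with random weights and no masking or positional information; the sequence length and model size are arbitrary assumptions.

```python
import numpy as np

def softmax(z, axis=-1):
    e = np.exp(z - z.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, W_q, W_k, W_v):
    # Project each position into query, key, and value vectors
    Q, K, V = X @ W_q, X @ W_k, X @ W_v
    d_k = K.shape[-1]
    # Score every position against every other position in one matrix product
    scores = Q @ K.T / np.sqrt(d_k)
    weights = softmax(scores, axis=-1)   # each row sums to 1
    # Each output is a weighted mix of all value vectors
    return weights @ V, weights

rng = np.random.default_rng(0)
seq_len, d_model = 5, 8                  # arbitrary sizes
X = rng.normal(size=(seq_len, d_model))  # stand-in for 5 token vectors
W_q, W_k, W_v = (rng.normal(size=(d_model, d_model)) for _ in range(3))

out, weights = self_attention(X, W_q, W_k, W_v)
print(out.shape, weights.shape)  # (5, 8) (5, 5)
```

The 5x5 weight matrix holds one entry per pair of positions and is computed without any loop over timesteps, which is what makes this approach easy to parallelize.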

Generative Adversarial Networks (GANs)

Two networks, a generator and a discriminator, are trained in opposition. The generator creates fake data; the discriminator tries to tell real from fake. As each one improves, it forces the other to improve, so the generator's outputs become increasingly realistic.

Used in: image synthesis, deepfakes, data augmentation.
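The opposing objectives can be illustrated without training anything. The sketch below assumes a common formulation based on binary cross-entropy and uses made-up discriminator outputs; the function names are hypothetical.

```python
import numpy as np

def bce(pred, target):
    # Binary cross-entropy between predicted probabilities and a 0/1 target
    return float(-np.mean(target * np.log(pred) + (1 - target) * np.log(1 - pred)))

def gan_losses(d_real, d_fake):
    # d_real / d_fake: discriminator's probability of "real" for real and fake samples
    d_loss = bce(d_real, 1.0) + bce(d_fake, 0.0)  # discriminator wants real -> 1, fake -> 0
    g_loss = bce(d_fake, 1.0)                     # generator wants fake -> 1
    return d_loss, g_loss

d_real = np.array([0.9, 0.8])          # made-up outputs on real samples
weak_fakes = np.array([0.2, 0.1])      # discriminator easily spots these
strong_fakes = np.array([0.6, 0.7])    # discriminator is mostly fooled

d1, g1 = gan_losses(d_real, weak_fakes)
d2, g2 = gan_losses(d_real, strong_fakes)
print(g2 < g1, d2 > d1)  # True True
```

Better fakes lower the generator's loss and raise the discriminator's, so progress for one network is a setback for the other.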

Depth: Why "Deep Learning"?

Deep learning just means neural networks with many layers. "Deep" refers to the number of layers, not to any philosophical meaning.

Shallow networks (1-2 hidden layers) can in theory approximate a wide range of functions, but they may need an impractically large number of neurons to do so. Deep networks (tens to hundreds of layers) can learn hierarchical representations, building up from simple features to complex patterns. This depth is what allows modern networks to tackle problems that seemed impossible a decade ago.

The Numbers Behind a Modern Network

For perspective:

A simple image classifier might have ~25 million parameters
GPT-2 (2019) had 1.5 billion parameters
The largest modern LLMs are reported to have hundreds of billions to over a trillion parameters
These models are trained on trillions of tokens of text
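A "parameter" is simply one learned weight or bias. As a small worked example, the helper below (a hypothetical function, not from any library) counts the parameters of a fully connected network from its layer sizes; the 784-128-10 network is an assumed example.

```python
def dense_param_count(layer_sizes):
    # Each fully connected layer has (inputs x outputs) weights
    # plus one bias per output
    return sum(n_in * n_out + n_out
               for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))

# Assumed small network: 784 inputs -> 128 hidden units -> 10 outputs
print(dense_param_count([784, 128, 10]))  # 101770
```

Even this tiny network has about a hundred thousand parameters, which gives a sense of how quickly the counts above grow with width and depth.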

Training large networks requires specialized hardware — primarily GPUs (Graphics Processing Units), which excel at the matrix multiplication operations that make up most of neural network computation.

What Neural Networks Can and Can't Do

What they're good at:

Finding patterns in high-dimensional data (images, text, audio)
Tasks where humans struggle to write explicit rules
Generalizing from examples to new inputs

What they struggle with:

Interpretability — it's hard to understand why a network made a decision
Data efficiency — they often need far more examples than humans to learn a concept
Robustness — small, carefully crafted changes to input can fool networks completely (adversarial examples)
Out-of-distribution generalization — they can fail badly on inputs that differ from training data
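The robustness problem can be shown on a model far simpler than a neural network. The sketch below assumes a hypothetical linear classifier with made-up weights and applies a sign-of-the-gradient perturbation (the idea behind the fast gradient sign method) to flip its prediction.

```python
import numpy as np

# Hypothetical linear classifier: predicts class 1 when w.x + b > 0
w = np.array([1.0, -2.0, 0.5, 1.5])
b = 0.0

def predict(x):
    return int(np.dot(w, x) + b > 0)

x = np.array([0.5, 0.1, 0.2, 0.1])   # score is about 0.55, so class 1
epsilon = 0.15                        # maximum change allowed per input

# For a linear model, the gradient of the score with respect to the input
# is just w, so stepping against sign(w) lowers the score as much as possible
x_adv = x - epsilon * np.sign(w)

print(predict(x), predict(x_adv))  # 1 0
```

No input moved by more than 0.15, yet the prediction flipped. With thousands of inputs, as in an image, the individual shifts add up, so a much smaller change per pixel can have the same effect.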

The Bottom Line

Neural networks are stacks of simple mathematical units that learn to transform data through repeated exposure to examples. The same core idea — neurons, layers, forward passes, backpropagation — scales from simple classifiers to the massive language models powering today's AI assistants. Understanding the basics helps demystify both the genuine power and the real limitations of modern AI systems.