Day 115: Feedforward Propagation and Layered Representation

The forward pass matters because it is the network's actual computation: the step-by-step process that turns raw inputs into hidden representations and finally into a prediction.


Today's "Aha!" Moment

Now that the perceptron and activation functions are on the table, the next question is obvious: what does a neural network actually do when you hand it one input example?

The answer is the forward pass. The network does not jump magically from input to prediction. It moves layer by layer. Each layer takes the current representation, applies a weighted transformation, applies an activation, and passes the result forward. What starts as raw input becomes a chain of increasingly task-shaped internal features.

This is the moment where "hidden layers" stop sounding mystical. They are not hidden because they are secret; they are hidden because they are intermediate. Their activations are the features the network invents for itself on the way to the final answer.

That is the aha. The forward pass is not implementation plumbing around the model. It is the model.


Why This Matters

The problem: Neural networks are often presented as large collections of parameters, which can hide the actual computation they perform.

Before: The network looks like an opaque pile of weight matrices, and it is unclear what happens when an input arrives.

After: The network reads as a pipeline: each layer applies an affine transform and an activation, handing a progressively more useful representation to the next layer.

Real-world impact: Understanding the forward pass makes architecture design, debugging, and the next topic of backpropagation much easier to reason about.


Learning Objectives

By the end of this session, you will be able to:

  1. Explain the forward pass through a multilayer network - Describe how linear transforms and activations compose from input to output.
  2. Reason about layer shapes - Understand what matrix dimensions say about the architecture.
  3. Interpret hidden layers as representation builders - Explain how intermediate activations can make the final decision easier.

Core Concepts Explained

Concept 1: Each Layer Performs a Transform, Then Hands a New Representation to the Next Layer

Take a small XOR-style network with two inputs, a hidden layer, and one output. The input layer performs no computation by itself. It is just the starting representation.

The first hidden layer computes a weighted sum plus bias, applies an activation, and produces a new set of numbers. Those numbers are not yet the final answer. They are a rewritten version of the input, shaped for the next layer.

Z1 = W1 @ X + b1        # affine transform into the hidden layer
A1 = relu(Z1)           # hidden representation
Z2 = W2 @ A1 + b2       # affine transform into the output layer
Y_hat = sigmoid(Z2)     # final prediction

That four-line fragment captures the whole idea. The network alternates between affine transformation and nonlinearity until it reaches the output.

input
  -> linear transform
  -> activation
  -> linear transform
  -> activation/output
  -> prediction

The trade-off is clarity versus abstraction. The forward pass is conceptually simple when read step by step, but deep networks can hide that simplicity under a lot of notation unless you keep this layer-by-layer view.
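To see the layer-by-layer view run end to end, here is a minimal runnable version of that fragment in NumPy, for a 2-input, 3-hidden-unit, 1-output network. The weights are random and purely illustrative; a trained network would have learned values here.

```python
import numpy as np

def relu(z):
    return np.maximum(0, z)

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

# Illustrative random parameters (column-vector convention:
# X is (2, 1), W1 is (3, 2), W2 is (1, 3)).
rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(3, 2)), np.zeros((3, 1))
W2, b2 = rng.normal(size=(1, 3)), np.zeros((1, 1))

X = np.array([[1.0], [0.0]])   # one input example

Z1 = W1 @ X + b1       # affine transform into the hidden layer
A1 = relu(Z1)          # hidden representation
Z2 = W2 @ A1 + b2      # affine transform into the output layer
Y_hat = sigmoid(Z2)    # final prediction, squashed into (0, 1)

print(Y_hat.shape)     # (1, 1): one score for one example
```

Reading the four compute lines top to bottom is exactly the "alternate affine transform and nonlinearity" pattern described above.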

Concept 2: Matrix Shapes Are the Architecture Written in Algebra

Network diagrams are helpful, but the architecture is also encoded in tensor and matrix shapes.

If the input has 2 features and the hidden layer has 3 units, then W1 must connect 2 incoming values to 3 hidden units. If the output layer has 1 unit, then W2 must connect those 3 hidden activations down to 1 final score.

This is why shape reasoning matters so much. The dimensions tell you what can talk to what.

2 inputs  --W1-->  3 hidden units  --W2-->  1 output

In batch computation, the same transform is applied to many examples at once. That is one reason neural networks are efficient on modern hardware: the same layer operation can process whole groups of examples in parallel.

The trade-off is that matrix notation is compact and powerful, but it becomes confusing fast if you stop tracking what each dimension actually represents. Good shape reasoning is often the difference between understanding a network and memorizing formulas blindly.
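The shape reasoning above can be checked directly in code. This sketch (with zero weights, chosen only so the shapes are the focus) pushes a batch of four examples through the 2-3-1 network and asserts the dimensions at each stage:

```python
import numpy as np

# A batch of 4 examples stored as columns, matching the W @ X
# convention used in this lesson: X has shape (2, 4).
rng = np.random.default_rng(1)
X = rng.normal(size=(2, 4))
W1, b1 = np.zeros((3, 2)), np.zeros((3, 1))
W2, b2 = np.zeros((1, 3)), np.zeros((1, 1))

A1 = np.maximum(0, W1 @ X + b1)  # (3, 2) @ (2, 4) -> (3, 4); bias broadcasts
Y = W2 @ A1 + b2                 # (1, 3) @ (3, 4) -> (1, 4)

assert A1.shape == (3, 4)        # one 3-dimensional hidden vector per example
assert Y.shape == (1, 4)         # one output score per example
```

Note that the layer parameters never change shape as the batch grows: the same `(3, 2)` matrix processes one example or a million, which is the parallelism point made above.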

Concept 3: Hidden Activations Are Learned Features, Not Random Internal Noise

This is the most important interpretive shift.

Suppose a network is learning XOR. One hidden unit might become sensitive to whether at least one input is on. Another might become sensitive to whether both are on. The output layer can then combine those hidden signals into the final XOR rule.

You do not need the exact units to look like that in every real model for the intuition to hold. The point is that hidden layers create internal features that later layers can use.

That is why multilayer networks can solve things a single perceptron cannot. The power is not just "more parameters." It is the ability to rewrite the input into a new representation where the final decision becomes easier.

raw input
   |
   +--> hidden representation 1
   |
   +--> hidden representation 2
   |
   +--> output decision

The trade-off is interpretability versus flexibility. The more the network invents its own internal representation, the more powerful it can become, but the less obvious it may be what each internal dimension means to a human reader.
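The XOR intuition can be made concrete with hand-picked weights. This is one possible solution, not necessarily what training would find: hidden unit 0 fires when at least one input is on, hidden unit 1 only when both are on, and the output subtracts the second signal from the first.

```python
import numpy as np

def relu(z):
    return np.maximum(0, z)

# Hand-picked weights for XOR (illustrative, not learned):
# hidden unit 0 ~ "at least one input on", hidden unit 1 ~ "both on".
W1 = np.array([[1.0, 1.0],
               [1.0, 1.0]])
b1 = np.array([-0.5, -1.5])
W2 = np.array([1.0, -3.0])   # "at least one" minus "both" = XOR

for x1 in (0.0, 1.0):
    for x2 in (0.0, 1.0):
        h = relu(W1 @ np.array([x1, x2]) + b1)  # hidden representation
        score = W2 @ h
        print(int(x1), int(x2), "->", int(score > 0.25))
# Prints 0, 1, 1, 0: the XOR pattern a single perceptron cannot produce.
```

The hidden layer has rewritten the input into two interpretable features, and in that new representation the final decision is a simple linear threshold.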

Troubleshooting

Issue: Treating the forward pass as less important than backpropagation.

Why it happens / is confusing: Training algorithms often get more attention than inference mechanics.

Clarification / Fix: Backpropagation only exists to improve the forward pass. The forward pass is the actual function the network computes.

Issue: Thinking hidden layers are useful only because they add more weights.

Why it happens / is confusing: Parameter count is visible, so it seems like the main source of power.

Clarification / Fix: Hidden layers matter because they create new representations, not merely because they increase the number of coefficients.

Issue: Memorizing formulas without tracking dimensions.

Why it happens / is confusing: Matrix notation can look finished and self-contained.

Clarification / Fix: Always ask what each axis means. Shapes are not bookkeeping; they encode the architecture.


Advanced Connections

Connection 1: Forward Propagation ↔ Representation Learning

The parallel: Each hidden layer can be seen as a stage that converts raw inputs into features that are easier for later stages to use.

Real-world case: This is one of the biggest differences between neural networks and classical pipelines that depend heavily on manually engineered features.

Connection 2: Forward Propagation ↔ Backpropagation

The parallel: The backward pass will reuse the same chain of computations, but in reverse with gradients.

Real-world case: Understanding which values are produced and cached in the forward pass is exactly what makes gradient flow intelligible later.
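As a sketch of that caching idea, a forward pass can return its intermediates alongside the prediction. The function name and cache layout here are illustrative choices, not a standard API:

```python
import numpy as np

def forward_with_cache(X, W1, b1, W2, b2):
    """Forward pass for a 2-layer network that also returns the
    intermediate values a later backward pass would reuse."""
    Z1 = W1 @ X + b1
    A1 = np.maximum(0, Z1)             # ReLU
    Z2 = W2 @ A1 + b2
    Y_hat = 1.0 / (1.0 + np.exp(-Z2))  # sigmoid
    cache = {"X": X, "Z1": Z1, "A1": A1, "Z2": Z2}
    return Y_hat, cache
```

When backpropagation arrives in a later session, each gradient will be computed from exactly these cached values, which is why deep learning frameworks store them automatically during the forward pass.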



Key Insights

  1. The forward pass is the network's real computation - It is the path from raw input to final prediction.
  2. Matrix shapes are the architecture expressed algebraically - Dimensions tell you how layers connect and how batches flow through the network.
  3. Hidden layers create learned intermediate features - Their value comes from representation building, not just from adding more parameters.

Knowledge Check (Test Questions)

  1. What is the usual pattern inside one neural layer during the forward pass?

    • A) Weighted sum plus bias, then activation.
    • B) Activation first, then create the weights.
    • C) Loss computation before the hidden representation.
  2. Why do matrix dimensions matter so much in neural networks?

    • A) Because they encode which layers connect to which and whether the computation is well-formed.
    • B) Because larger matrices automatically mean better generalization.
    • C) Because shapes are unrelated to the architecture.
  3. What are hidden activations best understood as?

    • A) Intermediate features learned by the network for later layers to use.
    • B) Random internal numbers with no modeling role.
    • C) Final class labels.

Answers

1. A: Each layer first computes an affine transformation and then applies its activation.

2. A: Shape consistency is what makes one layer's output a valid input for the next.

3. A: Hidden activations are the network's internal learned representation of the input.


