Interactive Neural Network: Draw → Process → Recognize
$$\text{Architecture: } 784 \text{ (28×28 pixels)} \rightarrow 128 \rightarrow 64 \rightarrow 32 \rightarrow 10 \text{ (digits 0-9)}$$
$$\text{Total Parameters: } 784 \times 128 + 128 + 128 \times 64 + 64 + 64 \times 32 + 32 + 32 \times 10 + 10 = 111,146$$
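The count can be checked layer by layer: each connection matrix contributes (inputs × outputs) weights plus one bias per output neuron. A minimal sketch in plain Python:

```python
# Layer sizes from the architecture above.
layers = [784, 128, 64, 32, 10]

# Each layer transition adds n_in * n_out weights plus n_out biases.
total = sum(n_in * n_out + n_out for n_in, n_out in zip(layers, layers[1:]))
print(total)  # 111146
```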
Neuron Activation & Common Activation Functions
$$\text{Neuron Output: } a = f(w_1x_1 + w_2x_2 + \dots + w_nx_n + b)$$
$$\text{where } f \text{ is the activation function}$$
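Concretely, a neuron computes a dot product plus a bias and passes the result through f. A minimal sketch (the input values, weights, bias, and the choice of ReLU here are illustrative assumptions):

```python
import numpy as np

def neuron_output(x, w, b, f):
    """Weighted sum of inputs plus bias, passed through activation f."""
    return f(np.dot(w, x) + b)

relu = lambda z: np.maximum(0.0, z)

x = np.array([0.5, -1.0, 2.0])   # assumed example inputs
w = np.array([0.4, 0.3, -0.2])   # assumed example weights
b = 0.1

# 0.5*0.4 - 1.0*0.3 + 2.0*(-0.2) + 0.1 = -0.4, so relu(-0.4) = 0.0
print(neuron_output(x, w, b, relu))
```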
Interactive Neuron Visualization
Selectable activation functions: ReLU, Leaky ReLU, Sigmoid, Tanh, Softmax
Activation Function Comparison
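For reference, the definitions of the five functions being compared (α in Leaky ReLU is a small positive constant, commonly 0.01):

$$\text{ReLU}(z) = \max(0, z), \qquad \text{LeakyReLU}(z) = \begin{cases} z & z > 0 \\ \alpha z & z \le 0 \end{cases}$$

$$\sigma(z) = \frac{1}{1+e^{-z}}, \qquad \tanh(z) = \frac{e^{z} - e^{-z}}{e^{z} + e^{-z}}, \qquad \text{softmax}(\mathbf{z})_i = \frac{e^{z_i}}{\sum_{j} e^{z_j}}$$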
Focus on Individual Neurons: How Each One Learns Features
$$\text{Select a specific neuron to see what pattern it has learned to detect}$$
Layer 1: Edge Detection (128 neurons). Each neuron in this layer has learned to detect a specific edge pattern. Selecting a neuron in the interactive view shows the pattern it responds to, its weight statistics, and its current activation, i.e. how strongly it fires for the current input.
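A first-layer neuron's learned pattern is simply its 784 incoming weights, which can be reshaped to 28×28 and rendered as an image. A minimal sketch, assuming a trained weight matrix `W1` of shape (128, 784) is available (the variable names and the random placeholder are illustrative assumptions):

```python
import numpy as np
import matplotlib.pyplot as plt

# W1: trained first-layer weight matrix, shape (128, 784).
# A random placeholder stands in here; substitute the real trained weights.
W1 = np.random.randn(128, 784)

neuron = 42                           # hypothetical neuron index to inspect
pattern = W1[neuron].reshape(28, 28)  # 784 incoming weights as a 28x28 image

plt.imshow(pattern, cmap="RdBu")      # red = negative, blue = positive weights
plt.title(f"Neuron {neuron}: learned input pattern")
plt.colorbar()
plt.show()
```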
Mathematical Foundations: Backpropagation & Gradient Descent
$$\text{Forward Pass: } \mathbf{a}^{(l+1)} = f(W^{(l+1)} \mathbf{a}^{(l)} + \mathbf{b}^{(l+1)})$$
$$\text{Backward Pass: } \frac{\partial C}{\partial W^{(l)}} = \boldsymbol{\delta}^{(l)} (\mathbf{a}^{(l-1)})^T, \quad \frac{\partial C}{\partial \mathbf{b}^{(l)}} = \boldsymbol{\delta}^{(l)}$$
Complete Backpropagation Process
$$\text{1. Cost Function: } C = \frac{1}{2}\sum_{i=1}^{10}(y_i - a_i^{(L)})^2$$
$$\text{2. Output Layer Error: } \boldsymbol{\delta}^{(L)} = \nabla_a C \odot f'(\mathbf{z}^{(L)})$$
$$\text{3. Hidden Layer Error: } \boldsymbol{\delta}^{(l)} = ((W^{(l+1)})^T \boldsymbol{\delta}^{(l+1)}) \odot f'(\mathbf{z}^{(l)})$$
$$\text{4. Weight Update: } W^{(l)} \leftarrow W^{(l)} - \eta \frac{\partial C}{\partial W^{(l)}}$$
$$\text{5. Bias Update: } \mathbf{b}^{(l)} \leftarrow \mathbf{b}^{(l)} - \eta \frac{\partial C}{\partial \mathbf{b}^{(l)}}$$
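Steps 1–5 map directly to code. Below is a minimal sketch of one training iteration for the 784→128→64→32→10 network, using sigmoid activations on every layer to match the quadratic cost above; the layer sizes come from the architecture, while the initialization, random input, and one-hot target are illustrative assumptions:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_prime(z):
    s = sigmoid(z)
    return s * (1.0 - s)

sizes = [784, 128, 64, 32, 10]
rng = np.random.default_rng(0)
Ws = [rng.normal(0.0, 0.1, (n_out, n_in)) for n_in, n_out in zip(sizes, sizes[1:])]
bs = [np.zeros((n_out, 1)) for n_out in sizes[1:]]

x = rng.random((784, 1))            # assumed input: a flattened 28x28 drawing
y = np.zeros((10, 1)); y[3] = 1.0   # assumed one-hot target (digit 3)
eta = 0.01                          # learning rate

# Forward pass: cache pre-activations z and activations a for every layer.
a, acts, zs = x, [x], []
for W, b in zip(Ws, bs):
    z = W @ a + b
    zs.append(z)
    a = sigmoid(z)
    acts.append(a)

cost = 0.5 * np.sum((y - a) ** 2)             # step 1: quadratic cost

delta = (a - y) * sigmoid_prime(zs[-1])       # step 2: output layer error
for l in range(len(Ws) - 1, -1, -1):
    dW = delta @ acts[l].T                    # dC/dW^(l) = delta^(l) (a^(l-1))^T
    db = delta                                # dC/db^(l) = delta^(l)
    if l > 0:                                 # step 3: propagate error back
        delta = (Ws[l].T @ delta) * sigmoid_prime(zs[l - 1])
    Ws[l] -= eta * dW                         # step 4: weight update
    bs[l] -= eta * db                         # step 5: bias update

print(f"cost for this example: {cost:.4f}")
```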
Learning rate: η = 0.01
Training Progress (live readout; initial state shown): Cost = 0.500, Gradient Norm = 0.000, Iteration = 0, Accuracy = 0.0%
Numerical Example (single weight; values update live during training):
Forward: z = wx + b, a = σ(z)
Backward: δ = (a − y) · σ′(z), ∂C/∂w = δ · x, w_new = w − η · ∂C/∂w
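Plugging concrete numbers into these lines makes one update step tangible; in this sketch every value (x, w, b, y, η) is an assumed illustration, not data from the app:

```python
import math

x, w, b = 0.5, 0.8, 0.1    # assumed input, weight, bias
y, eta = 1.0, 0.5          # assumed target and learning rate

z = w * x + b                     # 0.5
a = 1.0 / (1.0 + math.exp(-z))    # sigma(0.5) ~= 0.6225
delta = (a - y) * a * (1.0 - a)   # (a - y) * sigma'(z) ~= -0.0887
dC_dw = delta * x                 # ~= -0.0444
w_new = w - eta * dC_dw           # ~= 0.8222

print(f"z={z:.4f}, a={a:.4f}, delta={delta:.4f}, w_new={w_new:.4f}")
```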
Cost Function Landscape & Optimization Path
Interactive Backpropagation Network with Live Values
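The two panels above correspond to a standard picture: a cost surface C(w₁, w₂) and the sequence of points gradient descent visits on it. A minimal sketch tracing such a path on an assumed quadratic bowl (the surface, step count, and starting point are illustrative, not the app's actual cost function):

```python
import numpy as np

# Assumed two-parameter cost surface: an elongated quadratic bowl.
def cost(w):
    return 0.5 * (w[0] ** 2 + 10.0 * w[1] ** 2)

def grad(w):
    return np.array([w[0], 10.0 * w[1]])

w = np.array([4.0, 2.0])    # assumed starting point
eta = 0.05
path = [w.copy()]
for _ in range(50):
    w -= eta * grad(w)      # gradient descent step: w <- w - eta * dC/dw
    path.append(w.copy())

print(f"start cost: {cost(path[0]):.3f}, final cost: {cost(path[-1]):.6f}")
# Plotting path over contours of cost() reproduces the "optimization path"
# picture shown in the landscape panel.
```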