Day 1: NumPy Playground

Lesson 1 · 60 min

The NumPy Playground

Python & NumPy for AI — From Raw Arrays to Vectorized Power


The sklearn.fit() Trap

Component Architecture

[Diagram: ScratchAI-Beginner Course 1 pipeline — Data (Python lists, CSV/sliders) → 01 NumPy Playground (arrays, shapes, broadcast, dtype; current lesson) → Model (forward pass, activations) → Loss (MSE/XEnt) → Optimizer (SGD/Adam, weight update). Later stages are future lessons.]

Here is how a beginner typically "does machine learning":

python
from sklearn.preprocessing import normalize
X_normalized = normalize(X)

One line. It works. And it hides everything you need to know.

When you call normalize(X), scikit-learn quietly:

  1. Checks your array's dtype and coerces it to float64

  2. Computes the L2 norm along axis=1 using a C extension

  3. Divides in-place using broadcasting across a reshaped norm vector

  4. Handles the zero-norm edge case by clipping denominators
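Those four steps can be sketched by hand. This is an illustrative reimplementation, not scikit-learn's actual source; the function name `l2_normalize` and the use of `np.finfo(...).tiny` as the clipping floor are choices made here for clarity:

```python
import numpy as np

def l2_normalize(X):
    """Hand-rolled sketch of what sklearn's normalize() does per row."""
    # 1. Coerce dtype: float64 so division is true division, not integer math
    X = np.asarray(X, dtype=np.float64)
    # 2. L2 norm along axis=1; keepdims gives shape (n, 1) for broadcasting
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    # 3. Zero-norm guard: clip denominators away from exact zero
    norms = np.clip(norms, np.finfo(np.float64).tiny, None)
    # 4. Broadcast-divide each row by its own norm
    return X / norms

X = np.array([[3, 4], [0, 0]])
print(l2_normalize(X))   # first row becomes [0.6, 0.8]; zero row stays zero
```

Every step you will later get wrong in a custom model is sitting right there: the dtype coercion, the axis choice, the `keepdims` reshape, and the zero guard.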

None of that is hidden because it's complicated — it's hidden because the
abstraction decided you don't need to see it. But every single one of those
steps is something that will break your custom model in a non-obvious way
the moment you step outside sklearn's guardrails.

This lesson makes those steps visible. You will implement them by hand,
watch them fail in predictable ways, and understand why the library made
each choice it did.


The Failure Mode

Flowchart

[Diagram: NumPy Playground data flow — Raw Input (Python list / slider values, shape (n,) or (m, n)) → np.array(..., dtype=float64) → N-D Array (dtype enforced, shape annotated, C-contiguous) → reshape / newaxis / broadcast align → Vectorized Operation (X · W + b, (batch, features) @ (features, hidden) + (hidden,); shape contract verified, no NaN/Inf check) → Result Array (shape (batch, hidden), dtype float64; astype() cast if needed) → Output, displayed and shape-annotated in the UI.]

Here is the exact crash a beginner hits when trying to multiply two vectors:

python
import numpy as np

a = np.array([1, 2, 3])      # shape: (3,)
b = np.array([4, 5, 6])      # shape: (3,)

result = a @ b               # gives: 32  ← scalar, not a matrix!

No error. No warning. You get 32. If you expected a 3×3 outer product,
your entire downstream computation is silently wrong.
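If the outer product was what you meant, you have to ask for it explicitly. A minimal sketch of the two standard ways:

```python
import numpy as np

a = np.array([1, 2, 3])              # shape: (3,)
b = np.array([4, 5, 6])              # shape: (3,)

# Explicit outer product: shape (3, 3)
outer = np.outer(a, b)

# Equivalent via broadcasting: column vector (3, 1) times row vector (1, 3)
outer2 = a[:, np.newaxis] * b[np.newaxis, :]

print(outer.shape)                   # (3, 3)
assert np.array_equal(outer, outer2)
```

The `@` operator on two 1-D arrays is defined as the inner (dot) product; the shape tells you which one you got, which is why this lesson annotates every array with its shape.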

Or try this:

python
A = np.array([[1, 2], [3, 4]])   # shape (2,2), dtype int64
B = A / 3                         # you expect [0.333, 0.666, ...]
print(B)
# array([[0.33333333, 0.66666667],
#        [1.        , 1.33333333]])

That actually works, but only because Python 3's / operator forces true (float)
division. Now try it with an explicit integer dtype and floor division:

python
A = np.array([[1, 2], [3, 4]], dtype=np.int32)
B = A // 3
print(B)
# array([[0, 0],
#        [1, 1]], dtype=int32)

Floor division on an integer array. No error. Your gradients just became
integers. Your model trains silently on garbage.

These are not edge cases. They are the default behavior of NumPy, and
every neural network implementation lives or dies by getting them right.
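The defensive fix is to decide the dtype once, at creation time, or to cast explicitly before doing math. A short sketch:

```python
import numpy as np

# Pin the dtype at creation: all later arithmetic stays in float64.
A = np.array([[1, 2], [3, 4]], dtype=np.float64)
print((A / 3).dtype)                 # float64 — true division everywhere

# An integer array you received from elsewhere gets cast before the math:
A_int = np.array([[1, 2], [3, 4]], dtype=np.int32)
B = A_int.astype(np.float64) / 3
print(B.dtype)                       # float64, values preserved as fractions
```

This is exactly the coercion step sklearn performed silently in the opening example; here you make it yourself, on purpose.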


The ScratchAI Architecture

State Machine

[Diagram: Array lifecycle state machine — Initialized (np.array / zeros / randn) → Shaped (reshape / newaxis) → Operated (@, +, broadcast) → Cast (astype). Success path: Converged (no NaN, bounded). Failure paths: Error (NaN, overflow, shape mismatch), Diverged (Inf / loss explodes), integer-division truncation; fix dtype or shape and retry.]

This lesson's module is the foundation layer of the entire ScratchAI
pipeline. Before you can write a forward pass, you need to reason fluently
about:

  • N-dimensional arrays: how NumPy lays them out in memory (row-major,
    C-order by default), and why axis semantics matter for reductions

  • Shape algebra: every operation in a neural network is a shape
    transformation — understanding (batch, features) @ (features, hidden)
    is the entire forward pass, conceptually

  • Broadcasting: NumPy's mechanism for operating on arrays of different
    shapes by virtually expanding dimensions — not copying data

  • dtype contracts: your model's numerical precision is a dtype decision
    made at array creation time
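All four ideas meet in one line of code. A minimal sketch of the forward-pass shape contract described above, with illustrative sizes (`batch=4`, `features=3`, `hidden=5` are arbitrary choices here):

```python
import numpy as np

rng = np.random.default_rng(0)
batch, features, hidden = 4, 3, 5            # illustrative sizes

X = rng.standard_normal((batch, features))   # (4, 3) — one row per sample
W = rng.standard_normal((features, hidden))  # (3, 5) — learned weights
b = rng.standard_normal(hidden)              # (5,)  — bias vector

# (batch, features) @ (features, hidden) -> (batch, hidden).
# b has shape (5,): broadcasting stretches it across the batch axis
# virtually, without copying any data.
out = X @ W + b
print(out.shape)                             # (4, 5)
```

Shape algebra, broadcasting, and dtype (float64 from `standard_normal`) are all doing work in that single expression.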

The data flow in this lesson:

Code
Raw Input (Python lists / sliders)
    ↓  np.array(..., dtype=np.float64)
N-D Array [shape annotated]
    ↓  reshape / broadcast
Aligned Operands
    ↓  vectorized operation (dot, matmul, norm, outer)
Result Array [shape verified]
    ↓  dtype cast if needed
Output (displayed + explained)
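The flow above can be run end to end in a few lines. Everything here is a sketch with made-up data; the shape assertion and NaN/Inf check stand in for the "shape verified" and sanity-check stages:

```python
import numpy as np

raw = [[1, 2, 3], [4, 5, 6]]             # Raw Input: plain Python lists
X = np.array(raw, dtype=np.float64)      # dtype enforced at creation: (2, 3)

W = np.ones((3, 2))                      # (features, hidden) — toy weights
b = np.zeros(2)                          # (hidden,)

result = X @ W + b                       # (2, 3) @ (3, 2) + (2,) -> (2, 2)
assert result.shape == (2, 2)            # shape contract verified
assert np.isfinite(result).all()         # no NaN / Inf made it through

out = result.astype(np.float32)          # dtype cast if needed
print(out.shape, out.dtype)              # (2, 2) float32
```

No training loop, no loss: just creation, alignment, one vectorized operation, verification, and a cast.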

No training loop yet. No loss function. Just the raw machinery that
everything else is built on.
