The NumPy Playground
Python & NumPy for AI — From Raw Arrays to Vectorized Power
The sklearn.fit() Trap
Here is how a beginner typically "does machine learning":
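The original one-liner did not survive extraction; a typical version, using scikit-learn's preprocessing module (the variable name X is illustrative), looks like this:

```python
import numpy as np
from sklearn.preprocessing import normalize

# A toy data matrix: 3 samples, 2 features each
X = np.array([[3, 4], [1, 0], [0, 2]])

# One line: scale every row to unit L2 norm
X_normalized = normalize(X)
print(X_normalized)
```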
One line. It works. And it costs you everything you need to know.
When you call normalize(X), scikit-learn quietly:
Checks your array's dtype and coerces it to float64
Computes the L2 norm along axis=1 using a C extension
Divides in-place using broadcasting across a reshaped norm vector
Handles the zero-norm edge case by clipping denominators
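Those four steps can be reproduced by hand in a few lines of NumPy. This is a sketch of the same work in the same order, not scikit-learn's actual source:

```python
import numpy as np

def l2_normalize(X):
    # Step 1: coerce to float64 regardless of the input dtype
    X = np.asarray(X, dtype=np.float64)
    # Step 2: L2 norm of each row (a reduction along axis=1)
    norms = np.sqrt((X ** 2).sum(axis=1))
    # Step 3 + 4: clip zero norms so we never divide by zero,
    # then divide each row by its norm via broadcasting.
    # Reshaping to (n, 1) aligns the norm vector with the rows.
    norms = np.maximum(norms, np.finfo(np.float64).tiny)
    return X / norms.reshape(-1, 1)

X = np.array([[3, 4], [0, 0]])
print(l2_normalize(X))
# Row [3, 4] becomes [0.6, 0.8]; the all-zero row stays all zeros
```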
None of that is hidden because it's complicated — it's hidden because the
abstraction decided you don't need to see it. But every single one of those
steps is something that will break your custom model in a non-obvious way
the moment you step outside sklearn's guardrails.
This lesson makes those steps visible. You will implement them by hand,
watch them fail in predictable ways, and understand why the library made
each choice it did.
The Failure Mode
Here is the exact surprise a beginner hits when trying to multiply two vectors:
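The code block here was lost in extraction; these arrays are illustrative, chosen to reproduce the 32 the text refers to:

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([4, 5, 6])

# @ on two 1-D arrays is the inner (dot) product, not an outer product
print(a @ b)                 # 32, a single scalar
print(np.outer(a, b).shape)  # (3, 3): what you may have wanted
```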
No error. No warning. You get 32. If you expected a 3×3 outer product,
your entire downstream computation is silently wrong.
Or try this:
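The missing snippet was presumably a plain division on a default-dtype array, something like:

```python
import numpy as np

a = np.array([1, 2, 3])   # integer dtype by default
print(a / 2)              # [0.5 1.  1.5], promoted to float
print((a / 2).dtype)      # float64
```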
That actually works, but only because Python 3's / operator always performs true (float) division, a convention NumPy honors even for integer arrays.
Now try it with explicit dtype:
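The original example is gone; a plausible reconstruction uses floor division (//) on an explicitly integer-typed array:

```python
import numpy as np

a = np.array([1, 2, 3], dtype=np.int64)
print(a // 2)           # [0 1 1]: floors silently, stays integer
print((a // 2).dtype)   # int64
```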
Floor division on an integer array. No error. Your gradients just became
integers. Your model trains silently on garbage.
These are not edge cases. They are the default behavior of NumPy, and
every neural network implementation lives or dies by getting them right.
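A cheap defense is to pin dtypes at the boundaries of your code. The helper name below is an assumption for illustration, not part of any library:

```python
import numpy as np

def as_f64(x):
    # Coerce any array-like input to float64 so later divisions
    # can never silently fall back to integer arithmetic.
    return np.asarray(x, dtype=np.float64)

grads = as_f64([1, 2, 3])
print((grads / 2).dtype)  # float64 no matter what was passed in
```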
The ScratchAI Architecture
This lesson's module is the foundation layer of the entire ScratchAI
pipeline. Before you can write a forward pass, you need to reason fluently
about:
N-dimensional arrays: how NumPy lays them out in memory (row-major, C-order by default), and why axis semantics matter for reductions
Shape algebra: every operation in a neural network is a shape transformation; understanding (batch, features) @ (features, hidden) is the entire forward pass, conceptually
Broadcasting: NumPy's mechanism for operating on arrays of different shapes by virtually expanding dimensions, not copying data
dtype contracts: your model's numerical precision is a dtype decision made at array creation time
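The shape-algebra and broadcasting claims above are easy to verify directly. The sizes here are illustrative:

```python
import numpy as np

batch, features, hidden = 4, 3, 5
X = np.random.rand(batch, features)
W = np.random.rand(features, hidden)
b = np.zeros(hidden)

# Shape algebra: (4, 3) @ (3, 5) -> (4, 5)
H = X @ W
print(H.shape)  # (4, 5)

# Broadcasting: b has shape (5,); it is virtually expanded to (4, 5)
# to match H. No copy of b is materialized.
print((H + b).shape)  # (4, 5)
```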
The data flow in this lesson:
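The diagram itself did not survive extraction. As a rough stand-in built only from the four concepts above (list in, typed array, reduction, broadcast out), the flow might look like:

```python
import numpy as np

raw = [[1.0, 2.0], [3.0, 4.0]]        # plain Python lists
X = np.array(raw, dtype=np.float64)   # dtype decided at creation time
col_means = X.mean(axis=0)            # reduction along axis 0 -> shape (2,)
centered = X - col_means              # broadcasting: (2, 2) minus (2,)
print(centered)
```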
No training loop yet. No loss function. Just the raw machinery that
everything else is built on.