Implementation Deep Dive
Rank and Axes
A tensor in NumPy is simply an ndarray with a .ndim attribute
(the rank) and a .shape tuple (the size of each dimension):
| Rank | Example Shape | Real-World Meaning |
|---|---|---|
| 0D | () | A single scalar: one temperature reading |
| 1D | (128,) | A feature vector: one patient's 128 blood markers |
| 2D | (32, 128) | A batch of 32 feature vectors |
| 3D | (32, 28, 28) | A batch of 32 grayscale images |
| 4D | (32, 28, 28, 3) | A batch of 32 RGB images |
An axis's index is its position in .shape. Axis 0 is almost always
the batch axis — the dimension over which you iterate samples.
Axis -1 is almost always the feature or channel axis — the dimension
that describes what a single sample is made of.
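These conventions are easy to confirm in code. A quick sketch (the array contents are placeholders; only the shapes matter):

```python
import numpy as np

batch = np.zeros((32, 28, 28))   # rank 3: a batch of 32 grayscale images
features = np.zeros((32, 128))   # rank 2: 32 samples, 128 features each

print(batch.ndim)          # 3 — the rank
print(batch.shape[0])      # 32 — axis 0, the batch axis
print(features.shape[-1])  # 128 — axis -1, the feature axis
```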
Strides: The Hidden Geometry
Every NumPy array carries a .strides tuple alongside .shape. Strides
tell you how many bytes to jump in memory to move one step along each
axis. For a float64 array of shape (4, 3):
```python
import numpy as np

arr = np.zeros((4, 3), dtype=np.float64)
print(arr.strides)  # (24, 8) — 24 bytes per row step, 8 bytes per column step
```
Row-major (C-order) layout means the last axis has the smallest stride.
This matters enormously for cache performance. When you compute
arr.mean(axis=0) — reducing over rows — NumPy strides through memory
in large jumps (24 bytes each). When you compute arr.mean(axis=1) —
reducing over columns — it strides in 8-byte steps: sequential memory
access, maximally cache-friendly.
Understanding strides is how you explain why reducing along one axis
can be faster than reducing along another, and why some matrix-product
orderings outperform others for certain shapes: one traversal produces
cache misses, the other streams through memory sequentially. This is
not magic. It is geometry.
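A small follow-up sketch makes the geometry concrete: transposing an array swaps its strides tuple without moving a single byte, which is why a transposed array is no longer C-contiguous.

```python
import numpy as np

arr = np.zeros((4, 3), dtype=np.float64)
print(arr.strides)   # (24, 8): row-major layout

# Transposing swaps the strides; the underlying memory is untouched.
t = arr.T
print(t.strides)                 # (8, 24)
print(t.flags['C_CONTIGUOUS'])   # False: same bytes, different geometry
```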
Reshape vs View
```python
import numpy as np

a = np.arange(12)            # shape (12,)
b = a.reshape(3, 4)          # shape (3, 4) — view, no data copy
c = a.reshape(3, 4).copy()   # shape (3, 4) — new array, data copied

b[0, 0] = 999
print(a[0])      # 999 — b is a VIEW of a, same memory
print(c[0, 0])   # 0 — c is independent
```
reshape returns a view whenever the data is contiguous in memory.
It does not allocate new memory. Modifying b modifies a. This is
efficient but dangerous if you don't know it. reshape after a
non-contiguous operation (like a transpose) forces a copy. The rule:
call .copy() explicitly when you need independence; let NumPy give
you a view when you just need a different shape window onto the same data.
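The copy-after-transpose rule can be verified directly. A minimal sketch, using np.shares_memory to check whether two arrays overlap:

```python
import numpy as np

a = np.arange(6).reshape(2, 3)
t = a.T                          # non-contiguous view of a
r = t.reshape(6)                 # strides cannot describe this: forced copy

r[0] = -1
print(a[0, 0])                   # 0 — a is untouched; r owns fresh memory
print(np.shares_memory(a, r))    # False
```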
Vectorized Axis Statistics
Computing the mean of each column in a 2D array:
```python
# Python loop — one Python-level iteration and float allocation per column
col_means_loop = [arr[:, j].mean() for j in range(arr.shape[1])]

# Vectorized — one C-level loop over contiguous memory
col_means_vec = arr.mean(axis=0)  # shape: (n_cols,)
```
The loop version creates n_cols temporary Python float objects. The
vectorized version allocates one output array of shape (n_cols,) and
fills it in a single pass through compiled code. For a (10000, 512)
array — typical for a small embedding matrix — the loop takes ~200ms;
arr.mean(axis=0) takes ~2ms. Same arithmetic, a hundred times faster.
The loop is in Python; the reduction is in C.
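You can reproduce the comparison with time.perf_counter. Absolute numbers depend on your machine, so treat the timings below as illustrative, not as the lesson's benchmark:

```python
import time
import numpy as np

arr = np.random.default_rng(0).random((10_000, 512))

t0 = time.perf_counter()
col_means_loop = [arr[:, j].mean() for j in range(arr.shape[1])]
t_loop = time.perf_counter() - t0

t0 = time.perf_counter()
col_means_vec = arr.mean(axis=0)
t_vec = time.perf_counter() - t0

# Same arithmetic, same results, very different cost.
print(np.allclose(col_means_loop, col_means_vec))  # True
print(f"loop: {t_loop * 1e3:.1f} ms   vectorized: {t_vec * 1e3:.1f} ms")
```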
Production Readiness — Metrics to Watch
For this lesson, "production readiness" means trusting your array
operations. Check these before passing data to any model:
| Signal | What to Look For |
|---|---|
| Shape sanity | Does arr.shape[0] match your expected batch size? |
| Dtype | Is it float32 or float64? Mixed dtypes silently upcast. |
| NaN/Inf count | np.isnan(arr).sum() must be 0 before any training |
| Value range | Min/max per axis — unbounded inputs destabilize models |
| Memory footprint | arr.nbytes / 1e6 MB — fits in 4GB RAM? |
| Contiguity | arr.flags['C_CONTIGUOUS'] — non-contiguous arrays silently copy on reshape |
The train.py benchmark script measures slice and stat computation time
across tensor ranks and logs all six signals for every generated tensor.
Step-by-Step Guide
Prerequisites
- Python 3.11+
- `pip install "numpy>=1.26" "streamlit>=1.32" "plotly>=5.20"` (quote the version specifiers so the shell does not treat `>` as redirection)
- A terminal (any OS — macOS, Windows PowerShell, Linux bash)
- The lesson folder: `lesson_01/`
Execution
```bash
cd lesson_01
pip install -r requirements.txt
streamlit run app.py
```
Verification
The Streamlit app opens at http://localhost:8501. Use the sidebar to
select a tensor rank (0D through 4D) or upload a CSV file. The main
panel shows the shape badge, strides table, memory footprint, and a
Plotly heatmap of the selected 2D slice.
Open model.py and locate the function named compute_axis_stats —
that is the entire lesson in twenty lines: pure NumPy, no loops, every
reduction axis explicit.
Click "Simulate Error" to trigger a deliberate bad reshape: 13 elements
into shape (3, 4), which needs exactly 12. Watch the error panel, then
click Reset to restore valid defaults.
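The failure mode behind that button is easy to reproduce in plain NumPy: reshape raises a ValueError whenever the total element count would change.

```python
import numpy as np

data = np.arange(13)         # 13 elements — cannot tile into 3 x 4 = 12
try:
    data.reshape(3, 4)
except ValueError as e:
    print(type(e).__name__)  # ValueError: total size must stay constant
```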
Homework — Production Challenge
Extend extract_slice to support 4D tensors interactively.
Currently, extract_slice handles 2D and 3D arrays. Modify model.py
so that when a 4D array of shape (B, H, W, C) is loaded, the sidebar
exposes four sliders — one per axis — and the heatmap shows the
2D cross-section at arr[b_idx, :, :, c_idx] for any chosen batch
index and channel index. No new libraries. Add a second Plotly subplot
showing the per-channel histogram for the selected batch item.
This single change is how every real image-debugging tool works.
Build it from scratch and you will never be confused by a "wrong channel"
bug again.
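One possible shape for the slicing logic, to get you started. The function name extract_slice_4d and its signature are illustrative guesses, not the actual model.py API:

```python
import numpy as np

def extract_slice_4d(arr: np.ndarray, b_idx: int, c_idx: int) -> np.ndarray:
    """Fix the batch and channel axes; keep the full H x W cross-section."""
    assert arr.ndim == 4, "expects shape (B, H, W, C)"
    return arr[b_idx, :, :, c_idx]

batch = np.zeros((32, 28, 28, 3))
print(extract_slice_4d(batch, b_idx=0, c_idx=2).shape)  # (28, 28)
```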
Next Lesson: Matrix Multiplication from Scratch — why np.dot is not
the same as element-wise multiplication, and how the dot product is the
computational heart of every layer in every neural network.