| COURSE 2 OF 3 · DEEP LEARNING IN PRACTICE · Intermediate Edition – PyTorch · Computer Vision · NLP · Generative AI · Intermediate Level · 30 Lessons, Hands-On Apps · PyTorch Primary Framework · CPU-First, No GPU Required |
|---|
| Graduate from NumPy to PyTorch, from toy networks to real models. Thirty lessons. Thirty apps. Computer vision, NLP, generative models, and production-ready training – all running on any laptop. |
|---|
Why This Course?
Part 1 gave you the foundations. This course gives you the firepower. The jump from NumPy to PyTorch is not just a syntax change – it is an architectural shift in how you think about building AI systems.
Most intermediate AI courses spend the first third reviewing basics, then jump to fine-tuning pre-trained models without explaining what is actually happening inside them. This course treats you differently: by Lesson 3, you are building custom nn.Module architectures from scratch. By Lesson 8, you have fine-tuned a production computer vision model on your own data. By Lesson 30, you have a fully served image recognition system – ONNX-exported, REST-accessible, and monitored.
The real engineering in AI happens not in model selection but in the decisions surrounding it: how you structure your data pipeline, what your training loop does when NaN appears, how you tune without grid-searching blindly, and how you profile a model to find the actual bottleneck. This course makes those decisions explicit and visible – one Streamlit app at a time.
| The engineers who move fastest in production AI are not the ones who know the most models – they are the ones who understand the PyTorch execution model well enough to fix things when they break, and to optimize when they are slow. – Core design principle of this course |
|---|
| 1 | PyTorch becomes transparent, not a black box. You build every core component – Dataset, DataLoader, training loop, loss, optimizer step – so autograd has no mystery. You know exactly what .backward() is computing and why. |
|---|---|
| 2 | You work across four real domains of deep learning. Computer vision, NLP, generative models, and audio – not as survey topics but as hands-on systems you build, train, debug, and ship across 25 lessons. |
| 3 | Training science becomes a first-class skill. BatchNorm, Dropout, LR schedulers, mixed precision, optimizer selection – the decisions that determine whether a model trains or collapses, taught with live experiments you run yourself. |
| 4 | You build 30 portfolio-grade Streamlit applications. Every lesson ships a working app. CIFAR classifier, fine-tuned transfer model, GAN, VAE, audio classifier, SHAP explainer, Optuna tuner, ONNX exporter – 30 items that demonstrate real depth. |
| 5 | Production habits are enforced from Lesson 1. Gradient clipping, NaN detection, checkpoint strategy, reproducible seeds, profiling – the practices that distinguish amateur training runs from professional ones. |
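Point 1 above is concrete enough to demonstrate in a few lines. This minimal sketch (not taken from the course materials) shows exactly what .backward() computes on a tiny expression – the kind of graph Lesson 1's app visualizes node by node:

```python
import torch

# A scalar input that autograd tracks through the computation graph.
x = torch.tensor(2.0, requires_grad=True)

# The forward pass builds the graph node by node: y = x^2 + 3x.
y = x ** 2 + 3 * x

# The backward pass walks the graph and writes dy/dx into x.grad.
y.backward()

print(x.grad)  # dy/dx = 2x + 3 = 7.0 at x = 2
```

Chain that same mechanism through millions of parameters and you have every training loop in this course.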
What You Will Build
30 Streamlit applications across computer vision, NLP, generative AI, audio, and production serving.
| App | What It Demonstrates |
|---|---|
| Tensor Ops Lab | PyTorch autograd visualizer – gradients at every graph node |
| Net Architect | nn.Module composer – generates code, FLOPs, and memory estimate |
| CIFAR Classifier | CNN trained on CIFAR-10 with live filter and activation map viz |
| Transfer Learning Studio | Fine-tune MobileNetV2 on your own 50-image dataset |
| Object Detector | YOLOv5-nano bounding box app with live IoU slider |
| Mood Reader | LSTM sentiment classifier with token-level attention heatmap |
| Word2Vec Explorer | 3D embedding space with analogy solver and nearest-neighbor search |
| Seq2Seq Translator | Encoder-decoder with Bahdanau attention heatmap visualization |
| Optimizer Arena | 5 optimizers trained simultaneously – loss curve race chart |
| Latent Space Explorer | VAE on MNIST with 2D latent slider generating new digit images |
| GAN Painter | Mini GAN with live generator output grid evolving per epoch |
| Model Explainer | SHAP waterfall + LIME explanation for any classifier prediction |
| Auto-Tuner | Optuna HP search with parallel coordinates plot |
| Fast Trainer | FP32 vs FP16 AMP race: speed, memory, accuracy side-by-side |
| Vision Pipeline (Capstone) | Train → ONNX export → REST serve → monitor – end to end |
Every app is delivered in the same four-file production structure:
| lesson_XX/ ├── app.py # Streamlit UI – launch with: streamlit run app.py ├── model.py # PyTorch model – nn.Module implementation ├── train.py # Training script with logging, checkpointing, grad clipping └── README.md # Run it · Break it · Extend it · Challenge |
|---|
Who Should Take This Course?
Anyone who completed Course 1, or who already understands basic ML concepts and wants to move into real deep learning engineering.
This course was designed with the same broad professional audience as Course 1 โ because deep learning is no longer confined to research labs. It powers every product decision, every infrastructure investment, and every user experience that involves intelligence. Here is who gets what from this course:
| Audience | What This Course Gives You |
|---|---|
| Software Engineers / Developers | Build, integrate, and debug PyTorch models with the same rigour you bring to any production system. |
| Software Architects & Designers | Understand the computational and memory constraints of different architectures to make informed system design decisions. |
| Data Engineers | Build production data pipelines for deep learning systems – DataLoaders, augmentation, preprocessing – without guessing. |
| QA / SRE Engineers | Know what a healthy training run looks like, how to detect silent model failures, and how to write model integration tests. |
| DevOps Engineers | Understand what you are containerising and serving: ONNX export, model size, batch inference, and serving tradeoffs. |
| Product Managers | Evaluate feasibility of deep learning features with the depth to catch unrealistic promises or missed opportunities. |
| Engineering Managers | Review deep learning work with enough understanding to identify shortcuts, evaluate quality, and set technical direction. |
| UI/UX Designers | Understand model confidence, attention maps, and hallucination patterns that directly shape how AI-powered interfaces should behave. |
| Technical Writers & Consultants | Document and advise on deep learning systems with the credibility that comes from having trained and shipped models yourself. |
What Makes This Course Different?
Six things you will not find combined in any other intermediate deep learning course.
| 1 | You train on your own data from Lesson 8. Most courses use the same five benchmark datasets from start to finish. By Lesson 8, you upload your own images and fine-tune a model on them. That gap between benchmark and real data is where most models fail – this course bridges it early. |
|---|---|
| 2 | Training science is a dedicated section, not a footnote. BatchNorm, Dropout, LR scheduling, optimizer choice, and custom loss functions each get a full lesson with live comparison experiments. These decisions change model performance by 15-40% – but most courses mention them in passing. |
| 3 | Four domains, not one. Computer vision, NLP, generative models, and audio are not separate tracks – they are sequential sections. You accumulate skills across domains, and by the end you recognise shared patterns (loss functions, normalisation, attention) that cut across all of them. |
| 4 | Explainability is built in, not added on. Lesson 26 is a full SHAP + LIME lesson before the capstone. Too many engineers ship models they cannot explain. This course makes interpretability a technical skill, not a compliance checkbox. |
| 5 | The capstone is a real serving pipeline. Lesson 30 does not end at model.eval(). It exports to ONNX, serves predictions via a REST endpoint, and demonstrates the latency difference between PyTorch and ONNX Runtime. The output is something you can deploy, not just a saved .pt file. |
| 6 | CPU-first, GPU-optional throughout. Every lesson is optimised to run on a 4GB RAM CPU laptop using efficient architectures (MobileNetV2, YOLOv5-nano, quantised models) and small datasets. GPU use is marked as optional – it accelerates training but is never required. |
Key Topics Covered
Six domains, each explored through working applications rather than slides.
| PyTorch Engine | ▸ Autograd computation graph internals ▸ nn.Module: parameters, buffers, hooks ▸ DataLoader: collate, pin_memory, prefetch ▸ Production training loop anatomy ▸ Gradient clipping, NaN detection, seeding |
|---|---|
| Computer Vision | ▸ CNN architecture: conv, pool, BN, ReLU ▸ Receptive field and feature hierarchy ▸ Transfer learning: freeze/unfreeze strategy ▸ Object detection: anchor boxes, IoU, NMS ▸ Semantic segmentation: FCN encoder-decoder |
| Sequence & NLP | ▸ RNN hidden state and BPTT truncation ▸ LSTM gates: forget, input, output, cell ▸ Word2Vec: skip-gram, negative sampling ▸ Encoder-decoder with Bahdanau attention ▸ TF-IDF vs deep text representations |
| Training Science | ▸ BatchNorm: running stats, train vs eval ▸ Dropout: Bayesian interpretation, scaling ▸ LR schedulers: cosine, OneCycle, SGDR ▸ Adam vs AdamW vs SGD vs Lion comparison ▸ Custom loss: focal, label smoothing, contrastive |
| Generative Models | ▸ Autoencoder bottleneck and denoising ▸ VAE: reparameterization trick and ELBO ▸ GAN: Nash equilibrium and mode collapse ▸ SimCLR: NT-Xent loss, augmentation pairs ▸ Audio: MFCC, mel spectrogram, SpecAugment |
| Production & XAI | ▸ SHAP: Shapley values and TreeSHAP ▸ LIME: local approximation for any model ▸ Optuna: Bayesian HP optimization + pruning ▸ Mixed precision AMP, loss scaling, grad checkpointing ▸ ONNX export and onnxruntime inference serving |
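The PyTorch Engine topics above centre on the Dataset/DataLoader pair. A minimal, hypothetical example of the pattern – toy (x, x²) pairs standing in for real images or text:

```python
import torch
from torch.utils.data import Dataset, DataLoader

class SquaresDataset(Dataset):
    """Hypothetical toy dataset: (x, x^2) pairs stand in for real samples."""

    def __init__(self, n=100):
        self.x = torch.arange(n, dtype=torch.float32)

    def __len__(self):
        return len(self.x)                     # how many samples exist

    def __getitem__(self, idx):
        return self.x[idx], self.x[idx] ** 2   # one (input, target) pair

# batch_size and shuffle are the knobs Lesson 4 benchmarks alongside num_workers.
loader = DataLoader(SquaresDataset(), batch_size=8, shuffle=True, num_workers=0)
xb, yb = next(iter(loader))
print(xb.shape)  # torch.Size([8])
```

Swap the toy tensors for image decoding or tokenization and the structure is unchanged – which is why the course builds it once, early.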
Prerequisites
This is an intermediate course. It builds directly on Course 1 foundations – or equivalent real-world knowledge.
You do not need to have taken Course 1 if you already understand the concepts it covers. The table below specifies exactly what is required – and what is explicitly not required – so you can assess your own readiness honestly.
| Area | What You Need | Level Required |
|---|---|---|
| AI / ML Foundations | Forward pass, backpropagation, loss functions, gradient descent – conceptually and in code | Solid |
| Python Programming | Functions, classes, decorators, list comprehensions, file I/O, pip, virtual envs | Solid |
| NumPy | Array operations, broadcasting, vectorization – the level covered in Course 1 | Comfortable |
| Basic Statistics | Mean, variance, distributions, train/val/test splits, evaluation metrics (F1, AUC) | Comfortable |
| PyTorch | No prior PyTorch required – this course starts from tensor creation in Lesson 1 | None required |
| Computer Vision / NLP | No prior CV or NLP required – all domain knowledge is built within the lessons | None required |
| Cloud / GPU | No cloud account or GPU required – all lessons run on a CPU-only, 4-8GB RAM laptop | None required |
| Advanced Math | Linear algebra and calculus at the level of Course 1 Lessons 1-5 is sufficient | Course 1 level |
| The test for readiness: can you implement gradient descent from scratch in NumPy and explain what the chain rule does? If yes, you are ready for this course. If not, Course 1 is the right starting point. – Readiness check |
|---|
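The readiness check above is worth taking literally. A minimal scalar version in plain Python – minimizing f(w) = (w - 3)² with the chain rule applied by hand – looks like this:

```python
w = 0.0      # initial parameter
lr = 0.1     # learning rate

for _ in range(100):
    grad = 2 * (w - 3.0)   # chain rule: d/dw of (w - 3)^2
    w -= lr * grad         # the gradient descent update rule

print(round(w, 4))  # 3.0 – converges to the minimum
```

If each line here reads as obvious, you are ready; the vectorized NumPy version over a weight matrix is the same loop with arrays.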
Learning Outcomes
Twelve concrete abilities you will demonstrate by the final lesson.
| ✓ Build a custom PyTorch Dataset, DataLoader, and training loop from a blank file ✓ Architect a CNN and read its feature maps to diagnose what each filter has learned ✓ Fine-tune a pre-trained model on your own data and control catastrophic forgetting ✓ Implement LSTM sequence classification with packed sequences and attention ✓ Train a VAE and navigate its latent space to generate new samples ✓ Build a GAN training loop that detects and recovers from mode collapse | ✓ Choose the right optimizer and LR schedule for a given training profile ✓ Implement a custom differentiable loss function in PyTorch autograd ✓ Explain any model prediction using SHAP waterfall charts and LIME approximations ✓ Run Optuna HP search with pruning and interpret the parallel coordinates plot ✓ Profile a model, identify the bottleneck layer, and apply a targeted optimisation ✓ Export a trained model to ONNX and serve it from a REST endpoint |
|---|---|
Course Structure
Six sections forming an end-to-end deep learning engineering curriculum.
| Section 1: PyTorch Engine (L1-5) | Section 2: Computer Vision (L6-10) | Section 3: Seq & NLP (L11-15) | Section 4: Training Science (L16-20) | Section 5: Generative AI (L21-25) | Section 6: Ship & Explain (L26-30) |
|---|---|---|---|---|---|
| 01 | PyTorch Fundamentals · Lessons 1–5 · Master the PyTorch engine – tensors, autograd, nn.Module, DataLoaders, and production training loops – before writing a single model. |
|---|---|
| 02 | Computer Vision · Lessons 6–10 · Build CNNs, fine-tune pre-trained models, detect objects, and segment images – with every decision visualised interactively. |
|---|---|
| 03 | Sequence Models & NLP · Lessons 11–15 · Tackle time series, sentiment, text classification, translation, and embeddings using RNNs, LSTMs, and encoder-decoders. |
|---|---|
| 04 | Training Science · Lessons 16–20 · Master the decisions that determine whether training converges or fails: normalization, regularization, schedulers, optimizers, and custom losses. |
|---|---|
| 05 | Generative Models · Lessons 21–25 · Build autoencoders, VAEs, GANs, audio classifiers, and a self-supervised contrastive learner – understanding what it means to generate, not just classify. |
|---|---|
| 06 | Evaluation, XAI & Shipping · Lessons 26–30 · Explain predictions, tune hyperparameters, train efficiently with mixed precision, profile bottlenecks, and ship a production-ready vision system. |
|---|---|
Full Curriculum – All 30 Lessons
Every lesson is a standalone Streamlit app. Each row is a deliverable.
| Section 01: PyTorch Fundamentals · Lessons 1–5 |
|---|
| # | App Name | What You Build | Concepts Mastered | Tools |
|---|---|---|---|---|
| 1 | Tensor Ops Lab – PyTorch Fundamentals & Autograd | A live tensor workbench: create tensors, perform ops, trigger .backward() and inspect gradient values at every node on screen | Autograd computation graph, leaf tensors, in-place ops, gradient accumulation, detach vs no_grad | PyTorch, Torchviz, Streamlit |
| 2 | Gradient Tracer – Custom Autograd Under the Hood | Define a composite math function, run backward(), compare numerical Jacobian vs PyTorch's analytical gradient – side by side | Jacobian matrix, finite-difference approximation, retain_graph, higher-order gradients | PyTorch, Plotly, Streamlit |
| 3 | Net Architect – Building Models with nn.Module | Compose any layer stack via UI → app generates the nn.Module code, counts parameters, estimates FLOPs and memory footprint | Sequential, ModuleList, ModuleDict, parameter sharing, forward() override, parameter vs buffer | PyTorch, torchinfo, Streamlit |
| 4 | Data Pipeline Builder – Custom Datasets & DataLoaders | Upload a folder of images or a CSV → builds Dataset class, benchmarks batching strategies, shows prefetch vs no-prefetch timing | __getitem__/__len__, collate_fn, num_workers, pin_memory, drop_last, sampler strategies | PyTorch, PIL, Streamlit |
| 5 | Training Loop Lab – Production Training Loop Anatomy | A configurable training loop with live loss curve, gradient norm tracker, LR scheduler stepper, and checkpoint saver | Gradient clipping, NaN detection, loss scaling, checkpoint strategy, reproducible seeds | PyTorch, Plotly, Streamlit |
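Lesson 5's production training loop anatomy can be previewed in miniature. A hypothetical sketch on random stand-in data, showing the three habits the lesson enforces – seeding, NaN detection, and gradient clipping:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)                      # habit 1: reproducible runs

model = nn.Linear(8, 1)
opt = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = nn.MSELoss()

for step in range(20):
    x, y = torch.randn(16, 8), torch.randn(16, 1)   # random stand-in batch
    opt.zero_grad()
    loss = loss_fn(model(x), y)
    if torch.isnan(loss):                 # habit 2: fail loudly, not silently
        raise RuntimeError(f"NaN loss at step {step}")
    loss.backward()
    nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)  # habit 3: safety valve
    opt.step()

print(torch.isfinite(loss).item())  # True – training stayed healthy
```

The course's version adds checkpointing and LR scheduling around this same skeleton.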
| Section 02: Computer Vision · Lessons 6–10 |
|---|
| # | App Name | What You Build | Concepts Mastered | Tools |
|---|---|---|---|---|
| 6 | CIFAR Classifier – Convolutional Neural Networks | Train a CNN on CIFAR-10 – live loss/accuracy chart, conv filter visualization, and activation map per layer per image | Conv2D, stride, padding, receptive field, MaxPool, spatial hierarchy, parameter sharing | PyTorch, torchvision, Matplotlib, Streamlit |
| 7 | Filter Inspector – Convolution Mechanics Unpacked | Upload any image → apply custom conv kernels (edge, blur, sharpen) → show output feature maps side-by-side with kernel weights | Kernel sliding, feature map dimension formula, dilation, depthwise/pointwise convolution | PyTorch, PIL, Plotly, Streamlit |
| 8 | Transfer Learning Studio – Transfer Learning & Fine-Tuning | Upload 50–200 custom images → fine-tune MobileNetV2 → shows frozen vs trainable layers, accuracy per epoch, confusion matrix | Pre-trained weight reuse, feature extraction mode, layer unfreezing strategy, catastrophic forgetting | PyTorch, torchvision, Streamlit |
| 9 | Object Detector – Object Detection: Anchors & YOLO | Upload any image → run YOLOv5-nano → bounding boxes rendered with labels; IoU slider filters detections in real time | Anchor boxes, IoU, NMS, objectness score, regression + classification joint heads, COCO labels | YOLOv5, OpenCV, Streamlit |
| 10 | Pixel Classifier – Semantic Segmentation Internals | Upload image → lightweight DeepLabV3 segments pixels by class → color-coded mask overlay with per-class confidence bars | FCN, encoder-decoder, skip connections, upsampling vs transposed conv, pixel-wise cross-entropy | PyTorch, torchvision, OpenCV, Streamlit |
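The feature map dimension formula from Lesson 7 is easy to verify yourself. A minimal sketch, assuming a CIFAR-sized input: the standard Conv2d output-size formula predicts the shape before the convolution ever runs:

```python
import torch
import torch.nn as nn

# out = floor((in + 2*padding - dilation*(kernel - 1) - 1) / stride) + 1
conv = nn.Conv2d(in_channels=3, out_channels=16, kernel_size=3, stride=2, padding=1)

x = torch.randn(1, 3, 32, 32)    # one CIFAR-sized RGB image
out = conv(x)

# Predicted: floor((32 + 2 - 2 - 1) / 2) + 1 = 16 per spatial dimension.
print(out.shape)  # torch.Size([1, 16, 16, 16])
```

Being able to predict shapes on paper is what lets you stack layers without trial-and-error.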
| Section 03: Sequence Models & NLP · Lessons 11–15 |
|---|
| # | App Name | What You Build | Concepts Mastered | Tools |
|---|---|---|---|---|
| 11 | Time Series RNN – Recurrent Neural Networks from First Principles | Upload a time series CSV → RNN predicts next N steps, shows hidden state vector evolution epoch by epoch | Vanishing gradient in RNNs, BPTT, hidden state reuse, sequence padding, teacher forcing | PyTorch, Plotly, Streamlit |
| 12 | Mood Reader – LSTMs for Sentiment Analysis | Paste any text → LSTM classifies positive/neutral/negative with a token-level attention heatmap overlay on the input text | LSTM gates (forget/input/output/cell), gradient highway, bidirectional LSTM, packed sequences | PyTorch, NLTK, Plotly, Streamlit |
| 13 | Word2Vec Explorer – Word Embeddings in Practice | Train Word2Vec on a custom corpus → nearest-neighbor explorer, analogy solver, and 3D PCA embedding projection | Skip-gram, CBOW, negative sampling, embedding geometry, cosine similarity, OOV handling | Gensim, Plotly (3D), Streamlit |
| 14 | News Classifier – Text Classification with TF-IDF + Deep | Paste a headline → compare TF-IDF+LogReg vs LSTM vs fine-tuned DistilBERT: side-by-side accuracy and confidence scores | TF-IDF vectorization, tokenization, sequence length impact, model complexity vs accuracy tradeoff | Scikit-learn, PyTorch, HuggingFace, Streamlit |
| 15 | Seq2Seq Translator – Encoder-Decoder Architecture | Train a mini character-level encoder-decoder on a toy translation task → attention heatmap shows which input tokens matter | Encoder hidden state, context vector, decoder teacher forcing, Bahdanau attention, beam search basics | PyTorch, Plotly, Streamlit |
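Lesson 15's attention heatmap ultimately boils down to one softmax. A toy sketch with random stand-in vectors – dot-product scoring is used here for brevity, whereas the lesson itself uses Bahdanau's additive form:

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
encoder_states = torch.randn(4, 8)   # 4 source tokens, hidden size 8 (random stand-ins)
decoder_state = torch.randn(8)       # current decoder hidden state

scores = encoder_states @ decoder_state   # one alignment score per source token
weights = F.softmax(scores, dim=0)        # a heatmap row: non-negative, sums to 1
context = weights @ encoder_states        # weighted mix fed into the decoder
```

Each row of the app's heatmap is exactly one such `weights` vector, computed per output token.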
| Section 04: Training Science · Lessons 16–20 |
|---|
| # | App Name | What You Build | Concepts Mastered | Tools |
|---|---|---|---|---|
| 16 | Training Stabilizer – Batch Normalization Mechanics | Train a network with/without BatchNorm → overlaid loss curves, weight distribution histograms, covariate shift visualization | Internal covariate shift, running mean/variance, affine transform, train vs eval mode difference | PyTorch, Plotly, Streamlit |
| 17 | Overfit Fixer – Dropout, DropConnect & Stochastic Depth | Inject Dropout at different rates → compare train/val accuracy divergence, visualize ensemble effect on confidence | Dropout as Bayesian approximation, inference-time scaling, DropConnect, stochastic depth rationale | PyTorch, Plotly, Streamlit |
| 18 | LR Scheduler Lab – Learning Rate Scheduling Science | Pick a scheduler (StepLR, CosineAnnealing, OneCycleLR, ReduceLROnPlateau) → LR curve and training loss animated side-by-side | Warm-up, cosine annealing, cyclical LR, SGDR, LR finder algorithm, plateau detection | PyTorch, Plotly, Streamlit |
| 19 | Optimizer Arena – SGD vs Adam vs AdamW vs Lion | Same network trained with 5 optimizers simultaneously → overlaid loss curves, weight update magnitude histograms per optimizer | Momentum, adaptive learning rates, weight decay decoupling, Lion optimizer update rule, EMA | PyTorch, Plotly, Streamlit |
| 20 | Loss Designer – Custom Loss Functions in PyTorch | Formula editor for loss functions → trains model with custom loss, compares gradient landscape vs MSE/CE, plots loss surface | Custom autograd Function, differentiability requirements, focal loss, label smoothing, contrastive loss | PyTorch, Plotly, Streamlit |
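The scheduler comparison in Lesson 18 can be previewed with PyTorch's built-in OneCycleLR. A minimal sketch with no real data – the loop only records the learning rate trajectory, which is what the app plots:

```python
import torch

model = torch.nn.Linear(4, 2)
opt = torch.optim.SGD(model.parameters(), lr=0.1)
sched = torch.optim.lr_scheduler.OneCycleLR(opt, max_lr=0.1, total_steps=100)

lrs = []
for _ in range(100):
    opt.step()                           # a real loop runs forward/backward first
    sched.step()
    lrs.append(opt.param_groups[0]["lr"])

# The LR warms up toward max_lr, then anneals to nearly zero over one cycle.
```

Plot `lrs` and you get the characteristic one-cycle ramp-and-anneal curve the lesson compares against StepLR and cosine annealing.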
| Section 05: Generative Models · Lessons 21–25 |
|---|
| # | App Name | What You Build | Concepts Mastered | Tools |
|---|---|---|---|---|
| 21 | Image Compressor – Autoencoders: Compression & Denoising | Upload image → encoder compresses to latent vector, decoder reconstructs → compression ratio, SSIM score, noise injection demo | Bottleneck layer, reconstruction loss, undercomplete AE, denoising AE, latent space geometry | PyTorch, PIL, Plotly, Streamlit |
| 22 | Latent Space Explorer – Variational Autoencoders | Train a VAE on MNIST → 2D latent space slider generates new digits, interpolation path between two digits animated | KL divergence, reparameterization trick, ELBO, posterior collapse, disentanglement | PyTorch, Plotly, Streamlit |
| 23 | GAN Painter – Generative Adversarial Networks | Train a mini-GAN on simple 2D distributions or MNIST → live D/G loss curves, generator output grid evolves epoch by epoch | Nash equilibrium, mode collapse detection, Wasserstein distance, gradient penalty, training instability | PyTorch, Plotly, Streamlit |
| 24 | Sound Classifier – Audio Deep Learning | Record/upload audio → extract MFCC + spectrogram → 1D CNN classifies speech/music/noise/nature with waveform visualization | MFCC, mel spectrogram, 1D convolution, audio augmentation (SpecAugment), temporal modeling | Librosa, PyTorch, Streamlit |
| 25 | Self-Supervised Trainer – Contrastive Learning with SimCLR | Upload an image dataset → trains an SSL encoder without labels → UMAP of learned embeddings shows semantic clustering emerge | NT-Xent loss, augmentation pairs, projector head, representation quality metrics, linear probe eval | PyTorch, UMAP, Plotly, Streamlit |
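Lesson 22's reparameterization trick fits in one function. A minimal sketch, assuming 2-D latent parameters as stand-ins for a real encoder's outputs:

```python
import torch

def reparameterize(mu, logvar):
    """Sample z = mu + sigma * eps, keeping the sample differentiable w.r.t. mu and logvar."""
    std = torch.exp(0.5 * logvar)   # log-variance -> standard deviation
    eps = torch.randn_like(std)     # the randomness lives OUTSIDE the graph
    return mu + eps * std

# Hypothetical 2-D latent parameters standing in for an encoder's outputs.
mu = torch.zeros(2, requires_grad=True)
logvar = torch.zeros(2, requires_grad=True)

z = reparameterize(mu, logvar)
z.sum().backward()                  # gradients flow back through mu and logvar

print(mu.grad)  # tensor([1., 1.]) – d(sum z)/d(mu) is 1 per dimension
```

Sampling directly from N(mu, sigma) would block those gradients; moving the noise into `eps` is the entire trick.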
| Section 06: Evaluation, XAI & Shipping · Lessons 26–30 |
|---|
| # | App Name | What You Build | Concepts Mastered | Tools |
|---|---|---|---|---|
| 26 | Model Explainer – Interpretability: SHAP & LIME | Train a classifier → click any prediction → SHAP waterfall chart + LIME explanation show feature contribution per decision | Shapley values (game theory foundation), LIME local approximation, global vs local XAI, TreeSHAP | SHAP, LIME, Plotly, Streamlit |
| 27 | Auto-Tuner – Hyperparameter Optimization with Optuna | Define a search space via UI → Optuna runs trials → parallel coordinates plot of HP vs accuracy, best config summary card | Bayesian optimization, TPE sampler, pruning (Hyperband), search space design, early stopping integration | Optuna, Plotly, Streamlit |
| 28 | Fast Trainer – Mixed Precision & Efficient Training | Train the same model in FP32 vs FP16 AMP → compare wall-clock time, memory usage, accuracy; live loss scale tracker | FP16 numerical range, loss scaling, AMP context manager, memory bandwidth bottleneck, gradient checkpointing | PyTorch AMP, psutil, Streamlit |
| 29 | Performance Profiler – Model Benchmarking & Profiling | Load any model → PyTorch Profiler generates flame chart, per-operator timing, memory timeline, FLOP breakdown | Inference latency vs throughput, bottleneck layer identification, operator fusion, profiler overhead | PyTorch Profiler, torchinfo, Streamlit |
| 30 | Vision Pipeline – CAPSTONE: Full Image Recognition System | Upload image folder → auto-label → fine-tune CNN → evaluate → export to ONNX → serve predictions via REST endpoint in UI | Complete ML lifecycle: data → train → evaluate → export → serve → monitor – production-ready | PyTorch, ONNX, onnxruntime, FastAPI, Streamlit |
Lesson-Level Learning Objectives
One measurable outcome per lesson โ you know exactly what you shipped and why it matters.
| Lesson | By the end of this lesson you can… |
|---|---|
| L01 | Trigger .backward() on a custom expression, read gradient values at every node, and explain what each gradient means geometrically | |
| L02 | Compute the Jacobian of a composite function numerically and analytically, confirm they match within floating-point tolerance | |
| L03 | Build a 5-layer nn.Module from scratch, count parameters by layer, and estimate its inference memory footprint before running it | |
| L04 | Benchmark three DataLoader configurations (different num_workers and pin_memory settings) and select the fastest for a given dataset size |
| L05 | Write a training loop that handles gradient clipping, NaN detection, checkpointing, and LR scheduling – all from a blank file |
| L06 | Train a CNN on CIFAR-10, read its conv filter visualizations, and predict which layer is responsible for edge detection vs texture detection | |
| L07 | Design a custom 3x3 kernel, apply it to an image, and predict the feature map dimensions before running the convolution | |
| L08 | Fine-tune MobileNetV2 on a 100-image custom dataset, compare frozen vs unfrozen accuracy, and diagnose any catastrophic forgetting | |
| L09 | Run YOLOv5-nano on a custom image, tune the IoU threshold to eliminate false positives, and explain the NMS algorithm step by step | |
| L10 | Apply DeepLabV3 to an image, identify which classes are being confused, and propose a data augmentation strategy to address it | |
| L11 | Train an RNN on a time series, identify where gradient vanishing occurs by inspecting the gradient norm history, and quantify prediction error | |
| L12 | Build a bidirectional LSTM sentiment classifier, interpret the attention heatmap, and explain which tokens the model weighted most heavily | |
| L13 | Train Word2Vec on a custom corpus, solve three analogies correctly, and visualise the embedding space coloured by semantic category | |
| L14 | Compare TF-IDF+LogReg vs LSTM vs DistilBERT on the same classification task and justify which to deploy given a latency budget | |
| L15 | Build an encoder-decoder with attention, run a forward pass manually, and read the attention heatmap to verify alignment is correct | |
| L16 | Train a network with and without BatchNorm, explain the covariate shift difference from weight histograms, and diagnose train vs eval mode bugs | |
| L17 | Apply Dropout at three different rates, quantify the accuracy-confidence tradeoff, and explain why inference does not use dropout | |
| L18 | Compare four LR schedules on the same training run, select the best schedule, and justify the choice from the loss curve shape | |
| L19 | Train the same network with SGD, Adam, AdamW, RMSProp, and Lion, rank them by convergence speed and final accuracy, and explain each difference | |
| L20 | Implement focal loss from scratch in PyTorch autograd, apply it to an imbalanced dataset, and measure the accuracy improvement over CE | |
| L21 | Train an autoencoder on CIFAR, measure compression ratio and SSIM, add noise to inputs, and confirm the denoising autoencoder outperforms it | |
| L22 | Train a VAE on MNIST, navigate the 2D latent space with sliders, interpolate between two digits, and explain the KL divergence role geometrically | |
| L23 | Train a GAN to convergence, identify mode collapse from D/G loss curves, apply gradient penalty, and recover training stability | |
| L24 | Build an audio classifier, compare MFCC vs mel spectrogram features, apply SpecAugment, and evaluate on a held-out recording set | |
| L25 | Train a SimCLR encoder without labels, run a linear probe evaluation, and compare its UMAP clustering to a supervised baseline | |
| L26 | Generate SHAP waterfall charts for 3 different model predictions and use them to identify a data collection gap in the training set | |
| L27 | Define a 4-dimensional HP search space in Optuna, run 50 trials with pruning, and read the parallel coordinates plot to identify the dominant HP | |
| L28 | Train a model in FP32 and FP16 AMP, measure memory reduction and speedup, and confirm accuracy parity to within 0.5 percentage points | |
| L29 | Profile a model using PyTorch Profiler, identify the top 3 most expensive operators, and apply operator fusion to reduce latency by at least 10% | |
| L30 | Export a trained PyTorch model to ONNX, serve it from a FastAPI endpoint, benchmark latency vs PyTorch inference, and visualise predictions in Streamlit | |
Section Deep Dives
The engineering insights inside each section that you will not find in a standard textbook.
| Section 01: PyTorch Fundamentals (Lessons 1–5) |
|---|
Most people treat the PyTorch training loop as a ritual – zero gradients, forward, loss, backward, step. This section treats it as an engineering system with failure modes, performance characteristics, and tuning levers.
| The computation graph is rebuilt every forward pass: Lesson 1 makes this visceral: you watch the graph change shape as you change the input. This is why PyTorch is called 'define-by-run' and why it is fundamentally different from TensorFlow 1.x's static graph. It also explains why you cannot call .backward() twice without retain_graph=True. | |
|---|
| DataLoader throughput is often the training bottleneck: Lesson 4 benchmarks three DataLoader configs. On most laptops, num_workers=0 can halve training speed. Pinning memory speeds up GPU transfer but costs RAM. These tradeoffs are invisible unless you measure them – this lesson makes you measure them first. |
|---|
| Gradient clipping is not optional in production: Lesson 5 shows what happens when you train without clipping on a deep network: gradients explode, loss goes NaN, and training silently fails. The clip_grad_norm_ line in every serious training loop is not cargo-culting – it is a safety valve you will understand after this lesson. |
|---|
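The retain_graph behaviour described in the first insight above is a one-minute experiment – a minimal sketch, not the Lesson 1 app itself:

```python
import torch

x = torch.tensor(1.0, requires_grad=True)
y = x * x

y.backward(retain_graph=True)   # keep the freed-by-default graph alive for a second pass
y.backward()                    # without retain_graph above, this line raises RuntimeError

print(x.grad)  # tensor(4.) – gradients ACCUMULATE across calls: 2 + 2
```

Both facts matter in practice: the graph is consumed on backward, and `.grad` accumulates until you zero it – which is why every loop starts with `zero_grad()`.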
| Section 02: Computer Vision (Lessons 6–10) |
|---|
The dangerous myth about transfer learning is that you only need 50 images and a pretrained model to get production accuracy. This section shows exactly where that breaks down – and what to do about it.
| Conv filters are not learned in isolation: Lesson 7 lets you apply hand-crafted kernels and watch feature maps emerge. The insight: each layer in a trained CNN learns filters that are optimal for the spatial frequencies present in its input. That is why you cannot just swap layers between architectures without retraining. | |
|---|
| Catastrophic forgetting is a real production risk: Lesson 8 makes catastrophic forgetting visible: fine-tune with a high learning rate and watch original class accuracy collapse. The fix – gradual unfreezing and discriminative learning rates – is a technique used by ULMFiT and every serious fine-tuning practitioner. |
|---|
| NMS is where object detectors actually fail in production: Lesson 9 shows that confident detections are easy. The hard problem is NMS: how do you suppress duplicate boxes without suppressing valid overlapping objects? The IoU threshold slider in the app shows exactly how this decision trades precision for recall at the box level. | |
|---|
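The IoU slider driving Lesson 9's NMS tradeoff rests on a formula small enough to write by hand. A minimal sketch for boxes in (x1, y1, x2, y2) corner format:

```python
def iou(box_a, box_b):
    """Intersection-over-Union for boxes in (x1, y1, x2, y2) corner format."""
    x1 = max(box_a[0], box_b[0])
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2])
    y2 = min(box_a[3], box_b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)   # overlap area, 0 if boxes are disjoint
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

print(round(iou((0, 0, 10, 10), (5, 5, 15, 15)), 4))  # 0.1429
```

NMS simply sorts detections by confidence and suppresses any lower-scoring box whose IoU with a kept box exceeds the threshold – the exact value the app's slider controls.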
| Section 03: Sequence Models & NLP (Lessons 11–15) |
|---|
The widespread move from RNNs to Transformers in industry obscures an important truth: sequence modeling intuition built on RNNs transfers directly to understanding attention mechanisms in Course 3. This section builds that intuition deliberately.
| Gradient vanishing in RNNs is not a bug – it is a geometry problem: Lesson 11 shows gradient norm history across timesteps. At long sequences, gradients shrink exponentially because the same weight matrix is multiplied hundreds of times. LSTMs solve this via the cell state highway – not by eliminating recurrence but by gating it. |
|---|
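The geometry claim above can be demonstrated in a few lines: multiply a vector by the same weight matrix over and over, as backprop through time does, and its norm shrinks geometrically whenever the matrix's largest singular value is below one. The matrix scale and step count here are arbitrary illustrative choices:

```python
import torch

torch.manual_seed(0)
W = torch.randn(64, 64) * 0.05   # small weights: largest singular value < 1
grad = torch.ones(64)            # stand-in for a gradient at the last timestep

norms = []
for _ in range(100):             # 100 "timesteps" of backpropagation
    grad = W.T @ grad
    norms.append(grad.norm().item())
# norms decays roughly geometrically toward zero: the vanishing gradient.
```

Scale W up instead (e.g. * 0.5) and the same loop explodes, which is the other failure mode gradient clipping guards against.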
| Word2Vec geometry is not metaphorical: Lesson 13 shows that king - man + woman = queen is a real vector arithmetic result, not a marketing claim. The 3D PCA projection in the app shows semantic clusters (countries, capital cities, verb tenses) that emerge purely from co-occurrence statistics. | |
|---|
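The analogy arithmetic above can be illustrated on hand-made toy vectors. Real Word2Vec embeddings are 100-300 dimensional and learned from co-occurrence; these 3-d vectors (dimensions loosely "royalty, gender, other") exist only to show the geometry:

```python
import numpy as np

# Toy embedding table; the coordinates are invented for illustration.
vecs = {
    "king":  np.array([1.0,  1.0, 0.0]),
    "queen": np.array([1.0, -1.0, 0.0]),
    "man":   np.array([0.0,  1.0, 0.0]),
    "woman": np.array([0.0, -1.0, 0.0]),
    "car":   np.array([0.0,  0.0, 1.0]),
}

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

# king - man + woman, then nearest neighbour excluding the input words.
target = vecs["king"] - vecs["man"] + vecs["woman"]
best = max((w for w in vecs if w not in {"king", "man", "woman"}),
           key=lambda w: cosine(target, vecs[w]))
```

With a real trained model in gensim, the equivalent query is model.most_similar(positive=["king", "woman"], negative=["man"]).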
| Attention is alignment, not magic: Lesson 15's attention heatmap shows which source tokens the decoder attends to when producing each output token. In a translation task, the attention aligns with the correct source words, a direct visual proof that the model is learning syntax, not memorizing sequences. | |
|---|
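The heatmap is just the weight matrix of scaled dot-product attention. A minimal sketch, with random tensors standing in for the learned query/key/value projections:

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
d_k = 16
Q = torch.randn(5, d_k)   # 5 decoder positions
K = torch.randn(7, d_k)   # 7 source tokens
V = torch.randn(7, d_k)

scores = Q @ K.T / d_k ** 0.5        # (5, 7) alignment scores
weights = F.softmax(scores, dim=-1)  # each row sums to 1: one heatmap row
context = weights @ V                # attended source summary per decoder step
```

Row i of weights is exactly what the Lesson 15 heatmap plots: how much output token i attends to each source token.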
| Section 04: Training Science (Lessons 16โ20) |
|---|
Training science is the highest-leverage investment a deep learning engineer can make. A model with the right architecture but wrong training decisions will consistently underperform a simpler model with excellent training discipline.
| BatchNorm's train vs eval difference causes the most mysterious production bugs: Lesson 16 shows the bug: a model that achieves 92% accuracy in training crashes to 60% in production because running statistics were computed on the wrong batch distribution. The fix requires understanding what BatchNorm is actually tracking; this lesson makes that explicit. | |
|---|
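The gap is easy to reproduce. In train mode BatchNorm normalises with the current batch's statistics; in eval mode it uses the running averages tracked during training. Same input, different output (the shifted training distribution below is a contrived stand-in):

```python
import torch
from torch import nn

torch.manual_seed(0)
bn = nn.BatchNorm1d(8)

# "Training": feed batches with mean ~5, std ~3 so the running stats move.
for _ in range(10):
    bn(torch.randn(32, 8) * 3 + 5)

x = torch.randn(4, 8)          # a production-like batch from a different distribution
bn.train()
out_train = bn(x)              # normalised with x's own batch statistics
bn.eval()
out_eval = bn(x)               # normalised with the tracked running mean/var
# The two outputs differ substantially: forgetting model.eval() in serving
# means normalising with whatever batch happens to arrive.
```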
| OneCycleLR consistently outperforms manual LR tuning: Lesson 18 compares OneCycleLR against StepLR, CosineAnnealing, and constant LR on the same model and dataset. OneCycleLR, which ramps the learning rate up and then anneals it in one cycle, reaches the same accuracy 40% faster in most experiments. The explanation involves the loss landscape geometry. | |
|---|
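A sketch of the schedule itself, decoupled from any real training; the model, max_lr, and step count are placeholder choices, and a real loop would compute a loss and call backward() before each optimizer step:

```python
import torch
from torch import nn

model = nn.Linear(10, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
scheduler = torch.optim.lr_scheduler.OneCycleLR(
    optimizer, max_lr=0.1, total_steps=100)

lrs = []
for _ in range(100):
    optimizer.step()       # placeholder for a real training step
    scheduler.step()
    lrs.append(optimizer.param_groups[0]["lr"])
# lrs ramps up toward max_lr (peaking ~30% of the way through by default),
# then anneals far below the starting LR by the final step.
```

Plotting lrs against step index reproduces the characteristic one-cycle shape the lesson compares against StepLR and cosine annealing.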
| Focal loss is the reason modern object detectors work: Lesson 20 trains an imbalanced classifier with standard cross-entropy and focal loss side by side. Cross-entropy is dominated by easy negatives: 99% of anchors in an object detector are background. Focal loss down-weights those easy examples automatically, recovering 8-12% mAP on standard benchmarks. | |
|---|
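The down-weighting mechanism fits in one function. A binary focal-loss sketch (gamma and alpha follow the commonly used defaults; this is an illustrative implementation, not the lesson's exact code):

```python
import torch
import torch.nn.functional as F

def focal_loss(logits, targets, gamma=2.0, alpha=0.25):
    """Binary focal loss: (1 - p_t)^gamma shrinks toward 0 for confident
    correct predictions, so easy negatives contribute almost nothing."""
    ce = F.binary_cross_entropy_with_logits(logits, targets, reduction="none")
    p = torch.sigmoid(logits)
    p_t = p * targets + (1 - p) * (1 - targets)           # prob of the true class
    alpha_t = alpha * targets + (1 - alpha) * (1 - targets)
    return (alpha_t * (1 - p_t) ** gamma * ce).mean()
```

An easy negative (logit -6, target 0) contributes a vanishingly small loss, while a confidently wrong prediction keeps nearly its full cross-entropy weight; that asymmetry is what rescues training from the flood of background anchors.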
| Section 05: Generative Models (Lessons 21โ25) |
|---|
Generative models are not just for generating images. They are tools for representation learning, anomaly detection, data augmentation, and compression. This section builds that broader perspective through five different generative paradigms.
| VAE latent space is smooth by design, GAN latent space is not: Lesson 22 shows that VAE interpolation between two digits produces coherent intermediate digits, because the KL divergence term forces the latent space to be dense and continuous. GAN latent space has no such constraint, which is why GAN interpolation often passes through unrecognizable regions. | |
|---|
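The interpolation itself is simple. The decoder below is an untrained stand-in with an invented architecture; with a trained VAE, each interpolated latent decodes to a coherent intermediate digit precisely because the KL term keeps the latent space dense:

```python
import torch
from torch import nn

# Placeholder decoder: 2-d latent to a flattened 28x28 "image".
decoder = nn.Sequential(nn.Linear(2, 64), nn.ReLU(),
                        nn.Linear(64, 784), nn.Sigmoid())

z_a, z_b = torch.randn(2), torch.randn(2)   # latents of two encoded digits
steps = torch.linspace(0, 1, 8)

# Walk the straight line between the two latents and decode each point.
frames = torch.stack([decoder((1 - t) * z_a + t * z_b) for t in steps])
# frames: 8 images morphing from digit A to digit B.
```

Run the identical loop on a GAN generator and the midpoints often decode to unrecognizable blobs, which is the contrast the lesson visualises.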
| Mode collapse in GANs is detectable from D/G loss curves before it looks bad: Lesson 23 shows the signature of mode collapse: discriminator loss drops to near-zero (it has an easy job because the generator only produces one mode) while generator loss spikes. The app lets you inject gradient penalty and watch training stabilize, a technique from WGAN-GP. | |
|---|
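The gradient penalty term can be sketched directly. This follows the WGAN-GP recipe (penalise the critic's gradient norm away from 1 at points interpolated between real and fake samples); the toy critic and random data are placeholders:

```python
import torch
from torch import nn

def gradient_penalty(critic, real, fake):
    """WGAN-GP penalty: push the critic's input-gradient norm toward 1
    at random interpolates between real and fake batches."""
    eps = torch.rand(real.size(0), 1)                     # per-sample mix ratio
    mixed = (eps * real + (1 - eps) * fake).requires_grad_(True)
    score = critic(mixed).sum()
    grad, = torch.autograd.grad(score, mixed, create_graph=True)
    return ((grad.norm(2, dim=1) - 1) ** 2).mean()

# Toy critic and data; in training this term is added to the critic loss.
critic = nn.Sequential(nn.Linear(8, 32), nn.ReLU(), nn.Linear(32, 1))
gp = gradient_penalty(critic, torch.randn(16, 8), torch.randn(16, 8))
```

create_graph=True matters: the penalty must itself be differentiable so its gradient flows back into the critic's weights.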
| SimCLR's linear probe accuracy is the honest measure of representation quality: Lesson 25 freezes the SimCLR encoder and trains only a linear classifier on top. The resulting accuracy, achieved with no labels used during encoder training, tells you how semantically rich the learned representations are. This is the evaluation protocol used in every self-supervised learning paper. | |
|---|
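The probe protocol is short in code. The encoder below is an untrained stand-in (a real run would load SimCLR-pretrained weights), and the random features/labels exist only to show the mechanics:

```python
import torch
from torch import nn

encoder = nn.Sequential(nn.Linear(32, 128), nn.ReLU(), nn.Linear(128, 64))
encoder.requires_grad_(False)            # freeze: the encoder never updates
probe = nn.Linear(64, 10)                # the only trainable parameters

optimizer = torch.optim.Adam(probe.parameters(), lr=1e-3)
x, y = torch.randn(256, 32), torch.randint(0, 10, (256,))

with torch.no_grad():                    # features are fixed, compute them once
    feats = encoder(x)

for _ in range(5):                       # train only the linear head
    optimizer.zero_grad()
    loss = nn.functional.cross_entropy(probe(feats), y)
    loss.backward()
    optimizer.step()
```

Because the head is linear, any accuracy it reaches must come from structure already present in the frozen features, which is what makes the probe an honest measure.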
| Section 06: Evaluation, Explainability & Shipping (Lessons 26โ30) |
|---|
The last 15% of the ML lifecycle (explaining, tuning, optimizing, and serving) accounts for 60% of the time spent by production ML teams. This section makes you productive in that phase.
| SHAP values reveal the training set's hidden biases: Lesson 26 shows a model that achieves high accuracy by learning spurious correlations โ identified immediately from SHAP waterfall charts. The feature contributions expose what the model learned, not just whether it was right. That distinction is what separates a model that generalises from one that memorises. | |
|---|
| Optuna's pruning changes what search is possible: Lesson 27 shows that without pruning, 50 HP trials take too long to be useful on a laptop. With Hyperband pruning, Optuna stops unpromising trials early and reallocates compute to promising ones, achieving the same search quality in 30% of the wall-clock time. | |
|---|
| ONNX export is not the finish line; inference validation is: Lesson 30 exports a model to ONNX and then validates that its outputs match the PyTorch outputs to within 1e-5 for every test input. That validation step is non-negotiable in production: ONNX operator coverage varies by model, and silent accuracy degradation after export is a real failure mode. | |
|---|
Appendix: Complete Tool Stack & Setup
Every tool listed below is open-source and runs without a cloud account or GPU. All are pip-installable. GPU availability accelerates training but is never required.
| Library | Role | Install | Used In | |
|---|
| PyTorch (CPU build) | Core framework | pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cpu | All 30 lessons | |
| Streamlit | App framework | pip install streamlit | All 30 lessons | |
| torchinfo | Model summary | pip install torchinfo | Lessons 3, 29 | |
| Torchviz | Compute graph | pip install torchviz | Lesson 1 | |
| HuggingFace Transformers | NLP models | pip install transformers | Lessons 14, 28 | |
| YOLOv5 | Object detection | git clone ultralytics/yolov5, then pip install -r requirements.txt | Lesson 9 | |
| OpenCV | Image handling | pip install opencv-python | Lessons 9, 10 | |
| Gensim | Word embeddings | pip install gensim | Lesson 13 | |
| NLTK | Tokenization | pip install nltk | Lesson 12 | |
| Librosa | Audio features | pip install librosa | Lesson 24 | |
| SHAP | Explainability | pip install shap | Lesson 26 | |
| LIME | Local XAI | pip install lime | Lesson 26 | |
| Optuna | HP optimization | pip install optuna | Lesson 27 | |
| ONNX + onnxruntime | Model export | pip install onnx onnxruntime | Lesson 30 | |
| FastAPI + Uvicorn | Model serving | pip install fastapi uvicorn | Lesson 30 | |
| UMAP-learn | Embedding viz | pip install umap-learn | Lesson 25 | |
| Plotly | Interactive viz | pip install plotly | Most lessons | |
Quick Start
```bash
# Install PyTorch (CPU build); works on any OS
pip install torch torchvision --index-url https://download.pytorch.org/whl/cpu

# Navigate to Lesson 1 and install its dependencies
cd lesson_01 && pip install -r requirements.txt

# Launch the app; the Tensor Ops Lab opens in your browser
streamlit run app.py
```
| By Lesson 30 you will have built, trained, explained, tuned, compressed, and served a deep learning system, end to end. That is the complete arc of what an AI engineer does on any given sprint, compressed into 30 lessons that each take an hour and each ship something real. - What this course delivers | |
|---|