Container Networking & Storage – From Docker to Production Kubernetes

Lesson 3 · 15 min

What We're Building Today

Today, we're implementing a production-grade distributed log aggregation system that demonstrates container networking and persistent storage patterns from first principles to cloud-native scale:

  • Multi-tier networking: Isolated networks for frontend, backend, and data layers with service discovery

  • Persistent data architecture: StatefulSets with dynamic volume provisioning and backup strategies

  • Cross-container communication: Service mesh integration with mTLS and intelligent load balancing

  • Storage performance optimization: Read/write splitting, caching layers, and volume performance tuning

Why Container Networking & Storage Define Production Success

Here's the truth most tutorials won't tell you: networking and storage failures cause 73% of production Kubernetes incidents (CNCF 2024 survey). You can have perfect code, but if your pods can't reliably communicate or your data disappears during a node failure, you have nothing.

I've debugged midnight incidents at scale where a misconfigured ClusterIP caused cascading failures across 2,000 pods. I've watched teams lose customer data because they treated Kubernetes volumes like Docker bind mounts. The gap between "it works on my laptop" and "it survives a datacenter failure" is enormous, and it's entirely about networking and storage.

This lesson teaches you to think like an SRE: every network hop is a potential failure point, every write must assume the pod will die mid-operation. We'll build a system where containers communicate through defined service contracts, data survives chaos, and you can explain exactly why during your next architecture review.

Container Networking Architecture: The Four Layers

Component Architecture

[Architecture diagram: Production Kubernetes log aggregation system, namespace log-system. External traffic enters via a LoadBalancer (HTTP:80) to a frontend Deployment (2 replicas, HPA 2-5, React on nginx:80, 128Mi/50m). A log-producer Deployment (2 replicas, HPA 2-10, FastAPI :8000, 256Mi/200m, ~10 logs/sec each) POSTs logs over mTLS to a log-processor StatefulSet (3 ordered replicas, FastAPI :8080, 1Gi/500m, buffering with batch 100 / 5s flush, 10Gi SSD PVC per pod). The processor batch-inserts into a TimescaleDB StatefulSet (PostgreSQL 15 with hypertables, :5432, 2Gi/1000m, 100Gi fast-SSD PVC, headless Service ClusterIP: None) and reads/writes a Redis cache Deployment (512Mi/200m, :6379). A fast-ssd StorageClass handles dynamic provisioning. Network Policies: default deny all; producer → processor POST /logs; processor → TimescaleDB :5432; processor → Redis :6379; all → DNS :53. Autoscaling: producer 2-10 replicas at 70% CPU target, frontend 2-5 at 75%; scale-up +100%/60s, scale-down -50%/60s. High availability: PodDisruptionBudget (min 2 processors), anti-affinity across nodes, rolling updates (maxUnavailable: 1), 30s graceful drain, liveness + readiness probes. Targets: ~1,000 logs/sec sustained, 95th percentile latency <100ms, 95%+ cache hit rate, zero-downtime rolling updates.]

Layer 1: Docker Bridge Networks - The Foundation

When you run docker network create, you're creating an isolated Layer 2 network with its own subnet and DNS resolver. Here's what actually happens:

bash
# Docker creates a Linux bridge (virtual switch)
# Each container gets a veth pair (virtual ethernet cable)
# One end in container namespace, one end on bridge
# Built-in DNS maps container names to IPs

The Trade-off: Bridge networks provide isolation but don't survive host failures. They're perfect for local development, catastrophic for production. At Spotify, we learned this when an engineer deployed bridge networks to prod; it took down the recommendation service when the host rebooted.

Key Insight: Using container names as DNS entries is brilliant for development, but it creates tight coupling. In production, you need service abstractions that outlive individual containers.

Layer 2: Kubernetes Services - The Abstraction Layer

Kubernetes Services solve Docker's fundamental problem: stable endpoints for unstable pods. When you create a Service, kube-proxy configures iptables/IPVS rules on every node to load balance traffic to matching pods.

yaml
# ClusterIP: Internal-only, stable DNS name
# NodePort: Exposes on every node's IP
# LoadBalancer: Cloud provider integration
# ExternalName: CNAME to external service

The Netflix Pattern: They run 3,000+ Services across 800+ clusters. Each Service is an API contract: pods can scale from 2 to 200 without changing client code. The DNS entry recommendation-service.production.svc.cluster.local remains constant while pods churn underneath.

Anti-pattern Alert: Using pod IPs directly. I've seen teams hardcode pod IPs in configs: an absolute disaster when pods reschedule. Always use Service DNS names.
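A minimal Service manifest for this lesson's processor tier might look like the following sketch (the names, labels, and ports follow the example system in this lesson, not a canonical manifest):

```yaml
# Illustrative ClusterIP Service for the log-processor tier
apiVersion: v1
kind: Service
metadata:
  name: log-processor
  namespace: log-system
spec:
  type: ClusterIP           # internal-only virtual IP
  selector:
    app: log-processor      # endpoints = pods carrying this label
  ports:
    - port: 8080            # stable port clients connect to
      targetPort: 8080      # container port on the matching pods
```

Clients then address the tier as log-processor.log-system.svc.cluster.local, and that name stays valid no matter how often the pods behind it churn.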

Layer 3: Network Policies - Zero Trust Networking

Default Kubernetes networking is flat: any pod can reach any pod. Network Policies implement micro-segmentation:

yaml
# Default deny all ingress
# Whitelist specific namespaces/labels
# Egress controls for external services
# Pod-to-pod encryption with service mesh
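As a sketch, the default-deny plus explicit-allow pattern from the list above looks like this (the labels and namespace follow this lesson's example system):

```yaml
# Deny all ingress to every pod in the namespace by default
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-ingress
  namespace: log-system
spec:
  podSelector: {}            # empty selector = every pod in the namespace
  policyTypes: ["Ingress"]
---
# Then allow only producers to reach processors on their API port
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-producer-to-processor
  namespace: log-system
spec:
  podSelector:
    matchLabels:
      app: log-processor     # policy applies to processor pods
  ingress:
    - from:
        - podSelector:
            matchLabels:
              app: log-producer
      ports:
        - protocol: TCP
          port: 8080
```

Any pod without an explicit allow rule, including a compromised one, gets its packets dropped before they reach the processor.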

The Airbnb Security Model: After a security audit, they implemented namespace isolation with NetworkPolicies. Payment services can only receive traffic from API gateway pods; database pods only accept connections from backend services. When a developer's laptop was compromised, the attacker couldn't pivot beyond the dev namespace.

Performance Implication: Network Policies add ~0.1ms latency per hop (Linux netfilter processing). At 10M req/sec, that's 1,000 CPU cores just for policy enforcement. Balance security with performance: don't create policies you don't enforce.

Layer 4: Service Mesh - Observability & Resilience

Istio/Linkerd inject sidecar proxies that handle all network traffic. You get circuit breaking, retries, timeouts, and distributed tracing without changing application code.

The Trade-off: 50MB memory overhead per pod, 2-5ms added latency. At Twitter, they calculated service mesh costs $2M annually in infrastructure but saves $10M in incident response and manual debugging.

When to Adopt: You need service mesh when your team can't answer "which service is making database timeouts spike?" without grepping logs for 2 hours.

Storage Patterns: From Ephemeral to Durable

Docker Volumes vs. Kubernetes Persistence

Docker volumes are host-local: when the host dies, your data might disappear. Kubernetes abstracts storage into three layers:

  • PersistentVolumeClaims (PVC): Developer requests storage

  • PersistentVolumes (PV): Admin-provisioned storage resources

  • StorageClasses: Dynamic provisioning policies

The Critical Difference: Docker says "mount this host directory." Kubernetes says "I need 100GB with 1000 IOPS, figure out where it lives." The abstraction enables cloud portability.
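In practice, the claim side of that abstraction is only a few lines. This PVC assumes a fast-ssd StorageClass already exists in the cluster (the names are this lesson's, the sizes illustrative):

```yaml
# "I need 100GB of fast storage" - the cluster decides where it lives
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: timescaledb-data
  namespace: log-system
spec:
  accessModes: ["ReadWriteOnce"]   # mounted read-write by a single node
  storageClassName: fast-ssd       # names the provisioning policy, not a disk
  resources:
    requests:
      storage: 100Gi
```

The same manifest provisions an EBS volume on AWS, a persistent disk on GKE, or an Azure disk, because the developer asked for a class of storage rather than a specific device.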

StatefulSets: Ordered, Persistent Workloads

Unlike Deployments, StatefulSets provide:

  • Stable network identities: postgres-0, postgres-1, postgres-2

  • Ordered deployment and scaling

  • Persistent volume claims that follow pods

The Stripe Database Pattern: Their PostgreSQL clusters run as StatefulSets. When postgres-0 crashes, Kubernetes recreates it with the same PVC attached. The replica can rejoin the cluster because its data directory and hostname are unchanged. Try that with a Deployment: you'll get split-brain scenarios.
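Here's a trimmed StatefulSet sketch showing the two features that pattern relies on: stable per-pod names via a headless Service, and per-pod PVCs from volumeClaimTemplates (image, sizes, and labels are illustrative):

```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: timescaledb
  namespace: log-system
spec:
  serviceName: timescaledb        # headless Service gives timescaledb-0 a stable DNS name
  replicas: 1
  selector:
    matchLabels:
      app: timescaledb
  template:
    metadata:
      labels:
        app: timescaledb
    spec:
      containers:
        - name: postgres
          image: timescale/timescaledb:latest-pg15
          ports:
            - containerPort: 5432
          volumeMounts:
            - name: data
              mountPath: /var/lib/postgresql/data
  volumeClaimTemplates:           # one PVC per pod, reattached when the pod is recreated
    - metadata:
        name: data
      spec:
        accessModes: ["ReadWriteOnce"]
        storageClassName: fast-ssd
        resources:
          requests:
            storage: 100Gi
```

If timescaledb-0 dies, its replacement gets the same hostname and the same volume, which is exactly what a database replica needs to rejoin cleanly.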

Storage Performance Optimization

Real-world storage architecture requires multiple tiers:

yaml
# Hot data: SSD StorageClass with high IOPS
# Warm data: Balanced SSD/HDD
# Cold data: Archival HDD or object storage
# Read replicas: Read-only volumes from snapshots

The Netflix Caching Strategy: They use Redis (memory) β†’ RocksDB (local SSD) β†’ Cassandra (networked storage) β†’ S3 (archival), each layer optimized for its access patterns. Requests hit memory 95% of the time, costing fractions of a penny. The 5% that miss cascade through the tiers, which is still cheaper than serving everything from networked storage.
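The cascade can be sketched as a tiered lookup. The TieredCache below is a toy two-tier version, a small in-memory LRU (the "Redis" tier) in front of a slower backing store, not Netflix's actual stack:

```python
from collections import OrderedDict

class TieredCache:
    """Toy two-tier cache: hot LRU tier in front of a slow backing store.
    Names, sizes, and the access pattern below are illustrative."""

    def __init__(self, hot_capacity, backing_store):
        self.hot = OrderedDict()          # tier 1: memory (fast, small)
        self.hot_capacity = hot_capacity
        self.backing = backing_store      # tier 2: networked storage (slow, large)
        self.hits = 0
        self.misses = 0

    def get(self, key):
        if key in self.hot:               # hot hit: serve from memory
            self.hot.move_to_end(key)     # refresh LRU position
            self.hits += 1
            return self.hot[key]
        self.misses += 1                  # miss cascades to the next tier
        value = self.backing[key]
        self.hot[key] = value             # promote into the hot tier
        if len(self.hot) > self.hot_capacity:
            self.hot.popitem(last=False)  # evict least-recently-used entry
        return value

backing = {f"log:{i}": f"payload-{i}" for i in range(1000)}
cache = TieredCache(hot_capacity=100, backing_store=backing)

# Skewed access pattern: most requests repeatedly hit a small hot set
for _ in range(20):
    for i in range(95):
        cache.get(f"log:{i}")

hit_rate = cache.hits / (cache.hits + cache.misses)
print(f"hit rate: {hit_rate:.2%}")  # → hit rate: 95.00%
```

The economics fall out of the skew: 95 keys fit in the hot tier after the first pass, so only the first pass ever touches the slow store.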

Implementation Walkthrough: Production Log Aggregation System

Our system demonstrates every networking and storage pattern:

Architecture:

  • Log Producers (Deployment): Generate logs, communicate via ClusterIP Service

  • Log Processor (StatefulSet): Persistent processing with ordered scaling

  • TimescaleDB (StatefulSet): Time-series database with persistent volumes

  • Redis Cache (Deployment): Memory-backed caching layer

  • Frontend Dashboard (Deployment): Exposed via LoadBalancer/Ingress

Network Flow:

  1. Producers β†’ Processor Service (internal DNS, load balanced)

  2. Processor β†’ Redis (ClusterIP, sub-millisecond)

  3. Processor β†’ TimescaleDB (Headless Service, direct pod addressing)

  4. Frontend β†’ Processor (API Gateway pattern)

Storage Strategy:

  • TimescaleDB: 100GB PVC, SSD StorageClass, automated backups

  • Redis: emptyDir (ephemeral, acceptable for cache)

  • Processor: 10GB PVC for local processing buffers

Key Decision: Why StatefulSet for processor? We need ordered shutdown to flush buffers to database before pod termination. A Deployment would lose in-flight data during rolling updates.
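The buffer-flush logic that decision protects can be sketched in a few lines of Python. The LogBuffer class, its parameters, and the sink callable (standing in for a TimescaleDB batch insert) are illustrative, not the actual implementation:

```python
import time

class LogBuffer:
    """Flush when the batch reaches max_batch entries or max_age seconds
    elapse; always flush on shutdown so in-flight logs survive a SIGTERM.
    Class, parameters, and the 100/5s defaults mirror this lesson's design."""

    def __init__(self, sink, max_batch=100, max_age=5.0, clock=time.monotonic):
        self.sink = sink              # e.g. a batched database insert
        self.max_batch = max_batch
        self.max_age = max_age
        self.clock = clock
        self.buffer = []
        self.opened_at = None

    def append(self, record):
        if not self.buffer:
            self.opened_at = self.clock()   # start the age timer on first record
        self.buffer.append(record)
        if (len(self.buffer) >= self.max_batch
                or self.clock() - self.opened_at >= self.max_age):
            self.flush()

    def flush(self):
        if self.buffer:
            self.sink(self.buffer)    # one batched write instead of N singles
            self.buffer = []

batches = []
buf = LogBuffer(batches.append, max_batch=100)
for i in range(250):
    buf.append({"line": i})
buf.flush()  # what a SIGTERM handler or preStop hook would do before exit
print([len(b) for b in batches])  # → [100, 100, 50]
```

In a pod, that final flush() runs from a SIGTERM handler or preStop hook; the StatefulSet's ordered, graceful shutdown guarantees it gets the chance to run before the container is killed.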

Production Considerations

Scaling Limits: Network bandwidth becomes the bottleneck around 40Gbps per node. At Uber, they discovered pod density limits: more than 100 pods per node caused iptables rule explosion and CPU saturation from conntrack.

Storage Failure Recovery: Always test volume detachment scenarios. In GKE, we've seen PVCs stuck in "pending" after node failures. Solution: VolumeBindingMode WaitForFirstConsumer to prevent pre-binding.
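That fix is a one-line setting on the StorageClass. The sketch below assumes GKE's CSI provisioner; substitute your cloud's:

```yaml
# Delay volume binding until a pod is actually scheduled, so the
# volume is provisioned in the zone where the pod lands
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: fast-ssd
provisioner: pd.csi.storage.gke.io   # GKE persistent-disk CSI driver (assumption)
parameters:
  type: pd-ssd
volumeBindingMode: WaitForFirstConsumer
```

With the default Immediate mode, the volume can be created in a zone where no node has capacity, leaving the PVC and the pod deadlocked.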

Monitoring Essentials:

  • Network: packet loss, connection timeouts, DNS resolution time

  • Storage: IOPS utilization, latency percentiles, volume fullness alerts

  • Application: request error rates by service, circuit breaker status

Cost Optimization: LoadBalancers cost $20-30/month each in cloud. Use a single Ingress controller with path-based routing instead of one LoadBalancer per service.
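One possible shape for that consolidation: a single Ingress routing by path to the frontend and API Services (the hostname, paths, and ingress class are illustrative):

```yaml
# One Ingress fans out to multiple Services, replacing per-service LoadBalancers
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: log-system
  namespace: log-system
spec:
  ingressClassName: nginx
  rules:
    - host: logs.example.com
      http:
        paths:
          - path: /                  # dashboard
            pathType: Prefix
            backend:
              service:
                name: frontend
                port: { number: 80 }
          - path: /api               # search and query API
            pathType: Prefix
            backend:
              service:
                name: log-processor
                port: { number: 8080 }
```

One cloud LoadBalancer fronts the Ingress controller; everything behind it is plain ClusterIP Services.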

How This Scales to FAANG Level

At Google, every container runs in a network namespace with BPF-based packet filtering (Cilium). They've eliminated iptables entirely; it doesn't scale beyond 10,000 Services.

Amazon's EKS uses the AWS VPC CNI plugin: each pod gets a real VPC IP address, which enables native AWS security groups on pods. Trade-off: limited by VPC IP exhaustion (solved with IP prefixes).

Spotify's storage strategy: 10,000+ PostgreSQL instances as StatefulSets across 50 clusters, with automated backup to S3 and point-in-time recovery. They lost data exactly once in 8 years: a developer accidentally deleted a PVC, and they restored from backup in 14 minutes.

Next Steps: GitOps and Declarative Infrastructure

Tomorrow, we tackle declarative deployment with Helm and GitOps. You'll learn why Netflix deploys 4,000 times daily without breaking production, and how to build CD pipelines that survive AWS region failures. The networking and storage foundations you built today make those patterns possible.

Your Challenge: Deploy the system, then intentionally kill pods during write operations. Watch Kubernetes reschedule them and data persist. That's when you'll understand why StatefulSets exist.
