Day 3: K3d Minimalist Boot: Surgical Component Removal

Lesson 3 · 15 min

Architecture View

[Architecture diagram: an 8GB laptop runs the operating system (~1.5GB) and Docker Desktop (~800MB), which hosts the nano-substrate K3d cluster. The K3s server (~280MB) keeps the API server, controller manager, scheduler, and etcd; Traefik, Metrics Server, and the cloud controller are disabled. System pods: CoreDNS (~25MB), the Cilium agent DaemonSet (~100MB), and the Cilium operator Deployment (~80MB). eBPF programs in the Linux kernel provide the kube-proxy replacement, L7 routing, DNS resolution, and the CNI plugin.]

The Bloat Trap: "Batteries Included" Kills Performance

Every K3d tutorial begins the same way:

bash
k3d cluster create my-cluster

This single command installs:

  • Traefik: A full-featured ingress controller with metrics, dashboard, and Let's Encrypt integration

  • Metrics Server: Kubernetes cluster-wide resource usage aggregator

  • Cloud Controller Manager: AWS/GCP/Azure integration placeholder

  • Flannel: CNI networking with VXLAN overhead

On a production cloud cluster with 64GB nodes? Fine. On your 8GB laptop? You just consumed 565MB before deploying a single application.
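You can see this bloat for yourself by filtering the kube-system pod listing for the removable addons. A minimal sketch (the name patterns are my assumption of stock K3s component names; run the commented command against a live default cluster):

```shell
# Filter a kube-system pod listing down to the removable K3s default addons.
# The name patterns are assumptions based on common K3s component names.
list_default_addons() {
  grep -Ei 'traefik|metrics-server|svclb|local-path|cloud-controller' || true
}

# Against a live cluster:
#   kubectl get pods -n kube-system --no-headers | list_default_addons
```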

The "batteries included" philosophy assumes infinite resources. Platform engineers building on constrained hardware must adopt surgical precision: install only what you need, disable everything else, and replace bloated components with efficient alternatives.

The Hidden Tax of Default Components

Traefik: The 180MB Ingress Nobody Asked For

Traefik ships with K3s as the default ingress controller. It's powerful - supporting HTTP/3, dynamic config reloading, middleware chains, and a web dashboard. But consider what you're paying:

Memory Breakdown:

  • Base Process: ~60MB

  • Prometheus Metrics Endpoint: ~40MB (buffering time-series data)

  • Dashboard UI: ~20MB (static assets + Go templates)

  • Dynamic Config Watcher: ~40MB (watches Ingress resources, recompiles routing table)

  • Connection Buffers: ~20MB (default 8192 connections × 2KB buffers)

Most Nano-IDP use cases need exactly one ingress: routing to your custom portal frontend. You don't need:

  • Let's Encrypt automation (you're on localhost)

  • Metrics scraping (you'll use /proc directly)

  • Dynamic middleware (your routes are static)

The Nano Alternative: We'll install Cilium in kube-proxy replacement mode, which provides L7 HTTP routing via eBPF without a separate ingress controller process. Memory cost: ~0MB (routing happens in kernel space).

Metrics Server: The Performance Paradox

Metrics Server scrapes kubelet /stats/summary endpoints every 60 seconds to populate kubectl top commands. Sounds harmless. But:

The Scraping Cost:

  • Base Process: ~20MB

  • gRPC Client Pool: ~10MB (one connection per node)

  • Time-Series Buffer: ~10MB (stores 1-minute window of metrics)

The Real Problem: Every 60 seconds, Metrics Server hits kubelet endpoints, which:

  1. Reads cgroup stats from /sys/fs/cgroup/memory/kubepods/...

  2. Aggregates per-container metrics

  3. Serializes to JSON

  4. Sends over gRPC

On a node running 20 pods, this creates CPU spikes every minute. On an 8GB laptop already running Docker Desktop + VSCode + Chrome, these spikes trigger swap thrashing.

The Nano Alternative: You don't need cluster-wide metrics. You need per-pod observability. We'll create a 15-line bash script that reads /proc/meminfo and docker stats directly. Cost: 0MB of resident memory, with a configurable poll interval (5 seconds by default).
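As an illustrative stand-in for that script (the name `nano-top.sh` and the exact output fields are my assumptions; `docker stats` requires the K3d node containers to be visible to your local Docker daemon):

```shell
#!/usr/bin/env bash
# nano-top.sh (hypothetical name) -- poll-based observability with no resident daemon.

# Host-wide available memory in MB, read straight from the kernel's accounting.
mem_available_mb() {
  awk '/^MemAvailable:/ { printf "%d", $2 / 1024 }' /proc/meminfo
}

# One polling pass: host headroom plus a one-shot container snapshot.
poll_once() {
  echo "host MemAvailable: $(mem_available_mb) MB"
  docker stats --no-stream --format '{{.Name}}\t{{.MemUsage}}' 2>/dev/null || true
}

# Loop at a configurable interval (default 5s); uncomment to run:
#   while true; do poll_once; sleep "${POLL_INTERVAL:-5}"; done
```

Because each pass is a one-shot read, nothing stays resident between polls, unlike Metrics Server's standing time-series buffer.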

Cloud Controller Manager: Ghost in the Machine

K3s includes a placeholder Cloud Controller Manager that polls for cloud provider APIs. On local K3d, it does nothing except:

  • Wake up every 30 seconds

  • Attempt to contact AWS/GCP metadata endpoints

  • Log errors to stderr

  • Consume 30MB

This is pure waste. On Nano-IDP, we use Crossplane for cloud integration, which runs on-demand only when provisioning infrastructure.

The Nano Architecture: --k3s-arg Surgical Strikes

K3d accepts --k3s-arg flags to pass arguments directly to the K3s server process. This is our scalpel.

Disable Syntax

Each component requires specific disable syntax:

bash
k3d cluster create nano-substrate \
  --k3s-arg "--disable=traefik@server:0" \
  --k3s-arg "--disable=metrics-server@server:0" \
  --k3s-arg "--disable-cloud-controller@server:0" \
  --k3s-arg "--flannel-backend=none@server:0" \
  --k3s-arg "--disable-network-policy@server:0"

Why @server:0?
K3d cluster notation uses @server:X to target specific nodes. :0 means "the first (and only) server node."
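The same filter suffix can target other node groups on multi-node clusters. A sketch of the notation (the agent-side kubelet tuning is a hypothetical example, not part of this lesson's single-server cluster):

```shell
# Node filters route each k3s argument to specific k3d nodes:
#   @server:0  first server node
#   @server:*  every server node
#   @agent:*   every agent node (multi-node clusters only)
K3D_ARGS=(
  "--disable=traefik@server:0"
  "--kubelet-arg=max-pods=32@agent:*"   # hypothetical agent-side tuning
)
printf -- '--k3s-arg "%s"\n' "${K3D_ARGS[@]}"
```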

Why disable Flannel?
Flannel (K3s default CNI) uses VXLAN encapsulation, adding ~80MB memory + 10% network overhead. Cilium uses eBPF, running in kernel space with near-zero overhead.

The Cilium Replacement

Cilium provides:

  • CNI (Container Network Interface): Pod-to-pod routing

  • Kube-proxy replacement: Service load balancing via eBPF

  • L7 HTTP routing: Ingress without an ingress controller

Installation:

bash
# Cilium CLI - 10MB binary, no daemon
curl -L https://github.com/cilium/cilium-cli/releases/download/v0.15.0/cilium-linux-amd64.tar.gz | tar xz
sudo mv cilium /usr/local/bin/

# Install Cilium in kube-proxy replacement mode
cilium install \
  --set kubeProxyReplacement=strict \
  --set operator.replicas=1 \
  --set hubble.enabled=false \
  --set prometheus.enabled=false

Memory Cost:

  • Cilium Agent (DaemonSet): ~100MB per node

  • Cilium Operator (Deployment): ~80MB

  • Total: ~180MB (vs 260MB for Traefik + Flannel)
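A hedged post-install sanity check, as a sketch: the helper simply asserts that a component name is absent from whatever listing is piped into it, and the commented commands run against the live cluster.

```shell
# Succeed only when the named component does not appear in the piped listing.
absent_from_listing() {
  ! grep -qi "$1"
}

# Against the live cluster:
#   cilium status --wait                      # block until agent + operator are ready
#   kubectl -n kube-system get pods --no-headers | absent_from_listing traefik \
#     && echo "traefik gone"
```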

Sequence/Flow - Cluster Creation & Verification

[Sequence diagram: the user runs create_cluster.sh, which deletes any old cluster (removing its containers), then runs `k3d cluster create` with Traefik, Metrics Server, and the cloud controller disabled. K3d creates the server container and starts K3s with those args; K3s skips Traefik and Metrics Server, deploys CoreDNS, and reports the cluster ready with no CNI. install_cilium.sh then deploys the Cilium DaemonSet and operator and loads the eBPF programs, after which the CNI is ready. Finally, verify.sh runs `kubectl get pods` to confirm Traefik is absent, launches an nginx test pod (which receives an IP via eBPF), and curls it, expecting HTTP 200: all checks passed.]
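The verification pass above can be sketched as a script (hypothetical; everything is wrapped in a function so nothing runs until you invoke `verify` with kubectl pointed at the nano-substrate cluster):

```shell
#!/usr/bin/env bash
# verify.sh (sketch) -- assumes kubectl's current context is the nano-substrate cluster.
set -euo pipefail

verify() {
  # 1. Traefik must be absent from kube-system.
  if kubectl -n kube-system get pods --no-headers | grep -qi traefik; then
    echo "FAIL: traefik still running" >&2; return 1
  fi

  # 2. Smoke-test pod networking through the eBPF datapath.
  kubectl run nginx-test --image=nginx:alpine --restart=Never
  kubectl wait --for=condition=Ready pod/nginx-test --timeout=60s
  local pod_ip
  pod_ip=$(kubectl get pod nginx-test -o jsonpath='{.status.podIP}')

  # 3. Expect HTTP 200 from inside the cluster network.
  kubectl run curl-test --rm -i --restart=Never --image=curlimages/curl -- \
    -s -o /dev/null -w '%{http_code}' "http://${pod_ip}" | grep -q 200

  kubectl delete pod nginx-test
  echo "All checks passed"
}

# verify   # uncomment to run against the live cluster
```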

Memory Map - Before vs After

BEFORE: Standard K3d (~565MB)

  • K3s Server — 350MB

  • Traefik — 180MB

  • Metrics Server — 40MB

  • Cloud Controller — 30MB

  • Flannel — 80MB

  • CoreDNS — 25MB

AFTER: Minimalist K3d (~305MB + Cilium)

  • K3s Server (Stripped) — 280MB

  • Traefik — DISABLED ✗

  • Metrics Server — DISABLED ✗

  • Cloud Controller — DISABLED ✗

  • Flannel — DISABLED ✗

  • CoreDNS — 25MB

  • Cilium Agent — 100MB

  • Cilium Operator — 80MB

Memory Savings Breakdown

  • Disabled Traefik: −180MB

  • Disabled Metrics Server: −40MB

  • Disabled Cloud Controller: −30MB

  • Flannel → Cilium: +20MB (faster)

  • K3s Optimized: −70MB

  • Total Savings: −260MB