Day 3: K3d Minimalist Boot: Surgical Component Removal

Lesson 3 · 15 min

Architecture View

[Architecture diagram: an 8GB laptop runs the operating system (~1.5GB) and Docker Desktop (~800MB), which hosts the nano-substrate K3d cluster. The K3s server (~280MB) keeps the API server, controller manager, scheduler, and etcd; Traefik, Metrics Server, and the cloud controller are disabled. System pods: CoreDNS (~25MB), the Cilium agent DaemonSet (~100MB), and the Cilium operator Deployment (~80MB). eBPF programs in the Linux kernel provide the kube-proxy replacement, L7 routing, DNS resolution, and the CNI plugin.]

The Bloat Trap: "Batteries Included" Kills Performance

Every K3d tutorial begins the same way:

bash
k3d cluster create my-cluster

This single command installs:

  • Traefik: A full-featured ingress controller with metrics, dashboard, and Let's Encrypt integration

  • Metrics Server: Kubernetes cluster-wide resource usage aggregator

  • Cloud Controller Manager: AWS/GCP/Azure integration placeholder

  • Flannel: CNI networking with VXLAN overhead

On a production cloud cluster with 64GB nodes? Fine. On your 8GB laptop? You just consumed 565MB before deploying a single application.
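You can see this bloat for yourself by filtering the kube-system pod listing for the removable addons. A minimal sketch (the name patterns are my assumption of stock K3s component names; run the commented command against a live default cluster):

```shell
# Filter a kube-system pod listing down to the removable K3s default addons.
# The name patterns are assumptions based on common K3s component names.
list_default_addons() {
  grep -Ei 'traefik|metrics-server|svclb|local-path|cloud-controller' || true
}

# Against a live cluster:
#   kubectl get pods -n kube-system --no-headers | list_default_addons
```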

The "batteries included" philosophy assumes infinite resources. Platform engineers building on constrained hardware must adopt surgical precision: install only what you need, disable everything else, and replace bloated components with efficient alternatives.

The Hidden Tax of Default Components

Traefik: The 180MB Ingress Nobody Asked For

Traefik ships with K3s as the default ingress controller. It's powerful - supporting HTTP/3, dynamic config reloading, middleware chains, and a web dashboard. But consider what you're paying:

Memory Breakdown:

  • Base Process: ~60MB

  • Prometheus Metrics Endpoint: ~40MB (buffering time-series data)

  • Dashboard UI: ~20MB (static assets + Go templates)

  • Dynamic Config Watcher: ~40MB (watches Ingress resources, recompiles routing table)

  • Connection Buffers: ~20MB (default 8192 connections × 2KB buffers)

Most Nano-IDP use cases need exactly one ingress: routing to your custom portal frontend. You don't need:

  • Let's Encrypt automation (you're on localhost)

  • Metrics scraping (you'll use /proc directly)

  • Dynamic middleware (your routes are static)

The Nano Alternative: We'll install Cilium in kube-proxy replacement mode, which provides L7 HTTP routing via eBPF without a separate ingress controller process. Memory cost: ~0MB (routing happens in kernel space).

Metrics Server: The Performance Paradox

Metrics Server scrapes kubelet /stats/summary endpoints every 60 seconds to populate kubectl top commands. Sounds harmless. But:

The Scraping Cost:

  • Base Process: ~20MB

  • gRPC Client Pool: ~10MB (one connection per node)

  • Time-Series Buffer: ~10MB (stores 1-minute window of metrics)

The Real Problem: Every 60 seconds, Metrics Server hits kubelet endpoints, which:

  1. Reads cgroup stats from /sys/fs/cgroup/memory/kubepods/...

  2. Aggregates per-container metrics

  3. Serializes to JSON

  4. Sends over gRPC

On a node running 20 pods, this creates CPU spikes every minute. On an 8GB laptop already running Docker Desktop + VSCode + Chrome, these spikes trigger swap thrashing.

The Nano Alternative: You don't need cluster-wide metrics. You need per-pod observability. We'll create a 15-line bash script that reads /proc/meminfo and docker stats directly. Cost: 0MB of resident memory, with a configurable poll interval (5 seconds by default).
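As an illustrative stand-in for that script (the name `nano-top.sh` and the exact output fields are my assumptions; `docker stats` requires the K3d node containers to be visible to your local Docker daemon):

```shell
#!/usr/bin/env bash
# nano-top.sh (hypothetical name) -- poll-based observability with no resident daemon.

# Host-wide available memory in MB, read straight from the kernel's accounting.
mem_available_mb() {
  awk '/^MemAvailable:/ { printf "%d", $2 / 1024 }' /proc/meminfo
}

# One polling pass: host headroom plus a one-shot container snapshot.
poll_once() {
  echo "host MemAvailable: $(mem_available_mb) MB"
  docker stats --no-stream --format '{{.Name}}\t{{.MemUsage}}' 2>/dev/null || true
}

# Loop at a configurable interval (default 5s); uncomment to run:
#   while true; do poll_once; sleep "${POLL_INTERVAL:-5}"; done
```

Because each pass is a one-shot read, nothing stays resident between polls, unlike Metrics Server's standing time-series buffer.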

Cloud Controller Manager: Ghost in the Machine

K3s includes a placeholder Cloud Controller Manager that polls for cloud provider APIs. On local K3d, it does nothing except:

  • Wake up every 30 seconds

  • Attempt to contact AWS/GCP metadata endpoints

  • Log errors to stderr

  • Consume 30MB

This is pure waste. On Nano-IDP, we use Crossplane for cloud integration, which runs on-demand only when provisioning infrastructure.

The Nano Architecture: --k3s-arg Surgical Strikes

K3d accepts --k3s-arg flags to pass arguments directly to the K3s server process. This is our scalpel.

Disable Syntax

Each component requires specific disable syntax:

bash
k3d cluster create nano-substrate \
  --k3s-arg "--disable=traefik@server:0" \
  --k3s-arg "--disable=metrics-server@server:0" \
  --k3s-arg "--disable-cloud-controller@server:0" \
  --k3s-arg "--flannel-backend=none@server:0" \
  --k3s-arg "--disable-network-policy@server:0"

Why @server:0?
K3d cluster notation uses @server:X to target specific nodes. :0 means "the first (and only) server node."
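The same filter suffix can target other node groups on multi-node clusters. A sketch of the notation (the agent-side kubelet tuning is a hypothetical example, not part of this lesson's single-server cluster):

```shell
# Node filters route each k3s argument to specific k3d nodes:
#   @server:0  first server node
#   @server:*  every server node
#   @agent:*   every agent node (multi-node clusters only)
K3D_ARGS=(
  "--disable=traefik@server:0"
  "--kubelet-arg=max-pods=32@agent:*"   # hypothetical agent-side tuning
)
printf -- '--k3s-arg "%s"\n' "${K3D_ARGS[@]}"
```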

Why disable Flannel?
Flannel (K3s default CNI) uses VXLAN encapsulation, adding ~80MB memory + 10% network overhead. Cilium uses eBPF, running in kernel space with near-zero overhead.

The Cilium Replacement

Cilium provides:

  • CNI (Container Network Interface): Pod-to-pod routing

  • Kube-proxy replacement: Service load balancing via eBPF

  • L7 HTTP routing: Ingress without an ingress controller

Installation:

bash
# Cilium CLI - 10MB binary, no daemon
curl -L https://github.com/cilium/cilium-cli/releases/download/v0.15.0/cilium-linux-amd64.tar.gz | tar xz
sudo mv cilium /usr/local/bin/

# Install Cilium in kube-proxy replacement mode
cilium install \
  --set kubeProxyReplacement=strict \
  --set operator.replicas=1 \
  --set hubble.enabled=false \
  --set prometheus.enabled=false

Memory Cost:

  • Cilium Agent (DaemonSet): ~100MB per node

  • Cilium Operator (Deployment): ~80MB

  • Total: ~180MB (vs 260MB for Traefik + Flannel)
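A hedged post-install sanity check, as a sketch: the helper simply asserts that a component name is absent from whatever listing is piped into it, and the commented commands run against the live cluster.

```shell
# Succeed only when the named component does not appear in the piped listing.
absent_from_listing() {
  ! grep -qi "$1"
}

# Against the live cluster:
#   cilium status --wait                      # block until agent + operator are ready
#   kubectl -n kube-system get pods --no-headers | absent_from_listing traefik \
#     && echo "traefik gone"
```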

Sequence/Flow - Cluster Creation & Verification

[Sequence diagram: the user runs create_cluster.sh, which deletes any old cluster (removing its containers), then runs `k3d cluster create` with Traefik, Metrics Server, and the cloud controller disabled. K3d creates the server container and starts K3s with those args; K3s skips Traefik and Metrics Server, deploys CoreDNS, and reports the cluster ready with no CNI. install_cilium.sh then deploys the Cilium DaemonSet and operator and loads the eBPF programs, after which the CNI is ready. Finally, verify.sh runs `kubectl get pods` to confirm Traefik is absent, launches an nginx test pod (which receives an IP via eBPF), and curls it, expecting HTTP 200: all checks passed.]
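The verification pass above can be sketched as a script (hypothetical; everything is wrapped in a function so nothing runs until you invoke `verify` with kubectl pointed at the nano-substrate cluster):

```shell
#!/usr/bin/env bash
# verify.sh (sketch) -- assumes kubectl's current context is the nano-substrate cluster.
set -euo pipefail

verify() {
  # 1. Traefik must be absent from kube-system.
  if kubectl -n kube-system get pods --no-headers | grep -qi traefik; then
    echo "FAIL: traefik still running" >&2; return 1
  fi

  # 2. Smoke-test pod networking through the eBPF datapath.
  kubectl run nginx-test --image=nginx:alpine --restart=Never
  kubectl wait --for=condition=Ready pod/nginx-test --timeout=60s
  local pod_ip
  pod_ip=$(kubectl get pod nginx-test -o jsonpath='{.status.podIP}')

  # 3. Expect HTTP 200 from inside the cluster network.
  kubectl run curl-test --rm -i --restart=Never --image=curlimages/curl -- \
    -s -o /dev/null -w '%{http_code}' "http://${pod_ip}" | grep -q 200

  kubectl delete pod nginx-test
  echo "All checks passed"
}

# verify   # uncomment to run against the live cluster
```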

Memory Map - Before vs After

BEFORE: Standard K3d (~565MB)

  • K3s Server — 350MB

  • Traefik — 180MB

  • Metrics Server — 40MB

  • Cloud Controller — 30MB

  • Flannel — 80MB

  • CoreDNS — 25MB

AFTER: Minimalist K3d (~305MB + Cilium)

  • K3s Server (Stripped) — 280MB

  • Traefik — DISABLED ✗

  • Metrics Server — DISABLED ✗

  • Cloud Controller — DISABLED ✗

  • Flannel — DISABLED ✗

  • CoreDNS — 25MB

  • Cilium Agent — 100MB

  • Cilium Operator — 80MB

Memory Savings Breakdown

  • Disabled Traefik: −180MB

  • Disabled Metrics Server: −40MB

  • Disabled Cloud Controller: −30MB

  • Flannel → Cilium: +20MB (faster)

  • K3s Optimized: −70MB

  • Total Savings: −260MB