Building Discord: From Socket to Scale
Day 3: The Event Loop - Understanding How Servers Handle Thousands of Connections
What You'll Learn Today
By the end of this lesson, you'll understand how Discord, WhatsApp, and other real-time platforms handle millions of simultaneous connections without your computer exploding. You'll build a working server from scratch that can handle 10,000 concurrent connections using just a single thread.
Core Concepts:
Why traditional thread-per-connection servers fail at scale
How the Reactor pattern multiplexes I/O operations
Managing ByteBuffers and memory without triggering garbage collection
Building a complete event loop in Java 21
Part 1: Understanding The Problem
The Spring Boot Trap
Most beginners building a chat server will reach for something like Spring's @EnableWebSocket annotation:
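A minimal sketch of that starting point, assuming Spring's WebSocket module is on the classpath (the `ChatHandler` class and `/chat` path are illustrative placeholders, not code from this series):

```java
import org.springframework.context.annotation.Configuration;
import org.springframework.web.socket.TextMessage;
import org.springframework.web.socket.WebSocketSession;
import org.springframework.web.socket.config.annotation.EnableWebSocket;
import org.springframework.web.socket.config.annotation.WebSocketConfigurer;
import org.springframework.web.socket.config.annotation.WebSocketHandlerRegistry;
import org.springframework.web.socket.handler.TextWebSocketHandler;

// Hypothetical Spring setup: a few annotations and you have a working echo
// endpoint -- with a framework-managed thread pool hidden underneath.
@Configuration
@EnableWebSocket
public class ChatConfig implements WebSocketConfigurer {
    @Override
    public void registerWebSocketHandlers(WebSocketHandlerRegistry registry) {
        registry.addHandler(new ChatHandler(), "/chat");
    }
}

class ChatHandler extends TextWebSocketHandler {
    @Override
    public void handleTextMessage(WebSocketSession session, TextMessage message) throws Exception {
        session.sendMessage(message); // echo back to the sender
    }
}
```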
This works great for 100 users. At 10,000 connections, you'll see:
Your server's memory usage explodes
Garbage collection pauses exceeding 500 milliseconds
Thread pools that can't keep up with incoming requests
The framework hides what actually happens when a client sends data. When your production server crashes at 3 AM during a livestream event, you can't fix what you don't understand.
The Real Problem: Thread-Per-Connection Death
Here's the fundamental rule: You cannot create an operating system thread for every client connection at scale.
Why not?
Memory Overhead: Each Java thread has a default stack size of 1MB. If you have 50,000 connections, that's 50GB of RAM before you've handled a single message.
Context Switching: Your computer has maybe 8 CPU cores. When 50,000 threads compete for those cores, the operating system spends all its time switching between threads instead of actually processing data.
Garbage Collection Pressure: Thread objects themselves live in memory. Creating and destroying threads during connection spikes triggers full garbage collection cycles, freezing your entire application.
Even Java's new Virtual Threads don't solve this if you use them carelessly. You still need discipline around buffer management and back-pressure.
Part 2: The Solution - The Reactor Pattern
How Discord Actually Works
Discord doesn't create a thread per user. Neither does WhatsApp or LinkedIn Realtime. They use non-blocking I/O multiplexing: one thread monitoring thousands of sockets, only doing work when data actually arrives.
The Four Core Components
1. The Selector (The Multiplexer)
Java's Selector is a wrapper around operating system primitives (epoll on Linux, kqueue on macOS, and a select-based implementation on Windows). You register many socket connections with a single Selector, then call select(), which blocks until at least one socket has data ready.
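A minimal registration sketch (the class and method names below are our own, not from the lesson's final Gateway code). Binding to port 0 lets the OS pick a free port:

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.ServerSocketChannel;

public class SelectorDemo {
    // Opens a listening socket and registers it with a Selector for accepts.
    static Selector openAndRegister() throws IOException {
        Selector selector = Selector.open();
        ServerSocketChannel server = ServerSocketChannel.open();
        server.configureBlocking(false);           // mandatory before register()
        server.bind(new InetSocketAddress(0));     // port 0 = any free port
        server.register(selector, SelectionKey.OP_ACCEPT);
        return selector;
    }

    public static void main(String[] args) throws IOException {
        Selector sel = openAndRegister();
        // selectNow() is the non-blocking variant of select(): it returns the
        // number of ready channels immediately instead of parking the thread.
        System.out.println("ready channels: " + sel.selectNow());
    }
}
```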
2. Non-Blocking Channels
ServerSocketChannel and SocketChannel in non-blocking mode return immediately from I/O operations. If no data is available, read() returns 0 instead of parking the thread.
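You can see the non-parking behavior without opening a real socket by using a Pipe, whose source end is also a selectable channel (a small demo of our own, not part of the Gateway):

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.Pipe;

public class NonBlockingRead {
    public static void main(String[] args) throws IOException {
        Pipe pipe = Pipe.open();
        pipe.source().configureBlocking(false);

        ByteBuffer buf = ByteBuffer.allocate(16);
        // Nothing has been written to the sink yet. A blocking read would park
        // the thread here; in non-blocking mode it returns 0 immediately.
        int n = pipe.source().read(buf);
        System.out.println("bytes read: " + n); // 0
    }
}
```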
3. Direct ByteBuffers
Calling ByteBuffer.allocateDirect() allocates native memory outside the Java heap. This avoids:
Garbage collection scanning these buffers
Copying data from native socket buffers to JVM heap and back during writes
Trade-off: Direct buffers are slower to allocate. We solve this by pre-allocating a pool.
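One way to sketch such a pool (class and method names are our own invention, not a fixed API): allocate every buffer up front, then reuse via acquire/release so the hot path never allocates.

```java
import java.nio.ByteBuffer;
import java.util.ArrayDeque;

public class BufferPool {
    private final ArrayDeque<ByteBuffer> free = new ArrayDeque<>();
    private final int bufferSize;

    public BufferPool(int count, int bufferSize) {
        this.bufferSize = bufferSize;
        // Pay the slow allocateDirect() cost once, at startup.
        for (int i = 0; i < count; i++) {
            free.push(ByteBuffer.allocateDirect(bufferSize));
        }
    }

    public ByteBuffer acquire() {
        ByteBuffer b = free.poll();
        // Falling back to a fresh allocation keeps the sketch simple;
        // production code would apply back-pressure instead.
        return b != null ? b : ByteBuffer.allocateDirect(bufferSize);
    }

    public void release(ByteBuffer b) {
        b.clear(); // reset position/limit before the buffer is reused
        free.push(b);
    }
}
```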
4. State Machine Per Connection
Each connection tracks:
Current state: HANDSHAKE, READY, or CLOSING
Read buffer position (for parsing partial frames)
Write queue (messages waiting to be sent)
The Event Loop Lifecycle
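To make the lifecycle concrete, here is a minimal single-threaded echo loop. This is a sketch, not the full Gateway from this series: it skips the handshake, write queue, and buffer pool, and its write-back loop is deliberately naive.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;
import java.util.Iterator;

public class EchoEventLoop {
    private final Selector selector;
    private final ServerSocketChannel server;

    public EchoEventLoop(int port) throws IOException {
        selector = Selector.open();
        server = ServerSocketChannel.open();
        server.configureBlocking(false);
        server.bind(new InetSocketAddress(port));
        server.register(selector, SelectionKey.OP_ACCEPT);
    }

    public int port() throws IOException {
        return ((InetSocketAddress) server.getLocalAddress()).getPort();
    }

    // One pass of the lifecycle: select, then handle every ready key.
    public void runOnce(long timeoutMillis) throws IOException {
        selector.select(timeoutMillis);
        Iterator<SelectionKey> it = selector.selectedKeys().iterator();
        while (it.hasNext()) {
            SelectionKey key = it.next();
            it.remove(); // the selected-key set is never cleared for you
            if (key.isAcceptable()) {
                SocketChannel client = server.accept();
                if (client != null) {
                    client.configureBlocking(false);
                    // One read buffer per connection, attached to its key.
                    client.register(selector, SelectionKey.OP_READ,
                            ByteBuffer.allocateDirect(4096));
                }
            } else if (key.isReadable()) {
                SocketChannel ch = (SocketChannel) key.channel();
                ByteBuffer buf = (ByteBuffer) key.attachment();
                int n = ch.read(buf);
                if (n < 0) { key.cancel(); ch.close(); continue; }
                buf.flip();
                // Naive echo: real code would register OP_WRITE and queue
                // unwritten bytes instead of spinning here.
                while (buf.hasRemaining()) ch.write(buf);
                buf.clear();
            }
        }
    }
}
```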
Part 3: The Hard Parts - ByteBuffer Management
Understanding ByteBuffer Position and Limit
This is where most implementations break. ByteBuffer has position and limit pointers that you must manually manage:
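A worked example of the pointer dance (a standalone demo, not Gateway code):

```java
import java.nio.ByteBuffer;

public class PositionLimitDemo {
    public static void main(String[] args) {
        ByteBuffer buf = ByteBuffer.allocate(8);   // position=0, limit=8

        buf.put((byte) 1).put((byte) 2);           // write mode: position=2
        buf.flip();                                // read mode: position=0, limit=2

        byte first = buf.get();                    // consumes one byte: position=1
        buf.compact();                             // unread byte moves to index 0;
                                                   // position=1, limit=8, so the
                                                   // next read() appends after it
        System.out.println(first + " pos=" + buf.position() + " limit=" + buf.limit());
    }
}
```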
Common Bugs:
Forgetting compact() causes old data to be re-processed
Forgetting flip() means parsing reads garbage beyond the limit
Handling Partial Frames
TCP is a byte stream, not a message stream. A single read() might return:
Half a message
Three and a half messages
Zero bytes (socket has no data yet)
Our protocol uses: [4-byte length][payload bytes]
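A parser for this framing must tolerate incomplete input: if the length header or payload hasn't fully arrived, it backs off and waits for more bytes. A sketch (the class and method names are our own):

```java
import java.nio.ByteBuffer;

public class FrameParser {
    // Attempts to extract one [4-byte length][payload] frame from a buffer in
    // "read" mode (already flipped). Returns null when the frame is incomplete,
    // rewinding the buffer so the caller can compact() and read more bytes.
    public static byte[] tryParseFrame(ByteBuffer buf) {
        if (buf.remaining() < 4) return null;      // length header not here yet
        buf.mark();
        int len = buf.getInt();
        if (buf.remaining() < len) {               // payload only partially here
            buf.reset();                           // rewind back before the header
            return null;
        }
        byte[] payload = new byte[len];
        buf.get(payload);
        return payload;
    }
}
```

Call it in a loop after each read(), since one read may contain several frames; a null return means "compact and wait for the next readable event".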
Zero-Copy Writes with Scatter/Gather
Instead of copying message data into a single ByteBuffer, use SocketChannel.write(ByteBuffer[]) to write multiple buffers in one system call:
This avoids intermediate allocations and uses operating system vectored I/O.
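For our [4-byte length][payload] protocol, that means the header and payload can stay in separate buffers. A sketch (the helper name is our own; any GatheringByteChannel works, so the test below uses a Pipe instead of a socket):

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.GatheringByteChannel;

public class GatherWrite {
    // Writes one length-prefixed frame in a single vectored write, without
    // copying the payload into a combined buffer first.
    public static long writeFrame(GatheringByteChannel ch, ByteBuffer payload)
            throws IOException {
        ByteBuffer header = ByteBuffer.allocate(4);
        header.putInt(payload.remaining()).flip();
        return ch.write(new ByteBuffer[]{header, payload});
    }
}
```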
Part 4: Connection State Machine
Every connection moves through these states:
CONNECTING - TCP socket accepted, buffers allocated
HANDSHAKE - Waiting for client to send "FLUX_HELLO"
READY - Active messaging, echoing data back to client
CLOSING - Flushing write queue before closing socket
CLOSED - Connection terminated, resources freed
The critical path is HANDSHAKE to READY. All other paths lead to CLOSING (failure or timeout).
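The states and the transition rules above can be encoded directly, so illegal moves fail fast (a sketch; the enum values match the lesson, the method name is our own):

```java
public enum ConnState {
    CONNECTING, HANDSHAKE, READY, CLOSING, CLOSED;

    // The happy path is CONNECTING -> HANDSHAKE -> READY; every live state
    // can fall into CLOSING on failure or timeout.
    public boolean canTransitionTo(ConnState next) {
        return switch (this) {
            case CONNECTING -> next == HANDSHAKE || next == CLOSING;
            case HANDSHAKE  -> next == READY || next == CLOSING;
            case READY      -> next == CLOSING;
            case CLOSING    -> next == CLOSED;
            case CLOSED     -> false;
        };
    }
}
```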
Part 5: Production Readiness
Metrics That Matter
When running in production, monitor these:
Selector Latency: Time spent in select() call
Normal: Less than 1ms
Problem: More than 10ms (too many channels)
Read Buffer Allocation Rate: Should be approximately 0 after warmup (reusing pool)
High rate means garbage collection thrashing
Connection State Distribution: How many in HANDSHAKE vs READY?
Spike in CLOSING indicates attack or bug
Thread Context Switches: Should be single-digit per second for event loop thread
Visual Verification with VisualVM
Open VisualVM and attach to the Gateway process to watch:
Heap Usage: Should be flat after initial allocation. A sawtooth is normal GC behavior, but a sawtooth whose troughs keep rising indicates a memory leak.
Non-Heap (Direct Memory): Grows proportionally to connection count, then plateaus.
Thread Count: Must remain constant (1 event loop + 1 dashboard server). If growing, you're leaking threads.
Expected Performance
A properly implemented event loop should achieve:
10,000 concurrent connections on single thread
Message latency p99 less than 5ms
CPU usage under 40% on one core
Heap usage stable after warmup
Zero OutOfMemoryError under normal load
Expected Failure Mode: At 10,000+ connections on a single event loop, select() latency degrades (O(N) scan of ready keys). The next lesson covers the Multi-Reactor pattern (one Selector per CPU core).