Building Discord: From Socket to Scale
Day 3: The Event Loop - Understanding How Servers Handle Thousands of Connections
What You'll Learn Today
By the end of this lesson, you'll understand how Discord, WhatsApp, and other real-time platforms handle millions of simultaneous connections without your computer exploding. You'll build a working server from scratch that can handle 10,000 concurrent connections using just a single thread.
Core Concepts:
Why traditional thread-per-connection servers fail at scale
How the Reactor pattern multiplexes I/O operations
Managing ByteBuffers and memory without triggering garbage collection
Building a complete event loop in Java 21
Part 1: Understanding The Problem
The Spring Boot Trap
Most beginners building a chat server will reach for something like Spring's @EnableWebSocket annotation:
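A minimal sketch of that starting point, assuming Spring's WebSocket module is on the classpath (the `ChatHandler` class and `/chat` path are illustrative placeholders, not code from this series):

```java
import org.springframework.context.annotation.Configuration;
import org.springframework.web.socket.TextMessage;
import org.springframework.web.socket.WebSocketSession;
import org.springframework.web.socket.config.annotation.EnableWebSocket;
import org.springframework.web.socket.config.annotation.WebSocketConfigurer;
import org.springframework.web.socket.config.annotation.WebSocketHandlerRegistry;
import org.springframework.web.socket.handler.TextWebSocketHandler;

// Hypothetical Spring setup: a few annotations and you have a working echo
// endpoint -- with a framework-managed thread pool hidden underneath.
@Configuration
@EnableWebSocket
public class ChatConfig implements WebSocketConfigurer {
    @Override
    public void registerWebSocketHandlers(WebSocketHandlerRegistry registry) {
        registry.addHandler(new ChatHandler(), "/chat");
    }
}

class ChatHandler extends TextWebSocketHandler {
    @Override
    public void handleTextMessage(WebSocketSession session, TextMessage message) throws Exception {
        session.sendMessage(message); // echo back to the sender
    }
}
```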
This works great for 100 users. At 10,000 connections, you'll see:
Your server's memory usage explodes
Garbage collection pauses exceeding 500 milliseconds
Thread pools that can't keep up with incoming requests
The framework hides what actually happens when a client sends data. When your production server crashes at 3 AM during a livestream event, you can't fix what you don't understand.
The Real Problem: Thread-Per-Connection Death
Here's the fundamental rule: You cannot create an operating system thread for every client connection at scale.
Why not?
Memory Overhead: Each Java thread has a default stack size of 1MB. If you have 50,000 connections, that's 50GB of RAM before you've handled a single message.
Context Switching: Your computer has maybe 8 CPU cores. When 50,000 threads compete for those cores, the operating system spends all its time switching between threads instead of actually processing data.
Garbage Collection Pressure: Thread objects themselves live in memory. Creating and destroying threads during connection spikes triggers full garbage collection cycles, freezing your entire application.
Even Java's new Virtual Threads don't solve this if you use them carelessly. You still need discipline around buffer management and back-pressure.
Part 2: The Solution - The Reactor Pattern
How Discord Actually Works
Discord doesn't create a thread per user. Neither does WhatsApp or LinkedIn Realtime. They use non-blocking I/O multiplexing: one thread monitoring thousands of sockets, only doing work when data actually arrives.
The Four Core Components
1. The Selector (The Multiplexer)
Java's Selector is a wrapper around operating system primitives (epoll on Linux, kqueue on macOS, and a select-based implementation on Windows). You register many socket connections with a single Selector, then call select(), which blocks until at least one socket has data ready.
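A minimal registration sketch (the class and method names below are our own, not from the lesson's final Gateway code). Binding to port 0 lets the OS pick a free port:

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.ServerSocketChannel;

public class SelectorDemo {
    // Opens a listening socket and registers it with a Selector for accepts.
    static Selector openAndRegister() throws IOException {
        Selector selector = Selector.open();
        ServerSocketChannel server = ServerSocketChannel.open();
        server.configureBlocking(false);           // mandatory before register()
        server.bind(new InetSocketAddress(0));     // port 0 = any free port
        server.register(selector, SelectionKey.OP_ACCEPT);
        return selector;
    }

    public static void main(String[] args) throws IOException {
        Selector sel = openAndRegister();
        // selectNow() is the non-blocking variant of select(): it returns the
        // number of ready channels immediately instead of parking the thread.
        System.out.println("ready channels: " + sel.selectNow());
    }
}
```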
2. Non-Blocking Channels
ServerSocketChannel and SocketChannel in non-blocking mode return immediately from I/O operations. If no data is available, read() returns 0 instead of parking the thread.
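You can see the non-parking behavior without opening a real socket by using a Pipe, whose source end is also a selectable channel (a small demo of our own, not part of the Gateway):

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.Pipe;

public class NonBlockingRead {
    public static void main(String[] args) throws IOException {
        Pipe pipe = Pipe.open();
        pipe.source().configureBlocking(false);

        ByteBuffer buf = ByteBuffer.allocate(16);
        // Nothing has been written to the sink yet. A blocking read would park
        // the thread here; in non-blocking mode it returns 0 immediately.
        int n = pipe.source().read(buf);
        System.out.println("bytes read: " + n); // 0
    }
}
```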
3. Direct ByteBuffers
Calling ByteBuffer.allocateDirect() allocates native memory outside the Java heap. This avoids:
Garbage collection scanning these buffers
Copying data from native socket buffers to JVM heap and back during writes
Trade-off: Direct buffers are slower to allocate. We solve this by pre-allocating a pool.
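One way to sketch such a pool (class and method names are our own invention, not a fixed API): allocate every buffer up front, then reuse via acquire/release so the hot path never allocates.

```java
import java.nio.ByteBuffer;
import java.util.ArrayDeque;

public class BufferPool {
    private final ArrayDeque<ByteBuffer> free = new ArrayDeque<>();
    private final int bufferSize;

    public BufferPool(int count, int bufferSize) {
        this.bufferSize = bufferSize;
        // Pay the slow allocateDirect() cost once, at startup.
        for (int i = 0; i < count; i++) {
            free.push(ByteBuffer.allocateDirect(bufferSize));
        }
    }

    public ByteBuffer acquire() {
        ByteBuffer b = free.poll();
        // Falling back to a fresh allocation keeps the sketch simple;
        // production code would apply back-pressure instead.
        return b != null ? b : ByteBuffer.allocateDirect(bufferSize);
    }

    public void release(ByteBuffer b) {
        b.clear(); // reset position/limit before the buffer is reused
        free.push(b);
    }
}
```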
4. State Machine Per Connection
Each connection tracks:
Current state: HANDSHAKE, READY, or CLOSING
Read buffer position (for parsing partial frames)
Write queue (messages waiting to be sent)
The Event Loop Lifecycle
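To make the lifecycle concrete, here is a minimal single-threaded echo loop. This is a sketch, not the full Gateway from this series: it skips the handshake, write queue, and buffer pool, and its write-back loop is deliberately naive.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;
import java.util.Iterator;

public class EchoEventLoop {
    private final Selector selector;
    private final ServerSocketChannel server;

    public EchoEventLoop(int port) throws IOException {
        selector = Selector.open();
        server = ServerSocketChannel.open();
        server.configureBlocking(false);
        server.bind(new InetSocketAddress(port));
        server.register(selector, SelectionKey.OP_ACCEPT);
    }

    public int port() throws IOException {
        return ((InetSocketAddress) server.getLocalAddress()).getPort();
    }

    // One pass of the lifecycle: select, then handle every ready key.
    public void runOnce(long timeoutMillis) throws IOException {
        selector.select(timeoutMillis);
        Iterator<SelectionKey> it = selector.selectedKeys().iterator();
        while (it.hasNext()) {
            SelectionKey key = it.next();
            it.remove(); // the selected-key set is never cleared for you
            if (key.isAcceptable()) {
                SocketChannel client = server.accept();
                if (client != null) {
                    client.configureBlocking(false);
                    // One read buffer per connection, attached to its key.
                    client.register(selector, SelectionKey.OP_READ,
                            ByteBuffer.allocateDirect(4096));
                }
            } else if (key.isReadable()) {
                SocketChannel ch = (SocketChannel) key.channel();
                ByteBuffer buf = (ByteBuffer) key.attachment();
                int n = ch.read(buf);
                if (n < 0) { key.cancel(); ch.close(); continue; }
                buf.flip();
                // Naive echo: real code would register OP_WRITE and queue
                // unwritten bytes instead of spinning here.
                while (buf.hasRemaining()) ch.write(buf);
                buf.clear();
            }
        }
    }
}
```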
Part 3: The Hard Parts - ByteBuffer Management
Understanding ByteBuffer Position and Limit
This is where most implementations break. ByteBuffer has position and limit pointers that you must manually manage:
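A worked example of the pointer dance (a standalone demo, not Gateway code):

```java
import java.nio.ByteBuffer;

public class PositionLimitDemo {
    public static void main(String[] args) {
        ByteBuffer buf = ByteBuffer.allocate(8);   // position=0, limit=8

        buf.put((byte) 1).put((byte) 2);           // write mode: position=2
        buf.flip();                                // read mode: position=0, limit=2

        byte first = buf.get();                    // consumes one byte: position=1
        buf.compact();                             // unread byte moves to index 0;
                                                   // position=1, limit=8, so the
                                                   // next read() appends after it
        System.out.println(first + " pos=" + buf.position() + " limit=" + buf.limit());
    }
}
```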
Common Bugs:
Forgetting compact() causes old data to be re-processed
Forgetting flip() means parsing reads garbage beyond the limit
Handling Partial Frames
TCP is a byte stream, not a message stream. A single read() might return:
Half a message
Three and a half messages
Zero bytes (socket has no data yet)
Our protocol uses: [4-byte length][payload bytes]
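A parser for this framing must tolerate incomplete input: if the length header or payload hasn't fully arrived, it backs off and waits for more bytes. A sketch (the class and method names are our own):

```java
import java.nio.ByteBuffer;

public class FrameParser {
    // Attempts to extract one [4-byte length][payload] frame from a buffer in
    // "read" mode (already flipped). Returns null when the frame is incomplete,
    // rewinding the buffer so the caller can compact() and read more bytes.
    public static byte[] tryParseFrame(ByteBuffer buf) {
        if (buf.remaining() < 4) return null;      // length header not here yet
        buf.mark();
        int len = buf.getInt();
        if (buf.remaining() < len) {               // payload only partially here
            buf.reset();                           // rewind back before the header
            return null;
        }
        byte[] payload = new byte[len];
        buf.get(payload);
        return payload;
    }
}
```

Call it in a loop after each read(), since one read may contain several frames; a null return means "compact and wait for the next readable event".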
Zero-Copy Writes with Scatter/Gather
Instead of copying message data into a single ByteBuffer, use SocketChannel.write(ByteBuffer[]) to write multiple buffers in one system call:
This avoids intermediate allocations and uses operating system vectored I/O.
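For our [4-byte length][payload] protocol, that means the header and payload can stay in separate buffers. A sketch (the helper name is our own; any GatheringByteChannel works, so the test below uses a Pipe instead of a socket):

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.GatheringByteChannel;

public class GatherWrite {
    // Writes one length-prefixed frame in a single vectored write, without
    // copying the payload into a combined buffer first.
    public static long writeFrame(GatheringByteChannel ch, ByteBuffer payload)
            throws IOException {
        ByteBuffer header = ByteBuffer.allocate(4);
        header.putInt(payload.remaining()).flip();
        return ch.write(new ByteBuffer[]{header, payload});
    }
}
```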
Part 4: Connection State Machine
Every connection moves through these states:
CONNECTING - TCP socket accepted, buffers allocated
HANDSHAKE - Waiting for client to send "FLUX_HELLO"
READY - Active messaging, echoing data back to client
CLOSING - Flushing write queue before closing socket
CLOSED - Connection terminated, resources freed
The critical path is HANDSHAKE to READY. All other paths lead to CLOSING (failure or timeout).
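The states and the transition rules above can be encoded directly, so illegal moves fail fast (a sketch; the enum values match the lesson, the method name is our own):

```java
public enum ConnState {
    CONNECTING, HANDSHAKE, READY, CLOSING, CLOSED;

    // The happy path is CONNECTING -> HANDSHAKE -> READY; every live state
    // can fall into CLOSING on failure or timeout.
    public boolean canTransitionTo(ConnState next) {
        return switch (this) {
            case CONNECTING -> next == HANDSHAKE || next == CLOSING;
            case HANDSHAKE  -> next == READY || next == CLOSING;
            case READY      -> next == CLOSING;
            case CLOSING    -> next == CLOSED;
            case CLOSED     -> false;
        };
    }
}
```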
Part 5: Production Readiness
Metrics That Matter
When running in production, monitor these:
Selector Latency: Time spent in select() call
Normal: Less than 1ms
Problem: More than 10ms (too many channels)
Read Buffer Allocation Rate: Should be approximately 0 after warmup (reusing pool)
High rate means garbage collection thrashing
Connection State Distribution: How many in HANDSHAKE vs READY?
Spike in CLOSING indicates attack or bug
Thread Context Switches: Should be single-digit per second for event loop thread
Visual Verification with VisualVM
Open VisualVM and attach to the Gateway process to watch:
Heap Usage: Should be flat after initial allocation. A sawtooth is normal GC behavior, but a sawtooth whose troughs keep rising indicates a memory leak.
Non-Heap (Direct Memory): Grows proportionally to connection count, then plateaus.
Thread Count: Must remain constant (1 event loop + 1 dashboard server). If growing, you're leaking threads.
Expected Performance
A properly implemented event loop should achieve:
10,000 concurrent connections on single thread
Message latency p99 less than 5ms
CPU usage under 40% on one core
Heap usage stable after warmup
Zero OutOfMemoryError under normal load
Expected Failure Mode: At 10,000+ connections on a single event loop, select() latency degrades (O(N) scan of ready keys). The next lesson covers the Multi-Reactor pattern (one Selector per CPU core).