
Building Discord: From Socket to Scale

Day 3: The Event Loop - Understanding How Servers Handle Thousands of Connections


What You'll Learn Today

By the end of this lesson, you'll understand how Discord, WhatsApp, and other real-time platforms handle millions of simultaneous connections without your computer exploding. You'll build a working server from scratch that can handle 10,000 concurrent connections using just a single thread.

Core Concepts:

  • Why traditional thread-per-connection servers fail at scale

  • How the Reactor pattern multiplexes I/O operations

  • Managing ByteBuffers and memory without triggering garbage collection

  • Building a complete event loop in Java 21


Part 1: Understanding The Problem

Component Architecture

[Diagram: Component Architecture: The Event Loop. A single EventLoop thread runs the select() loop on one Selector (the multiplexer), handling accept(), read()/write(), and timeout checks for many client Sockets. Each Connection owns a readBuffer, a writeQueue, a state, and a timestamp; a ProtocolHandler parses [length][payload] frames and handles the handshake and message echo. Key insight: the Selector blocks in select() until ANY channel has I/O ready, so there is no polling and no wasted CPU cycles. Each Connection holds pre-allocated buffers, giving zero GC pressure during steady-state operation.]

The Spring Boot Trap

Most beginners building a chat server will reach for an annotation-driven WebSocket framework, such as the Jakarta @ServerEndpoint API or Spring's @EnableWebSocket:

java
import jakarta.websocket.OnMessage;
import jakarta.websocket.server.ServerEndpoint;

@ServerEndpoint("/chat")
public class ChatEndpoint {
    @OnMessage
    public void handleMessage(String message) {
        // Magic happens here... or does it?
    }
}

This works great for 100 users. At 10,000 connections, you'll see:

  • Your server's memory usage explodes

  • Garbage collection pauses exceeding 500 milliseconds

  • Thread pools that can't keep up with incoming requests

The framework hides what actually happens when a client sends data. When your production server crashes at 3 AM during a livestream event, you can't fix what you don't understand.

The Real Problem: Thread-Per-Connection Death

Here's the fundamental rule: You cannot create an operating system thread for every client connection at scale.

Why not?

Memory Overhead: Each Java platform thread reserves a stack, 1MB by default. With 50,000 connections, that's 50GB of stack space reserved before you've handled a single message.

Context Switching: Your computer has maybe 8 CPU cores. When 50,000 threads compete for those cores, the operating system spends all its time switching between threads instead of actually processing data.

Garbage Collection Pressure: Thread objects themselves live in memory. Creating and destroying threads during connection spikes triggers full garbage collection cycles, freezing your entire application.
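For contrast, here is a minimal sketch of the thread-per-connection approach: a blocking echo server on the classic java.net API. This is the shape that breaks down at scale.

java
import java.io.IOException;
import java.io.InputStream;
import java.net.ServerSocket;
import java.net.Socket;

public class ThreadPerConnectionServer {
    public static void main(String[] args) throws IOException {
        try (ServerSocket server = new ServerSocket(9090)) {
            while (true) {
                Socket client = server.accept();       // blocks until a client connects
                new Thread(() -> {                     // one OS thread (and stack) per client
                    try (client; InputStream in = client.getInputStream()) {
                        byte[] buf = new byte[8192];
                        int n;
                        while ((n = in.read(buf)) != -1) {             // thread parks here, idle
                            client.getOutputStream().write(buf, 0, n); // echo back
                        }
                    } catch (IOException ignored) {
                        // client disconnected
                    }
                }).start();
            }
        }
    }
}

Every one of those parked threads costs memory and scheduler time even while its socket is silent.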

Even Java's new Virtual Threads don't solve this if you use them carelessly. You still need discipline around buffer management and back-pressure.


Part 2: The Solution - The Reactor Pattern

How Discord Actually Works

Flowchart

[Diagram: Sequence flow of the client connection lifecycle, across Client, Selector (EventLoop), Protocol Handler, and Connection. Step 1: the client opens a TCP connection; the event loop accepts it and creates a Connection in the HANDSHAKE state. Step 2: the client sends FLUX_HELLO; the protocol handler parses the frame, sets state = READY, and queues a FLUX_READY reply. Step 3: in the READY state, each client message is echoed back (ECHO: message). Step 4: on FIN, the connection is closed. Step 5: its buffers are cleaned up. Non-blocking I/O means the Selector only wakes when a SocketChannel has data ready: zero CPU wasted polling empty sockets.]

Discord doesn't create a thread per user. Neither does WhatsApp or LinkedIn Realtime. They use non-blocking I/O multiplexing: one thread monitoring thousands of sockets, only doing work when data actually arrives.

The Four Core Components

1. The Selector (The Multiplexer)

Java's Selector is a wrapper around operating system primitives (epoll on Linux, kqueue on macOS, IOCP on Windows). You register many socket connections with a single Selector, then call select() which blocks until at least one socket has data ready.

java
Selector selector = Selector.open();
channel.configureBlocking(false);                 // must be non-blocking before register()
channel.register(selector, SelectionKey.OP_READ);
while (running) {
    selector.select();                            // Blocks until activity
    Iterator<SelectionKey> keys = selector.selectedKeys().iterator();
    while (keys.hasNext()) {
        SelectionKey key = keys.next();
        keys.remove();   // the Selector never clears this set itself
        // Process the ready channel...
    }
}

2. Non-Blocking Channels

ServerSocketChannel and SocketChannel in non-blocking mode return immediately from I/O operations. If no data is available, read() returns 0 instead of parking the thread, and it returns -1 once the peer has closed the connection.
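A minimal illustration of that read() contract (the host and port here are placeholders):

java
SocketChannel channel = SocketChannel.open(new InetSocketAddress("localhost", 9090));
channel.configureBlocking(false);   // from here on, reads never park the thread
ByteBuffer buffer = ByteBuffer.allocateDirect(8192);
int n = channel.read(buffer);       // >0: bytes read, 0: no data yet, -1: peer closed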

3. Direct ByteBuffers

ByteBuffer.allocateDirect() allocates native memory outside the Java heap. This avoids:

  • Garbage collection scanning these buffers

  • Copying data from native socket buffers to JVM heap and back during writes

Trade-off: Direct buffers are slower to allocate. We solve this by pre-allocating a pool.
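Here is a minimal sketch of such a pool; the class and method names are illustrative, not taken from a specific library:

java
import java.nio.ByteBuffer;
import java.util.ArrayDeque;
import java.util.Deque;

final class BufferPool {
    private final Deque<ByteBuffer> free = new ArrayDeque<>();
    private final int bufferSize;

    BufferPool(int count, int bufferSize) {
        this.bufferSize = bufferSize;
        for (int i = 0; i < count; i++) {
            free.push(ByteBuffer.allocateDirect(bufferSize)); // pay the allocation cost once, at startup
        }
    }

    ByteBuffer acquire() {
        ByteBuffer buf = free.poll();
        return buf != null ? buf : ByteBuffer.allocateDirect(bufferSize); // grow only under pressure
    }

    void release(ByteBuffer buf) {
        buf.clear();     // reset position/limit so the buffer is ready for reuse
        free.push(buf);
    }
}

After warmup, acquire() and release() just recycle the same buffers, so steady-state operation allocates nothing.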

4. State Machine Per Connection

Each connection tracks:

  • Current state: HANDSHAKE, READY, CLOSING

  • Read buffer position (for parsing partial frames)

  • Write queue (messages waiting to be sent)
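Put together, a per-connection object might look like this minimal sketch (the field names are illustrative):

java
import java.nio.ByteBuffer;
import java.nio.channels.SocketChannel;
import java.util.ArrayDeque;
import java.util.Deque;

final class Connection {
    enum State { HANDSHAKE, READY, CLOSING }

    final SocketChannel channel;
    final ByteBuffer readBuffer;                              // pre-allocated, reused for the connection's lifetime
    final Deque<ByteBuffer> writeQueue = new ArrayDeque<>();  // frames waiting to be flushed
    State state = State.HANDSHAKE;
    long lastActivity = System.currentTimeMillis();           // used for timeout/heartbeat checks

    Connection(SocketChannel channel, ByteBuffer readBuffer) {
        this.channel = channel;
        this.readBuffer = readBuffer;
    }
}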

The Event Loop Lifecycle

Code
1. Server binds to port 9090
2. Register ServerSocketChannel with Selector (OP_ACCEPT)
3. Loop forever:
   a. selector.select() - blocks until events
   b. For each SelectionKey:
      - OP_ACCEPT: Create new SocketChannel, register for OP_READ
      - OP_READ: Read bytes into buffer, parse protocol frames
      - OP_WRITE: Flush pending data from write queue
   c. Check connection timeouts/heartbeats
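In Java, that lifecycle condenses to a skeleton like the following. Here handleRead, handleWrite, checkTimeouts, and the buffer pool are assumed helpers, sketched rather than taken from the lesson's full code:

java
import java.net.InetSocketAddress;
import java.nio.channels.*;
import java.util.Iterator;

ServerSocketChannel server = ServerSocketChannel.open();
server.bind(new InetSocketAddress(9090));
server.configureBlocking(false);
Selector selector = Selector.open();
server.register(selector, SelectionKey.OP_ACCEPT);

while (running) {
    selector.select(250);   // wake at least every 250ms so timeout checks still run
    Iterator<SelectionKey> it = selector.selectedKeys().iterator();
    while (it.hasNext()) {
        SelectionKey key = it.next();
        it.remove();
        if (key.isAcceptable()) {
            SocketChannel client = server.accept();
            client.configureBlocking(false);
            client.register(selector, SelectionKey.OP_READ,
                    new Connection(client, pool.acquire()));   // attach per-connection state
        }
        if (key.isValid() && key.isReadable()) handleRead(key);   // read bytes, parse frames
        if (key.isValid() && key.isWritable()) handleWrite(key);  // flush pending writes
    }
    checkTimeouts();   // enforce handshake deadlines and heartbeats
}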

Part 3: The Hard Parts - ByteBuffer Management

Understanding ByteBuffer Position and Limit

This is where most implementations break. ByteBuffer has position and limit pointers that you must manually manage:

java
ByteBuffer buffer = ByteBuffer.allocateDirect(8192);

// Reading from socket
int bytesRead = channel.read(buffer); // Advances position

// Parsing requires flipping
buffer.flip(); // limit = position; position = 0
while (buffer.remaining() >= 4) {
    int messageLength = buffer.getInt(); // Reads 4 bytes
    // Parse message... (a real parser must also verify that
    // remaining() >= messageLength; see "Handling Partial Frames" below)
}

// Compact unread bytes to start of buffer
buffer.compact(); // Copies remaining bytes to position 0

Common Bugs:

  • Forgetting compact() leaves position and limit where parsing stopped, so the next read() overwrites bytes you haven't parsed yet

  • Forgetting flip() leaves position at the end of the data, so parsing reads stale garbage between there and the buffer's capacity

Handling Partial Frames

TCP is a byte stream, not a message stream. A single read() might return:

  • Half a message

  • Three and a half messages

  • Zero bytes (socket has no data yet)

Our protocol uses: [4-byte length][payload bytes]

java
if (readBuffer.remaining() < 4) {
    return; // Need more data for length header
}
int length = readBuffer.getInt(readBuffer.position());
if (readBuffer.remaining() < 4 + length) {
    return; // Need more data for full payload
}
// Now we can safely parse the complete frame
readBuffer.getInt(); // Consume length header
byte[] payload = new byte[length];
readBuffer.get(payload);

Zero-Copy Writes with Scatter/Gather

Instead of copying message data into a single ByteBuffer, use SocketChannel.write(ByteBuffer[]) to write multiple buffers in one system call:

java
ByteBuffer header = ByteBuffer.allocate(4).putInt(payload.length).flip();
ByteBuffer body = ByteBuffer.wrap(payload);
channel.write(new ByteBuffer[]{header, body});

This avoids intermediate allocations and uses the operating system's vectored I/O. One caveat: a single write() may flush only part of the data. If either buffer still hasRemaining(), the leftovers must be queued and the channel registered for OP_WRITE so the event loop can finish the flush later.
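A sketch of that partial-write handling, assuming conn and key are the Connection and its SelectionKey from the event loop:

java
channel.write(new ByteBuffer[]{header, body});    // may be a partial write
for (ByteBuffer buf : new ByteBuffer[]{header, body}) {
    if (buf.hasRemaining()) {
        conn.writeQueue.add(buf);                 // preserve unwritten data, in order
    }
}
if (!conn.writeQueue.isEmpty()) {
    key.interestOps(key.interestOps() | SelectionKey.OP_WRITE); // wake us when the socket can take more
}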


Part 4: Connection State Machine

State Machine

[Diagram: Connection state machine. From START, accept() leads to CONNECTING (TCP accepted); registering OP_READ leads to HANDSHAKE (awaiting hello); FLUX_HELLO leads to READY (active messaging). A FIN, a timeout, or invalid input leads to CLOSING (flushing buffers), and once buffers are flushed the connection is CLOSED. Per-state details: CONNECTING allocates buffers once the socket is accepted; HANDSHAKE enforces a 30s timeout and validates the protocol version; READY is the steady-state I/O loop; CLOSING completes pending writes for a safe socket shutdown. Insight: the system prioritizes data integrity by ensuring the CLOSING state flushes all buffers before moving to CLOSED.]

Every connection moves through these states:

CONNECTING - TCP socket accepted, buffers allocated
HANDSHAKE - Waiting for client to send "FLUX_HELLO"
READY - Active messaging, echoing data back to client
CLOSING - Flushing write queue before closing socket
CLOSED - Connection terminated, resources freed

The critical path is HANDSHAKE to READY. All other paths lead to CLOSING (failure or timeout).
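As a sketch, the HANDSHAKE-to-READY transition might be handled like this once a complete frame arrives. Here onFrame and enqueue are hypothetical helpers, and Connection is the sketch from Part 2:

java
import java.nio.charset.StandardCharsets;

void onFrame(Connection conn, byte[] payload) {
    switch (conn.state) {
        case HANDSHAKE -> {
            if ("FLUX_HELLO".equals(new String(payload, StandardCharsets.UTF_8))) {
                conn.state = Connection.State.READY;
                enqueue(conn, "FLUX_READY".getBytes(StandardCharsets.UTF_8)); // queue the reply frame
            } else {
                conn.state = Connection.State.CLOSING;  // invalid handshake: flush and close
            }
        }
        case READY -> enqueue(conn, payload);           // echo the message back
        case CLOSING -> { /* ignore input while flushing pending writes */ }
    }
}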


Part 5: Production Readiness

Metrics That Matter

When running in production, monitor these:

Selector Latency: Time spent in select() call

  • Normal: Less than 1ms

  • Problem: More than 10ms (too many channels)

Read Buffer Allocation Rate: Should be approximately 0 after warmup (reusing pool)

  • High rate means garbage collection thrashing

Connection State Distribution: How many in HANDSHAKE vs READY?

  • Spike in CLOSING indicates attack or bug

Thread Context Switches: Should be single-digit per second for event loop thread

Visual Verification with VisualVM

Open VisualVM and attach to the Gateway process to watch:

Heap Usage: A gentle sawtooth (allocation followed by collection) is normal GC behavior. The leak signal is a floor that keeps rising after each collection; after warmup, the post-GC baseline should stay flat.

Non-Heap (Direct Memory): Grows proportionally to connection count, then plateaus.

Thread Count: Must remain constant (1 event loop + 1 dashboard server). If growing, you're leaking threads.

Expected Performance

A properly implemented event loop should achieve:

  • 10,000 concurrent connections on single thread

  • Message latency p99 less than 5ms

  • CPU usage under 40% on one core

  • Heap usage stable after warmup

  • Zero OutOfMemoryError under normal load

Expected Failure Mode: At 10,000+ connections on a single event loop, select() latency degrades (O(N) scan of ready keys). The next lesson covers the Multi-Reactor pattern (one Selector per CPU core).

