# Implementation Deep Dive
## The ByteBuffer Handshake Parser
The critical insight: we don’t need to convert bytes to Strings until we’re certain the handshake is valid. We work directly with the buffer:
```java
record HandshakeResult(boolean complete, int wsKeyStart, int wsKeyEnd) {}

HandshakeResult parseHandshake(ByteBuffer buffer) {
    buffer.flip(); // switch to read mode
    // Scan for "\r\n\r\n" (end of headers) with absolute gets - no allocation,
    // and no byte is skipped when a partial match fails mid-sequence
    for (int i = buffer.position(); i + 3 < buffer.limit(); i++) {
        if (buffer.get(i) == '\r' && buffer.get(i + 1) == '\n'
                && buffer.get(i + 2) == '\r' && buffer.get(i + 3) == '\n') {
            // Headers complete - now extract the Sec-WebSocket-Key
            return extractWebSocketKey(buffer);
        }
    }
    // Incomplete: un-flip so the next socket read appends after the existing bytes
    buffer.position(buffer.limit());
    buffer.limit(buffer.capacity());
    return new HandshakeResult(false, -1, -1);
}
```
This approach has zero heap allocations in the fast path. The buffer is direct-allocated once per connection and reused throughout the connection lifetime.
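The terminator scan can be exercised on its own. Below is a self-contained sketch of the same absolute-index scan; the `endOfHeaders` helper and the sample requests are illustrative, not the gateway's actual code:

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

public class HeaderScan {
    /** Index just past "\r\n\r\n", or -1 if the terminator isn't buffered yet. */
    static int endOfHeaders(ByteBuffer buf) {
        // Absolute gets: position and limit are left untouched
        for (int i = buf.position(); i + 3 < buf.limit(); i++) {
            if (buf.get(i) == '\r' && buf.get(i + 1) == '\n'
                    && buf.get(i + 2) == '\r' && buf.get(i + 3) == '\n') {
                return i + 4;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        ByteBuffer partial = ByteBuffer.wrap(
                "GET /gateway HTTP/1.1\r\nHost: x\r\n".getBytes(StandardCharsets.US_ASCII));
        ByteBuffer full = ByteBuffer.wrap(
                "GET /gateway HTTP/1.1\r\nHost: x\r\n\r\n".getBytes(StandardCharsets.US_ASCII));
        System.out.println(endOfHeaders(partial));              // prints -1 (need more bytes)
        System.out.println(endOfHeaders(full) == full.limit()); // prints true
    }
}
```

Because the scan uses absolute indexing, a partial read that splits the `\r\n\r\n` across two socket reads is found correctly on the next pass.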
## Virtual Thread Crypto Offload
When we’ve located the key bytes, we don’t block the selector:
```java
if (result.complete()) {
    // Extract the key bytes (still no String allocation for the headers)
    byte[] keyBytes = new byte[result.wsKeyEnd() - result.wsKeyStart()];
    buffer.position(result.wsKeyStart()); // seek to the key before the relative bulk get
    buffer.get(keyBytes);

    // Offload SHA-1 + Base64 to a virtual thread - submit() returns immediately
    virtualExecutor.submit(() -> {
        String acceptKey = computeAcceptKey(keyBytes);
        // Signal the selector thread that the response is ready
        wakeupSelectorWithResponse(key, acceptKey);
    });
}
```
Virtual threads handle thousands of concurrent SHA-1 computations without creating an OS thread per task. The JVM schedules them onto a small pool of carrier threads: a dedicated ForkJoinPool whose parallelism defaults to the number of available processors (distinct from `ForkJoinPool.commonPool()`).
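`computeAcceptKey` itself is not shown above, but RFC 6455 §1.3 pins down exactly what it must do: SHA-1 over the client key concatenated with a fixed GUID, then Base64. A minimal sketch (the class name is mine; the input/output pair is the RFC's own test vector):

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Base64;

public class AcceptKey {
    // GUID fixed by RFC 6455, section 1.3
    private static final String WS_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11";

    static String computeAcceptKey(byte[] keyBytes) {
        try {
            MessageDigest sha1 = MessageDigest.getInstance("SHA-1");
            sha1.update(keyBytes);
            sha1.update(WS_GUID.getBytes(StandardCharsets.US_ASCII));
            return Base64.getEncoder().encodeToString(sha1.digest());
        } catch (NoSuchAlgorithmException e) {
            throw new AssertionError("SHA-1 is a mandatory JDK algorithm", e);
        }
    }

    public static void main(String[] args) {
        // Sample key from RFC 6455, section 1.3
        String key = "dGhlIHNhbXBsZSBub25jZQ==";
        System.out.println(computeAcceptKey(key.getBytes(StandardCharsets.US_ASCII)));
        // prints s3pPLMBiTxaQ9kYGzzhZRbK+xOo=
    }
}
```

SHA-1 is fine here: it is used as a protocol handshake checksum, not for security, which is why the RFC never moved off it.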
## The Selector Wakeup Pattern
After the virtual thread computes the accept key, it needs to tell the selector to write the response. But you can’t directly manipulate SelectionKey from another thread. The pattern:
```java
// In the virtual thread
ConcurrentLinkedQueue<PendingWrite> writeQueue = ...;
writeQueue.offer(new PendingWrite(key, responseBytes));
selector.wakeup(); // unblocks the pending select() call

// In the selector thread loop
selector.select();
processPendingWrites(); // drain the queue, set OP_WRITE interest
```
This is the same pattern used in Netty’s event loop.
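A runnable miniature of that handoff, using only the JDK (`PendingWrite` and `runOnce` here are simplified stand-ins for the gateway's real types):

```java
import java.nio.channels.Selector;
import java.util.concurrent.ConcurrentLinkedQueue;

public class WakeupDemo {
    record PendingWrite(String connectionId, byte[] payload) {}

    static String runOnce() throws Exception {
        try (Selector selector = Selector.open()) {
            var writeQueue = new ConcurrentLinkedQueue<PendingWrite>();

            // Producer side (stands in for a virtual thread finishing the crypto work)
            Thread.ofVirtual().start(() -> {
                writeQueue.offer(new PendingWrite("conn-1", "response".getBytes()));
                selector.wakeup(); // unblocks the select() below
            });

            // Selector side: select() returns (with 0 ready keys) once wakeup() fires
            selector.select();
            return writeQueue.poll().connectionId();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("drained write for: " + runOnce()); // prints drained write for: conn-1
    }
}
```

Note that `wakeup()` is safe to call even before `select()` blocks: the Selector remembers the request and the next `select()` returns immediately, so the producer can never strand a write in the queue.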
## Production Readiness: Metrics That Matter
When this code runs at scale, monitor these JVM metrics (use JMX or jcmd):
**Allocation Rate:** Should stay under 100 MB/sec even at 10k handshakes/sec. If it's higher, you're allocating in hot paths; use `-Xlog:gc*` (unified JVM logging; `-XX:+PrintGCDetails` is the legacy pre-JDK 9 spelling) to identify the culprits.

**Direct Buffer Usage:** Track with `ManagementFactory.getPlatformMXBeans(BufferPoolMXBean.class)`. Each connection needs one direct buffer (~8 KB), so at 100k connections that's ~800 MB off-heap. If buffers leak (never cleaned), you'll hit an OutOfMemoryError without any heap pressure.

**Virtual Thread Count:** Use JFR (Java Flight Recorder) to track `jdk.VirtualThreadPinned` events. If virtual threads are pinning their carrier threads (e.g. blocked inside `synchronized` blocks), you've lost the scalability benefit.

**Selector Latency:** Instrument the time between `select()` calls. It should be under 1 ms; if it's 10 ms or more, you're doing blocking work on the selector thread (crypto, logging, etc.).

**Connection State Distribution:** How many connections sit in `AWAITING_HEADERS` for more than 10 seconds? That's a Slowloris attack. Implement a reaper thread that closes stale handshakes.
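The reaper mentioned above can be as simple as a timestamp map swept periodically. A sketch; the names and the 10-second budget are illustrative, and a real reaper would also close the underlying channels:

```java
import java.time.Duration;
import java.time.Instant;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class HandshakeReaper {
    // connection id -> when its handshake started (the gateway would key by channel)
    private final Map<String, Instant> awaitingHeaders = new ConcurrentHashMap<>();
    private final Duration timeout;

    HandshakeReaper(Duration timeout) { this.timeout = timeout; }

    void register(String connId, Instant start) { awaitingHeaders.put(connId, start); }
    void complete(String connId) { awaitingHeaders.remove(connId); }

    /** Drops stale handshakes and returns how many were reaped. */
    int reap(Instant now) {
        int before = awaitingHeaders.size();
        awaitingHeaders.entrySet().removeIf(
                e -> Duration.between(e.getValue(), now).compareTo(timeout) > 0);
        return before - awaitingHeaders.size();
    }

    public static void main(String[] args) {
        var reaper = new HandshakeReaper(Duration.ofSeconds(10));
        Instant now = Instant.now();
        reaper.register("slow-client", now.minusSeconds(30)); // stuck 30s: Slowloris suspect
        reaper.register("fresh-client", now.minusSeconds(2)); // still within budget
        System.out.println("reaped: " + reaper.reap(now));    // prints reaped: 1
    }
}
```

Run the sweep from a scheduled task (or a dedicated virtual thread), never from the selector loop itself.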
The difference between a toy app and a production Gateway is observability. When the 3 AM page comes ("why are connections timing out?"), these metrics tell you whether it's GC thrashing, direct buffer exhaustion, or a DDoS attack overwhelming the handshake queue.
Tomorrow we’ll implement frame parsing - reading the actual WebSocket binary protocol. But first, master the handshake. It’s the gatekeeper that determines whether your Gateway handles 10k connections or 10 million.
# Day 1: WebSocket Handshake - Implementation Guide
## Prerequisites
Ensure you have the following installed:

* **JDK 21 or newer**: verify with `java -version`
* **Maven 3.8+**: verify with `mvn -version`
* **Visual monitoring tool** (optional but recommended):
  * VisualVM: download from https://visualvm.github.io/
  * or use `jconsole` (included with the JDK)
# Setup Instructions
## 1. Source Code
Clone the repository:

```bash
git clone https://github.com/sysdr/discord-flux.git
```

The complete Day 1 implementation lives in the flux-day1-handshake/ directory.
## 2. Navigate to Project

```bash
cd flux-day1-handshake
```
## 3. Start the Gateway

```bash
./scripts/start.sh
```
You should see output like:

```text
🔨 Compiling Flux Gateway...
🚀 Flux Gateway listening on port 9001
📊 Dashboard: http://localhost:8080/dashboard
📊 Dashboard server started on port 8080
```
## 4. Open the Dashboard
Navigate to http://localhost:8080/dashboard in your browser.
You'll see a real-time monitoring interface showing live connection counts and handshake metrics.
# Verification Steps
## Test 1: Single Connection
Open a new terminal and use `wscat` (or write a simple client):

```bash
# Install wscat if needed
npm install -g wscat

# Connect to the gateway
wscat -c ws://localhost:9001/gateway
```
Watch the dashboard - you should see:

* Active Connections: 1
* Total Connections: 1
* Completed Handshakes: 1
In the gateway terminal, you'll see:

```text
✅ Connection accepted: /127.0.0.1:xxxxx
🔌 WebSocket upgraded: /127.0.0.1:xxxxx
```
## Test 2: Load Test (50 Connections)
Run the included load test:

```bash
./scripts/demo.sh
```
This spawns 50 virtual threads, each establishing a WebSocket connection. You should see output like:
```text
🔥 Starting load test: 50 connections
✓ Connection 0 upgraded successfully
✓ Connection 1 upgraded successfully
...
✅ Load test complete!
Total connections: 50
Successful: 50
Failed: 0
Duration: 234ms
Rate: 213 handshakes/sec
```
## Test 3: Monitor JVM Behavior (Critical Learning)
While the gateway is running, open VisualVM or jconsole:

```bash
jconsole
```
Select the FluxGateway process, then navigate through these tabs:

**Memory tab:**

* Watch heap usage during the load test
* Notice it stays relatively flat (minimal allocation)
* Young GC frequency should stay low (under 1/sec even during the burst)

**Threads tab:**

* You'll see only ~10-15 platform threads (selector + ForkJoin pool)
* NOT 50+ threads, despite 50 connections
* This is the Virtual Thread magic

**MBeans tab:**

* Navigate to java.nio.BufferPool → direct
* Watch `MemoryUsed` increase by ~8 KB per connection (the direct ByteBuffer)
* This is off-heap memory, not counted against the heap
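The same `MemoryUsed` figure that jconsole shows can be read programmatically through `BufferPoolMXBean`, which is handy if you want to export it to the dashboard. A small sketch (the class name is mine):

```java
import java.lang.management.BufferPoolMXBean;
import java.lang.management.ManagementFactory;
import java.nio.ByteBuffer;

public class DirectBufferStats {
    /** Bytes currently held by direct ByteBuffers, or -1 if the pool isn't found. */
    static long directBytesUsed() {
        for (BufferPoolMXBean pool : ManagementFactory.getPlatformMXBeans(BufferPoolMXBean.class)) {
            if (pool.getName().equals("direct")) {
                return pool.getMemoryUsed();
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        ByteBuffer buf = ByteBuffer.allocateDirect(8192); // stands in for one connection's buffer
        System.out.println("direct bytes in use: " + directBytesUsed()); // at least 8192 here
        System.out.println("buffer capacity: " + buf.capacity());        // prints buffer capacity: 8192
    }
}
```

Polling this bean once a second is cheap; it is the same data source jconsole uses.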
# Understanding the Architecture
## The Selector Pattern
Only ONE thread handles all I/O:
```text
Selector Thread (infinite loop)
        ↓
selector.select()  →  blocks until events arrive
        ↓
Events: [ACCEPT, READ, READ, WRITE, ...]
        ↓
For each event: dispatch to its handler
```
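In code, one iteration of that loop looks roughly like this. This is a compressed sketch: the real gateway blocks in `select()` and loops forever, and the handler bodies are elided:

```java
import java.net.InetSocketAddress;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.ServerSocketChannel;

public class SelectorLoopSketch {
    /** One iteration of the loop; returns the number of ready keys. */
    static int pollOnce() throws Exception {
        try (Selector selector = Selector.open();
             ServerSocketChannel server = ServerSocketChannel.open()) {
            server.bind(new InetSocketAddress("127.0.0.1", 0)); // ephemeral port for the sketch
            server.configureBlocking(false);                    // required before register()
            server.register(selector, SelectionKey.OP_ACCEPT);

            int ready = selector.selectNow(); // non-blocking poll for the demo
            for (SelectionKey key : selector.selectedKeys()) {
                if (key.isAcceptable()) { /* accept, then register the client for OP_READ */ }
                else if (key.isReadable()) { /* read into that connection's buffer */ }
            }
            return ready;
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("ready events with no clients: " + pollOnce()); // prints ... 0
    }
}
```

With no clients connecting, `selectNow()` reports zero ready keys; in production the loop uses the blocking `select()` so the thread sleeps until work arrives.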
## The Virtual Thread Offload
When we read complete handshake headers:
```text
Selector Thread                    Virtual Thread Pool
      ↓                                    ↓
Read headers complete      →     Submit SHA-1 computation
      ↓                                    ↓
Continue polling                   Compute accept key
      ↓                                    ↓
...process other events...        Write result to queue
      ↓                                    ↓
Wakeup from select()       ←        selector.wakeup()
      ↓
Process write queue
```
This prevents blocking the selector on CPU-bound crypto.
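The `virtualExecutor` used earlier can simply be the JDK's per-task virtual thread executor. A tiny self-contained check (the class name is mine):

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class VirtualExecutorDemo {
    /** Submits a task and reports whether it ran on a virtual thread. */
    static boolean ranOnVirtualThread() throws Exception {
        // try-with-resources: close() waits for submitted tasks to finish
        try (ExecutorService virtualExecutor = Executors.newVirtualThreadPerTaskExecutor()) {
            return virtualExecutor.submit(() -> Thread.currentThread().isVirtual()).get();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("virtual: " + ranOnVirtualThread()); // prints virtual: true
    }
}
```

Unlike a fixed thread pool, this executor creates one cheap virtual thread per task, so a burst of 10k handshakes never queues behind a small worker count.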
# Common Issues & Solutions
## Issue: "Address already in use"

**Cause:** The gateway is still running from a previous session.

**Fix:**

```bash
./scripts/cleanup.sh
```
## Issue: Dashboard shows 0 connections but the load test succeeds

**Cause:** The test finishes faster than the dashboard's refresh interval, so the spike never renders.

**Fix:** Increase the load test connection count (or add a delay):

```bash
mvn exec:java -Dexec.mainClass="com.flux.gateway.LoadTest" -Dexec.args="1000"
```
## Issue: OutOfMemoryError: Direct buffer memory

**Cause:** Too many connections without JVM tuning.

**Fix:** Increase the direct buffer limit by adding this JVM flag in `start.sh`:

```text
-XX:MaxDirectMemorySize=1G
```
# Homework Challenge
## Task: Optimize ByteBuffer Allocation
**Current implementation:** each connection allocates a new 8 KB direct buffer:

```java
this.readBuffer = ByteBuffer.allocateDirect(8192);
```
**Your challenge:** implement a `ByteBufferPool` that:

* Pre-allocates 1000 direct buffers on startup
* Loans them to connections (thread-safely)
* Takes them back when a connection closes
* Tracks utilization (pooled vs. allocated)
**Success criteria:**

* Run the load test with 5000 connections
* Direct memory usage caps at ~8 MB (1000 buffers × 8 KB)
* No new allocations after pool warmup
Bonus: Add a dashboard metric showing pool utilization.
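If you want a starting shape for the pool, here is a deliberately minimal sketch. The class and method names are suggestions, and the full homework (5000-connection run, dashboard metric) is still yours to do:

```java
import java.nio.ByteBuffer;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.atomic.AtomicInteger;

public class ByteBufferPool {
    private final ArrayBlockingQueue<ByteBuffer> free;
    private final AtomicInteger loaned = new AtomicInteger();

    ByteBufferPool(int count, int bufferSize) {
        free = new ArrayBlockingQueue<>(count);
        for (int i = 0; i < count; i++) {
            free.offer(ByteBuffer.allocateDirect(bufferSize)); // pre-allocate up front
        }
    }

    /** Loan a buffer, or null if the pool is exhausted (caller decides the policy). */
    ByteBuffer acquire() {
        ByteBuffer buf = free.poll();
        if (buf != null) loaned.incrementAndGet();
        return buf;
    }

    void release(ByteBuffer buf) {
        buf.clear(); // reset position/limit before the buffer is reused
        loaned.decrementAndGet();
        free.offer(buf);
    }

    int loanedCount() { return loaned.get(); }

    public static void main(String[] args) {
        var pool = new ByteBufferPool(2, 8192);
        ByteBuffer a = pool.acquire();
        ByteBuffer b = pool.acquire();
        System.out.println("exhausted: " + (pool.acquire() == null)); // prints exhausted: true
        pool.release(a);
        System.out.println("loaned: " + pool.loanedCount());          // prints loaned: 1
    }
}
```

`ArrayBlockingQueue` gives thread-safe loan/return for free; swapping `poll()` for `poll(timeout, unit)` would make callers wait instead of failing when the pool runs dry.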
## Next Steps
Tomorrow (Day 2), we'll implement WebSocket frame parsing:

* Reading the binary frame format
* Handling fragmented messages
* Implementing masking/unmasking
* Building a zero-copy frame buffer
The handshake gets you from HTTP → WebSocket. Frame parsing lets you actually send/receive messages.
## Cleanup
When finished:

```bash
./scripts/cleanup.sh
```
This kills the gateway process and removes compiled artifacts.
## Resources
* RFC 6455 (The WebSocket Protocol): https://datatracker.ietf.org/doc/html/rfc6455
* JEP 444 (Virtual Threads): https://openjdk.org/jeps/444
* Java NIO Guide: https://docs.oracle.com/en/java/javase/21/core/java-nio.html
For questions or issues, review the lesson article’s “Production Readiness” section for debugging metrics.