Day 2: The Write Path – Durable Tweet Storage and Atomic Operations

Lesson 2 60 min

Welcome back, future architects! Yesterday, we laid the groundwork for what a tweet is. Today, we tackle a far more critical question: how does a tweet survive?

In the world of hyperscale systems, data isn't just "stored"; it's persisted, guaranteed, and recoverable. When a user hits "Tweet," they expect that message to be there, forever, for billions of people to see. This isn't magic; it's meticulous engineering.

Today, we're diving deep into the write path: the journey a tweet takes from your client application to durable storage, focusing on the bedrock principles of durability and atomic operations. Forget FileOutputStream.write() and hoping for the best. We're going to build a system that guarantees your data survives even a sudden power outage.

The Unseen Battle: Data Loss and the "Write Path"

Imagine Twitter, or any social platform, handling 100 million requests per second. Each request isn't just a read; many are writes – new tweets, retweets, likes, DMs. If a server crashes mid-write, what happens? Is the data gone? Is it partially written, leading to corruption? This is the battle against data loss, and it's fought on the write path.

The core challenge is ensuring that a "write" operation is atomic – it either fully completes or entirely fails, with no in-between state. And once completed, it must be durable – meaning it survives system failures like crashes or power loss.

Flowchart: Tweet Request → Serialize Tweet → Append to WAL → fsync() → Update Memory Store

Core Concept: Write-Ahead Log (WAL) – The Unsung Hero

Every robust database system, from PostgreSQL to Cassandra to your favorite distributed key-value store, relies on a mechanism called a Write-Ahead Log (WAL) (sometimes called a journal). It's the secret sauce for durability and atomicity.

How it works (the fundamental insight):

Instead of directly modifying your primary data store (which might be complex, involve multiple disk seeks, and thus be slow and non-atomic), you first record the intention to change in a simple, append-only log file – the WAL.

  1. Log First: When a write request comes in (e.g., a new tweet), you serialize the entire operation (the tweet itself) and append it to the end of the WAL file.

  2. Force to Disk: Crucially, you then force this write to physical disk using a system call like fsync (or FileChannel.force() in Java). This ensures the data is truly on non-volatile storage, not just in the operating system's memory cache.

  3. Apply Later: Only after the WAL entry is safely on disk do you apply the change to your main data structure (e.g., an in-memory map, a B-tree, etc.).

  4. Acknowledge: Once the change is applied to the in-memory store, you acknowledge success to the client.
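Steps 1 and 2 can be sketched in a few lines of Java. This is a minimal illustration, not a production WAL; the class and method names are placeholders, and the newline-delimited record format is an assumption for readability:

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Minimal sketch of "log first, force to disk".
class WriteAheadLog implements AutoCloseable {
    private final FileChannel channel;

    WriteAheadLog(Path path) throws IOException {
        // APPEND guarantees every write lands at the end of the file.
        this.channel = FileChannel.open(path,
                StandardOpenOption.CREATE, StandardOpenOption.WRITE,
                StandardOpenOption.APPEND);
    }

    // Step 1: append the serialized operation. Step 2: force to disk.
    void append(String serializedTweet) throws IOException {
        byte[] bytes = (serializedTweet + "\n").getBytes(StandardCharsets.UTF_8);
        channel.write(ByteBuffer.wrap(bytes));
        // force(true) also flushes file metadata; this is the fsync barrier.
        channel.force(true);
    }

    @Override
    public void close() throws IOException {
        channel.close();
    }
}
```

Note that the write is not "committed" until `force(true)` returns — before that, the bytes may still be sitting in the OS page cache.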

Why this is genius:

  • Durability: If the system crashes before applying to the main store but after fsync-ing the WAL, no problem! On restart, the system simply "replays" the WAL, reading all committed entries and applying them to rebuild its state. The data is never lost.

  • Atomicity: An operation is considered "committed" as soon as its entry is fsync-ed to the WAL. If a crash occurs, the WAL will either contain the full operation or nothing at all. There's no partial write.

  • Performance: Appending to a log file is sequential I/O, which is incredibly fast. Disk seeks, which are slow, are deferred to the "apply" step, which can happen asynchronously or in batches.

Our Hands-On Architecture: DurableTweetStore

Component Architecture: Client → Tweet Server (Virtual Threads) → (1. Log First) Write-Ahead Log (WAL, persistent) → (2. Apply) TweetStore (ConcurrentHashMap)

Today, we'll build a simplified TweetStore that uses a WriteAheadLog to achieve durability and atomicity. Our TweetStore will maintain an in-memory ConcurrentHashMap for fast reads, but all writes will first go through the WAL.

  1. Tweet Record: A simple immutable data structure for our tweets.

  2. WriteAheadLog Class:

  • Responsible for appending serialized Tweet objects to a designated WAL file.

  • Uses FileChannel to perform fsync for true durability.

  • Provides a replay() method to read all entries from the WAL file and reconstruct the state.

  3. TweetStore Class:

  • Holds the ConcurrentHashMap<String, Tweet> representing our current tweet collection.

  • On initialization, it replay()s the WAL to load any previously persisted tweets.

  • The storeTweet() method orchestrates the WAL write and then the in-memory update.

  4. TweetServer: A simple command-line server using Java's ServerSocket and Project Loom's Virtual Threads. Each incoming client connection (representing a tweet submission) will be handled by a lightweight virtual thread, allowing us to manage high concurrency without breaking a sweat, even with blocking I/O operations like fsync.
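The write-path ordering from the components above can be sketched like this; `Tweet`, the `Wal` interface, and the `id|text` serialization are simplified stand-ins for the lesson's actual classes:

```java
import java.io.IOException;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Simplified stand-in for the lesson's Tweet record.
record Tweet(String id, String text) {
    String serialize() { return id + "|" + text; }
}

// Stand-in for WriteAheadLog: append must fsync before returning.
interface Wal {
    void append(String entry) throws IOException;
}

class TweetStore {
    private final Map<String, Tweet> tweets = new ConcurrentHashMap<>();
    private final Wal wal;

    TweetStore(Wal wal) { this.wal = wal; }

    // Log first, then apply; callers only see success after both steps.
    void storeTweet(Tweet tweet) throws IOException {
        wal.append(tweet.serialize()); // durable once this returns
        tweets.put(tweet.id(), tweet); // fast in-memory apply
    }

    Tweet getTweet(String id) { return tweets.get(id); }
}
```

The key design point is the ordering inside `storeTweet`: the in-memory map is never updated before the WAL append has returned.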

This setup provides a foundational understanding of how durable, atomic writes are implemented at a low level, mirroring the core mechanics of real-world databases.
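The virtual-thread accept loop might look like this minimal sketch (requires Java 21+; the per-connection handler logic is elided):

```java
import java.io.IOException;
import java.net.ServerSocket;
import java.net.Socket;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// One virtual thread per connection: each handler can block on fsync
// without exhausting a platform-thread pool.
class TweetServer {
    void serve(int port) throws IOException {
        try (ServerSocket server = new ServerSocket(port);
             ExecutorService pool = Executors.newVirtualThreadPerTaskExecutor()) {
            while (true) {
                Socket client = server.accept();
                pool.submit(() -> handle(client)); // cheap: virtual threads
            }
        }
    }

    private void handle(Socket client) {
        // parse the request, append to the WAL, update the store, respond...
        try { client.close(); } catch (IOException ignored) {}
    }
}
```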

Single-Node Architecture

Tweet Server Node: the Application Layer (TweetServer logic) 1. fsync()s each entry to the Write-Ahead Log (persistent: wal.log), then 2. updates the TweetStore (in-memory ConcurrentHashMap)

Sizing for Real-Time Production Systems

Our single WAL file is great for understanding, but at 100M requests/second, it quickly becomes a bottleneck.

  • Replication: In production, the WAL itself would be replicated across multiple nodes (e.g., 3-5 replicas) using consensus protocols like Raft or Paxos. A write is only acknowledged after a quorum of replicas have fsync-ed the WAL entry.

  • Sharding: The entire system would be sharded. Instead of one TweetStore, you'd have thousands, each responsible for a subset of tweets (e.g., based on tweet ID hash, user ID). Each shard would have its own replicated WAL.

  • Asynchronous Application: The application of WAL entries to the primary data store might happen asynchronously, potentially by a separate "compactor" or "flusher" process that writes to immutable data files (like SSTables in LSM-trees).

By understanding the single-node WAL, you unlock the ability to reason about these distributed complexities. The core principle remains: log first, fsync always, apply later.
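As one illustration of the sharding idea, a hypothetical router could map each tweet ID to one of N shards, each owning its own replicated WAL (the hash-modulo scheme is an assumption; production systems often use consistent hashing instead):

```java
// Hypothetical shard routing by tweet ID.
class ShardRouter {
    private final int shardCount;

    ShardRouter(int shardCount) { this.shardCount = shardCount; }

    int shardFor(String tweetId) {
        // floorMod keeps the result non-negative even when hashCode() is negative
        return Math.floorMod(tweetId.hashCode(), shardCount);
    }
}
```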


Assignment: Make it Production-Ready (Almost)

Your mission, should you choose to accept it, is to enhance our TweetServer and TweetStore.

  1. Unique Tweet IDs: Modify Tweet to include a String id (e.g., a UUID). Ensure that TweetStore.storeTweet() prevents storing duplicate tweet IDs. If a tweet with the same ID already exists, either ignore the new tweet or replace the existing one (your choice, but document it).

  2. Read Path: Add a simple read capability to TweetServer. When a client connects and sends a "GET /tweet/{id}" command (or similar simple protocol), your server should retrieve and return the tweet content if it exists.

  3. Graceful Shutdown: Implement a basic mechanism for the TweetServer to shut down gracefully (e.g., listening for a "SHUTDOWN" command), ensuring all in-flight writes are completed and the WAL is properly closed.

This will give you a more complete picture of a simple, durable, and atomic service.


Solution Hints

  1. Unique IDs:

  • For Tweet record, add String id. Consider using UUID.randomUUID().toString() in the client or server.

  • In TweetStore.storeTweet(), use ConcurrentHashMap.putIfAbsent(id, tweet) to handle uniqueness and atomicity with the map.

  • When replaying the WAL, ensure you add tweets to the map using their unique ID.
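Hint 1's putIfAbsent trick, as a minimal sketch (the helper name is hypothetical):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// putIfAbsent makes "first writer wins" a single atomic map operation.
class UniqueIdDemo {
    static boolean storeIfNew(Map<String, String> tweets, String id, String text) {
        // putIfAbsent returns null only when no mapping existed,
        // i.e. this write won the race
        return tweets.putIfAbsent(id, text) == null;
    }
}
```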

  2. Read Path:

  • Add a new method getTweet(String id) to TweetStore that simply retrieves from the ConcurrentHashMap.

  • In TweetServer, parse client input to differentiate between "POST" (write) and "GET" (read) requests.

  • For "GET", extract the ID, call TweetStore.getTweet(), and send the serialized tweet back to the client.
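A possible parser for the "GET /tweet/{id}" line is sketched below; the exact protocol, as the assignment says, is yours to define:

```java
// Extracts the tweet ID from a "GET /tweet/{id}" request line,
// or returns null if the line is not a GET request.
class RequestParser {
    static String tweetIdFromGet(String line) {
        String prefix = "GET /tweet/";
        if (line != null && line.startsWith(prefix)) {
            return line.substring(prefix.length()).trim();
        }
        return null;
    }
}
```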

  3. Graceful Shutdown:

  • Introduce a volatile boolean running = true; flag in TweetServer.

  • Modify the while(running) loop for accepting connections.

  • Add a separate thread or a simple command listener (e.g., on System.in) that, when it receives "SHUTDOWN", sets running = false and closes the ServerSocket.

  • Ensure TweetStore and WriteAheadLog have close() methods that call FileChannel.close() and flush any buffers. Call these during shutdown.
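One way to sketch the flag-plus-socket-close pattern from these hints (the class name is hypothetical, and WAL cleanup is only indicated in a comment):

```java
import java.io.IOException;
import java.net.ServerSocket;

// A volatile flag stops the loop; closing the ServerSocket unblocks
// a thread parked in accept() so shutdown doesn't hang.
class ShutdownDemo {
    private volatile boolean running = true;
    private final ServerSocket serverSocket;

    ShutdownDemo(ServerSocket serverSocket) { this.serverSocket = serverSocket; }

    boolean isRunning() { return running; }

    void shutdown() throws IOException {
        running = false;      // stop accepting new work
        serverSocket.close(); // unblocks accept()
        // then: wait for in-flight handlers, close the WAL's FileChannel
    }
}
```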

Good luck, and remember: the best systems are built on solid foundations, one durable byte at a time.
