Day 3: Structured Logging with MDC Context Propagation – The Digital Thread in Your Distributed Fabric
Welcome back, fellow architects of resilience.
Yesterday, we laid the groundwork for understanding the journey of a request across services with Distributed Tracing. Today, we’re going to weave an even finer thread into that fabric: Structured Logging with MDC Context Propagation. This isn't just about dumping text; it's about crafting a digital narrative for every single interaction in your system, a narrative that tells a complete story, not just fragmented sentences.
In ultra-high-scale systems, where a single user request might touch dozens, if not hundreds, of microservices, traditional logging quickly becomes a liability. Imagine trying to debug why a payment failed when you have millions of logs per second across thousands of containers. Without a coherent thread connecting all these disparate log lines back to that one specific payment request, you're not debugging; you're playing a frustrating game of "Where's Waldo?" in a digital haystack.
This is where MDC (Mapped Diagnostic Context) steps in. It's not just a nice-to-have; it's a critical enabler for effective troubleshooting in production.
Why Your Logs Need a Digital Thread
Think about how you usually log: log.info("Processing order for user: " + userId);. This is fine for a monolithic application. But in a distributed system, that userId might be processed by Service A, then Service B, then Service C, all on different machines, potentially handled by different threads. How do you tie all these log lines together?
The answer is context propagation. We need to carry a unique identifier (like a correlationId or traceId) through the entire request lifecycle, from the moment it hits your API Gateway until it completes its journey. MDC allows us to attach this context to every log statement made by the current thread, automatically.
Core Concepts: MDC and Structured Logging
Mapped Diagnostic Context (MDC):
MDC is a thread-local map managed by logging frameworks (like Logback or Log4j2).
You put key-value pairs into MDC, and these pairs are then automatically included in all log messages emitted by that thread.
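In SLF4J terms, that pattern looks roughly like this; a minimal sketch, with the class name and the choice of key being illustrative:

```java
import java.util.UUID;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.slf4j.MDC;

public class OrderHandler {
    private static final Logger log = LoggerFactory.getLogger(OrderHandler.class);

    public void handle(String orderId) {
        // Attach context to the current thread's MDC.
        MDC.put("correlationId", UUID.randomUUID().toString());
        try {
            // Once the log encoder/pattern is configured to emit MDC fields,
            // every log line on this thread carries the correlationId.
            log.info("Processing order {}", orderId);
        } finally {
            // Always clear, or the context leaks to the next request
            // handled by this pooled thread.
            MDC.clear();
        }
    }
}
```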
The "Aha!" Moment: When a request comes in, a unique correlationId is generated or extracted. This ID is then placed into MDC. As the request flows through your service, any log statement made by the handling thread will automatically include this correlationId. When this service calls another downstream service, the correlationId is passed along in the HTTP headers. The downstream service then extracts it and places it into its own MDC, continuing the chain.
The Nuance: MDC is ThreadLocal. This is crucial. If your application uses asynchronous processing (e.g., CompletableFuture, ExecutorService, or reactive frameworks like Project Reactor), the context does not automatically flow to new threads. This is a common pitfall. You need explicit mechanisms (such as copying the MDC map with MDC.getCopyOfContextMap() and restoring it on the worker thread, Micrometer's ContextSnapshot, or Reactor's Context) to propagate MDC across thread boundaries. This is where the real production-grade difference lies.
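One common workaround is to snapshot the MDC map on the submitting thread and restore it inside the task. A sketch of that pattern, using SLF4J's MDC API (the helper name mdcWrap is my own):

```java
import java.util.Map;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import org.slf4j.MDC;

public final class MdcTasks {
    private MdcTasks() {}

    // Wraps a Runnable so the submitting thread's MDC is visible inside it.
    public static Runnable mdcWrap(Runnable task) {
        Map<String, String> context = MDC.getCopyOfContextMap(); // snapshot now
        return () -> {
            Map<String, String> previous = MDC.getCopyOfContextMap();
            if (context != null) MDC.setContextMap(context);
            try {
                task.run();
            } finally {
                // Restore the worker thread's prior state.
                if (previous != null) MDC.setContextMap(previous);
                else MDC.clear();
            }
        };
    }

    public static void main(String[] args) {
        MDC.put("correlationId", "abc-123");
        ExecutorService pool = Executors.newSingleThreadExecutor();
        // Without mdcWrap, log lines inside the task would have no correlationId.
        pool.submit(mdcWrap(() -> {
            // MDC.get("correlationId") now returns "abc-123" on the worker thread.
        }));
        pool.shutdown();
    }
}
```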
Structured Logging:
Instead of plain text logs, structured logging outputs logs in a machine-readable format, typically JSON.
Each log entry becomes a structured object with fields like timestamp, level, message, serviceName, and crucially, our MDC-propagated correlationId.
Why it matters for 100M RPS: When you're dealing with immense log volumes, you can't grep your way to a solution. You need to query, filter, and aggregate logs efficiently using tools like Elasticsearch, Splunk, or Datadog. Structured logs make this trivial. You can ask: "Show me all logs for correlationId='abc-123' across all services, where level='ERROR'." This transforms debugging from a manual hunt to a targeted query.
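A single structured entry might look like this (field names are illustrative; the exact shape depends on your encoder configuration):

```json
{
  "timestamp": "2024-05-14T10:32:07.123Z",
  "level": "ERROR",
  "serviceName": "payment-service",
  "correlationId": "abc-123",
  "message": "Payment authorization failed"
}
```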
Architecture & Control Flow
Imagine a user request hitting your system:
API Gateway / Edge Service:
A HandlerInterceptor (Spring) or Filter (Servlet API) intercepts the incoming request.
It generates a unique correlationId (if not present in headers) or extracts an existing one.
It places this correlationId into the current thread's MDC (MDC.put("correlationId", id)).
It adds the correlationId to outgoing HTTP headers for any downstream calls.
Crucially, it ensures MDC.clear() is called when the request completes to prevent context leaking to other requests handled by the same recycled thread (the "sticky context" problem).
Downstream Microservices (e.g., Order Service, Inventory Service):
Similarly, a HandlerInterceptor or Filter intercepts the incoming request.
It extracts the correlationId from the incoming HTTP headers.
It places this correlationId into its own thread's MDC.
It also adds the correlationId to outgoing HTTP headers for its downstream calls.
Again, MDC.clear() on request completion.
Data Flow: The correlationId starts as a unique string, flows through HTTP headers between services, and lives temporarily in the MDC of each processing thread, getting injected into every log line.
Real-World Application: The Netflix Debugging Story
Consider Netflix. When you click "Play" on a movie, that single action triggers a cascade of events across services: authentication, authorization, content delivery network (CDN) routing, billing checks, recommendation updates, analytics tracking, and more. If your video buffers, how do their SREs pinpoint the bottleneck? They don't just look at a single service's logs. They use a requestId (their version of correlationId) that is propagated across every service. With structured logs and a powerful log aggregation system (like their internal variant of ELK), they can query for all events related to your specific "Play" request, instantly seeing its journey, latency at each hop, and any errors, regardless of which of their thousands of services generated the log. This is the power you're building.
Assignment: Weaving the Digital Thread
You will build a simple distributed system consisting of two Spring Boot services: a GatewayService and a BackendService. Your goal is to implement structured logging with MDC context propagation, ensuring a correlationId is present in all log lines across both services for a given request.
Steps:
Project Setup: Create two Spring Boot projects: gateway-service and backend-service.
Logback Configuration: Configure both services to use Logback with JSON structured logging, including a correlationId field.
Gateway Service:
Implement a HandlerInterceptor to generate a correlationId for incoming requests (if not present) and put it into MDC.
Ensure the correlationId is added to outgoing HTTP headers when the Gateway calls the Backend. Use a RestTemplate interceptor for this.
Implement a simple /greet endpoint that calls the Backend's /hello endpoint.
Backend Service:
Implement a HandlerInterceptor to extract the correlationId from incoming HTTP headers and put it into MDC.
Implement a /hello endpoint that returns a simple message.
Clean Up: Crucially, ensure MDC is cleared after each request in both services to prevent context leaks. This is often handled automatically by Spring's HandlerInterceptor lifecycle (afterCompletion), but it's vital to understand why.
Verification: Make requests to the Gateway and observe the structured logs in both services. Confirm the correlationId is identical across related log lines.
Solution Hints
Logback JSON: Use the net.logstash.logback:logstash-logback-encoder dependency. In logback-spring.xml, configure a LogstashEncoder; it includes all MDC properties in its JSON output by default.
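A minimal logback-spring.xml along those lines (the appender name is illustrative):

```xml
<configuration>
  <appender name="JSON" class="ch.qos.logback.core.ConsoleAppender">
    <!-- LogstashEncoder emits each event as one JSON object, MDC fields included -->
    <encoder class="net.logstash.logback.encoder.LogstashEncoder"/>
  </appender>
  <root level="INFO">
    <appender-ref ref="JSON"/>
  </root>
</configuration>
```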
MDC Interceptor (Gateway/Backend):
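A sketch of such an interceptor, assuming the header name X-Correlation-Id (use whatever header your services agree on) and Spring Boot 3's jakarta.servlet packages (on Boot 2, use javax.servlet):

```java
import jakarta.servlet.http.HttpServletRequest;
import jakarta.servlet.http.HttpServletResponse;
import java.util.UUID;
import org.slf4j.MDC;
import org.springframework.stereotype.Component;
import org.springframework.web.servlet.HandlerInterceptor;

@Component
public class CorrelationIdInterceptor implements HandlerInterceptor {
    public static final String HEADER = "X-Correlation-Id";
    public static final String MDC_KEY = "correlationId";

    @Override
    public boolean preHandle(HttpServletRequest request, HttpServletResponse response,
                             Object handler) {
        String id = request.getHeader(HEADER);
        if (id == null || id.isBlank()) {
            id = UUID.randomUUID().toString(); // mint one at the edge
        }
        MDC.put(MDC_KEY, id);
        response.setHeader(HEADER, id); // echo back so callers can report it
        return true;
    }

    @Override
    public void afterCompletion(HttpServletRequest request, HttpServletResponse response,
                                Object handler, Exception ex) {
        MDC.remove(MDC_KEY); // prevent leakage onto the next request on this pooled thread
    }
}
```

The Backend's interceptor is the same shape; it simply trusts the incoming header rather than minting a new ID in the common case.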
RestTemplate Interceptor (Gateway):
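A sketch of the outbound side, reading the current thread's MDC and forwarding it (header and key names are assumptions that must match your inbound interceptor):

```java
import java.io.IOException;
import org.slf4j.MDC;
import org.springframework.http.HttpRequest;
import org.springframework.http.client.ClientHttpRequestExecution;
import org.springframework.http.client.ClientHttpRequestInterceptor;
import org.springframework.http.client.ClientHttpResponse;

public class CorrelationIdRestTemplateInterceptor implements ClientHttpRequestInterceptor {
    @Override
    public ClientHttpResponse intercept(HttpRequest request, byte[] body,
                                        ClientHttpRequestExecution execution) throws IOException {
        String id = MDC.get("correlationId");
        if (id != null) {
            // Forward the calling thread's context to the downstream service.
            request.getHeaders().set("X-Correlation-Id", id);
        }
        return execution.execute(request, body);
    }
}
```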
WebMvcConfigurer (to register interceptors):
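A sketch of the wiring, assuming interceptor classes named CorrelationIdInterceptor and CorrelationIdRestTemplateInterceptor (hypothetical names for your MDC and RestTemplate interceptors):

```java
import java.util.List;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.web.client.RestTemplate;
import org.springframework.web.servlet.config.annotation.InterceptorRegistry;
import org.springframework.web.servlet.config.annotation.WebMvcConfigurer;

@Configuration
public class WebConfig implements WebMvcConfigurer {
    private final CorrelationIdInterceptor correlationIdInterceptor;

    public WebConfig(CorrelationIdInterceptor correlationIdInterceptor) {
        this.correlationIdInterceptor = correlationIdInterceptor;
    }

    @Override
    public void addInterceptors(InterceptorRegistry registry) {
        // Puts the correlationId into MDC for every incoming MVC request.
        registry.addInterceptor(correlationIdInterceptor);
    }

    @Bean
    public RestTemplate restTemplate() {
        // Every outgoing call through this RestTemplate forwards the header.
        RestTemplate restTemplate = new RestTemplate();
        restTemplate.setInterceptors(List.of(new CorrelationIdRestTemplateInterceptor()));
        return restTemplate;
    }
}
```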
This setup provides a robust foundation for observability. You're not just logging; you're creating a forensic trail that can cut through the noise of a high-volume distributed system, allowing you to quickly diagnose and resolve issues. This is how the pros do it.