Day 1: Beyond println: Instrumenting with Micrometer Observation API
Welcome back, future troubleshooting gurus. You're here because you understand that the old ways of debugging in production—like scattering System.out.println statements or blindly sifting through gigabytes of logs—are not just inefficient; they're career-limiting in a world of complex, distributed systems. When your services are handling 100 million requests per second, a single println is a whisper lost in a hurricane, and a war room without proper instrumentation is just a room full of people staring at dashboards that don't tell the whole story.
Today, we're laying the foundational brick of our diagnostic toolkit: the Micrometer Observation API. This isn't just another metrics library; it's the Rosetta Stone for understanding what your code is actually doing in production, tying together the "what," "when," and "how" of every critical operation.
Why Your println is a Lie in Distributed Systems
Think about it: in a monolithic application, a println might give you a fleeting glimpse. But in a distributed system, a single user request could traverse a dozen microservices, asynchronous queues, databases, and external APIs. If one service prints "Starting processing" and another prints "Finished processing," how do you connect those two dots? How do you know if they're even related to the same request?
The answer is: you can't, not reliably. This is where blind spots emerge, where latency spikes become mysterious, and where cascading failures turn into all-hands-on-deck nightmares. Our goal is to eliminate these blind spots, giving you X-ray vision into your system's internal workings.
Core Concept: The Power of Intent – Micrometer Observation API
At its heart, the Micrometer Observation API is about defining an "operation" or "unit of work" within your application. It's a high-level abstraction that captures the intent behind a piece of code execution. Instead of just logging a message or incrementing a counter, an Observation wraps the entire lifecycle of an activity, from start to finish, including any errors.
System Design Concept: Unified Observability Primitives.
Traditional observability often treats metrics, traces, and logs as separate concerns, instrumented independently. Micrometer Observation API challenges this by providing a single, unified API that, when instrumented, can simultaneously produce:
Metrics: How many times did this operation run? What was its duration?
Traces: What was the parent operation? What child operations did it invoke? What was the full path of the request?
Logs: Contextual information about the operation, automatically enriched with trace and span IDs.
This unification is critical. It means you instrument once and get comprehensive data for all three pillars, seamlessly linked. This drastically reduces instrumentation overhead and ensures consistency across your observability data.
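In code, "instrument once" looks roughly like the following sketch. ObservationTextPublisher is Micrometer's built-in handler that prints lifecycle events to the console; the operation name and tag here are illustrative, and in a real setup you would register additional handlers for metrics and tracing alongside it:

```java
import io.micrometer.observation.Observation;
import io.micrometer.observation.ObservationRegistry;
import io.micrometer.observation.ObservationTextPublisher;

public class UnifiedObservabilitySketch {
    public static void main(String[] args) {
        ObservationRegistry registry = ObservationRegistry.create();

        // One registry can fan out to many handlers: metrics, traces, log enrichment.
        // ObservationTextPublisher simply prints each lifecycle event.
        registry.observationConfig()
                .observationHandler(new ObservationTextPublisher());

        // Instrument once; every registered handler sees the same observation.
        Observation.createNotStarted("user.lookup", registry)
                .lowCardinalityKeyValue("outcome", "success")
                .observe(() -> {
                    // ...business logic runs inside the observation's scope
                });
    }
}
```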
Architecture: Fitting into Your System
Imagine your application as a bustling factory. Each service is a workshop, and each method call is a task. The Micrometer Observation API acts like a highly efficient, ubiquitous foreman who observes every critical task.
Application: Your Java service (e.g., a Spring Boot app).
Micrometer Observation API: Embedded directly within your application code. This is where you define your observations.
Observation Handlers: These are the actual components that process the observations. One handler might send data to Prometheus (for metrics), another to Zipkin (for traces), and another might enrich your logs. The beauty is that the core Observation API doesn't care where the data goes, only that it's captured consistently.
This setup ensures that every critical operation within your service is not a black box but a transparent, measurable, and traceable unit of work.
Control Flow & Data Flow: The Observation Lifecycle
An observation isn't just a point-in-time event; it has a lifecycle:
Request Enters: A user request hits your API endpoint.
Start Observation: Your code explicitly or implicitly (via the @Observed annotation) tells Micrometer: "Hey, I'm starting a new operation!" A unique ID (e.g., trace ID, span ID) is generated or propagated.
Execute Business Logic: Your application performs its work (e.g., calls a database, invokes another service). During this time, the observation is RUNNING. Any nested operations started within this context automatically become children of the current observation. This is where context propagation shines, effortlessly linking related operations.
Error Handling (Optional): If something goes wrong, you record the error with the observation.
Stop Observation: The operation completes (successfully or with an error). You tell Micrometer: "This operation is done."
Process Observation: Micrometer's registered ObservationHandlers spring into action, taking the collected data (name, duration, tags, error status, IDs) and sending it off to the relevant observability backends.
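The lifecycle above maps directly onto the manual API. A minimal sketch (the operation name, tag, and chargeCard helper are illustrative, not part of any real API):

```java
import io.micrometer.observation.Observation;
import io.micrometer.observation.ObservationRegistry;

public class ObservationLifecycleSketch {

    void handleRequest(ObservationRegistry registry) {
        // 1. Start: create and start the observation for this unit of work.
        Observation observation = Observation.createNotStarted("payment.charge", registry)
                .lowCardinalityKeyValue("payment.method", "card")
                .start();
        // 2. Execute: open a scope so nested observations become children.
        try (Observation.Scope scope = observation.openScope()) {
            chargeCard();
        } catch (Exception e) {
            // 3. Error handling: attach the failure to the observation.
            observation.error(e);
            throw e;
        } finally {
            // 4. Stop: handlers now receive the name, duration, tags, and error status.
            observation.stop();
        }
    }

    void chargeCard() { /* business logic */ }
}
```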
Hands-on: Building Your First Instrumented Service
We'll create a simple Spring Boot application and instrument a method using the Micrometer Observation API. For now, we'll use the LoggingObservationHandler to see the output directly in our console, giving us immediate feedback on what an observation looks like.
First, your pom.xml needs the micrometer-observation dependency, and optionally micrometer-core for the metrics handlers (Spring Boot will bring in a lot of Micrometer through its starters, but declaring these ensures the core Observation API is on the classpath).
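As a sketch, the relevant pom.xml fragment might look like this (versions are typically managed by the Spring Boot parent BOM, so none are pinned here):

```xml
<dependency>
    <groupId>io.micrometer</groupId>
    <artifactId>micrometer-observation</artifactId>
</dependency>
<!-- Optional: micrometer-core adds the metrics-oriented observation handlers -->
<dependency>
    <groupId>io.micrometer</groupId>
    <artifactId>micrometer-core</artifactId>
</dependency>
```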
Now, let's write some code. We'll set up a simple service and a REST controller.
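A sketch of what that could look like, matching the endpoint paths used in this lesson. Assumptions to note: LoggingObservationHandler is a small custom handler we write and register ourselves (not a Micrometer built-in); the class and method names (MyService, MyController, performComplexOperation) follow this lesson's naming; and the @Observed annotation additionally requires Spring Boot AOP support plus an ObservedAspect bean, omitted here for brevity:

```java
import io.micrometer.observation.Observation;
import io.micrometer.observation.ObservationHandler;
import io.micrometer.observation.ObservationRegistry;
import io.micrometer.observation.annotation.Observed;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.stereotype.Service;
import org.springframework.web.bind.annotation.*;

// A trivial handler that logs every observation lifecycle event to the console.
class LoggingObservationHandler implements ObservationHandler<Observation.Context> {
    @Override public void onStart(Observation.Context ctx) {
        System.out.println("START " + ctx.getName() + " " + ctx.getLowCardinalityKeyValues());
    }
    @Override public void onError(Observation.Context ctx) {
        System.out.println("ERROR " + ctx.getName() + ": " + ctx.getError());
    }
    @Override public void onStop(Observation.Context ctx) {
        System.out.println("STOP  " + ctx.getName());
    }
    @Override public boolean supportsContext(Observation.Context ctx) { return true; }
}

@Configuration
class ObservabilityConfig {
    @Bean ObservationRegistry observationRegistry() {
        ObservationRegistry registry = ObservationRegistry.create();
        registry.observationConfig().observationHandler(new LoggingObservationHandler());
        return registry;
    }
}

@Service
class MyService {
    private final ObservationRegistry registry;
    MyService(ObservationRegistry registry) { this.registry = registry; }

    // Declarative style: ObservedAspect wraps this call in an observation.
    @Observed(name = "data.fetch")
    public String fetchData(String input) { return "data:" + input; }

    // Programmatic style: full control over naming, tags, and error recording.
    public String performComplexOperation(String taskId) {
        Observation observation = Observation
                .createNotStarted("complex.operation", registry)
                .lowCardinalityKeyValue("operation.type", "complex")
                .highCardinalityKeyValue("task.id", taskId)
                .start();
        try (Observation.Scope scope = observation.openScope()) {
            return "done:" + taskId; // business logic goes here
        } catch (Exception e) {
            observation.error(e);
            throw e;
        } finally {
            observation.stop();
        }
    }
}

@RestController
@RequestMapping("/observe")
class MyController {
    private final MyService service;
    MyController(MyService service) { this.service = service; }

    @GetMapping("/data/{input}")
    public String data(@PathVariable String input) { return service.fetchData(input); }

    @GetMapping("/complex/{taskId}")
    public String complex(@PathVariable String taskId) {
        return service.performComplexOperation(taskId);
    }
}
```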
When you run this and hit the endpoints (e.g., http://localhost:8080/observe/data/hello or http://localhost:8080/observe/complex/task123), you'll see console output from LoggingObservationHandler showing the start and stop of observations, including their names, durations, and tags. This is your first peek "under the hood."
Size in Real-Time Production Systems (100M RPS)
For systems handling extreme scale, like 100 million requests per second, instrumenting everything is not feasible or desirable due to overhead (CPU, memory, network, storage). Here's the pragmatic approach:
Focus on Critical Paths: Instrument business-critical operations, external API calls, database interactions, message queue send/receive, and key internal components.
Automatic vs. Manual Instrumentation: Leverage automatic instrumentation (e.g., Spring Boot's @Observed, auto-configuration for common libraries) for boilerplate, and use the manual Observation API for custom business logic where fine-grained control and specific tags are essential.
Tagging Discipline:
Low Cardinality Tags: Use these for dimensions that have a limited, known set of values (e.g., status: success/failure, http.method: GET/POST, service.name: userservice). These are excellent for metrics aggregation.
High Cardinality Tags: Use these sparingly and judiciously (e.g., user.id: 12345, order.id: ABC-XYZ). While valuable for tracing specific requests, they can explode your metrics database size if not managed (e.g., via sampling).
Sampling: This is a non-negotiable for high-scale tracing. You cannot send every trace. Implement intelligent sampling strategies (e.g., head-based sampling, tail-based sampling) to capture a representative subset of traces, especially those showing errors or high latency. (We'll dive deeper into this in future lessons.)
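To make head-based sampling concrete, here is a minimal, self-contained sketch in plain Java (no Micrometer types): it hashes the trace ID into a bucket, so every service in the call chain makes the same keep-or-drop decision for a given trace without any coordination.

```java
import java.util.List;

public class HeadBasedSamplerSketch {

    /** Keep roughly (rate * 100)% of traces, decided deterministically per trace ID. */
    static boolean shouldSample(String traceId, double rate) {
        // Math.floorMod keeps the bucket non-negative even for negative hash codes.
        int bucket = Math.floorMod(traceId.hashCode(), 10_000);
        return bucket < (int) (rate * 10_000);
    }

    public static void main(String[] args) {
        // Every service evaluating the same trace ID reaches the same decision.
        for (String id : List.of("trace-a", "trace-b", "trace-c")) {
            System.out.println(id + " sampled=" + shouldSample(id, 0.10));
        }
        // Sanity checks: a 100% rate keeps everything, a 0% rate keeps nothing.
        System.out.println(shouldSample("trace-a", 1.0)); // true
        System.out.println(shouldSample("trace-a", 0.0)); // false
    }
}
```

Real tracers use the same idea over the numeric trace ID; tail-based sampling instead buffers complete traces and decides after seeing errors or latency, which we'll cover later.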
The Observation API is designed with this flexibility in mind, allowing you to define the intent of an operation once, and let different handlers decide how to process and sample that data for metrics, traces, and logs.
Assignment: Deepening Your Diagnostic Toolkit
Your mission, should you choose to accept it, is to extend our simple application:
Introduce a New Service: Create a new Spring @Service class, e.g., OrderProcessorService.
Instrument a New Method: Add a method processOrder(String orderId) in OrderProcessorService.
Use the manual Observation API (like performComplexOperation) to instrument this method.
Name the observation appropriately (e.g., order.processing).
Add a low cardinality tag for order.type (e.g., "digital", "physical").
Add a high cardinality tag for order.id (using the orderId parameter).
Simulate a delay using Thread.sleep().
Introduce a conditional error path: if orderId contains "urgent-fail", throw an IllegalArgumentException and ensure the observation records this error.
Create a New Endpoint: Add a new @GetMapping endpoint in MyController (e.g., /order/{orderId}) that calls your new OrderProcessorService.processOrder() method.
Verify: Run the application and hit your new endpoint with both success and failure cases. Observe the console output from LoggingObservationHandler to confirm your observations are being created, tagged, and correctly record errors.
This exercise will solidify your understanding of how to use both declarative (@Observed) and programmatic (Observation.createNotStarted()) instrumentation, giving you the flexibility needed for real-world scenarios.
Solution Hints
Remember to inject ObservationRegistry into your new service's constructor.
The try-with-resources block with observation.openScope() is your best friend for ensuring observations are properly closed, even when exceptions occur.
Don't forget to call observation.error(e) before observation.stop() if an exception occurs within the try block. The finally block will handle the stop() call.
When adding tags, think about whether the value set is finite (low cardinality) or potentially infinite (high cardinality). order.type is low; order.id is high.
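Putting those hints together, one possible processOrder sketch looks like this (names follow the assignment; the hard-coded "digital" order type is a placeholder for your own logic, and the InterruptedException from Thread.sleep is rewrapped so the method needs no throws clause):

```java
import io.micrometer.observation.Observation;
import io.micrometer.observation.ObservationRegistry;
import org.springframework.stereotype.Service;

@Service
class OrderProcessorService {
    private final ObservationRegistry registry;
    OrderProcessorService(ObservationRegistry registry) { this.registry = registry; }

    public String processOrder(String orderId) {
        Observation observation = Observation
                .createNotStarted("order.processing", registry)
                .lowCardinalityKeyValue("order.type", "digital")  // finite value set
                .highCardinalityKeyValue("order.id", orderId)     // unbounded values
                .start();
        try (Observation.Scope scope = observation.openScope()) {
            Thread.sleep(100); // simulate work
            if (orderId.contains("urgent-fail")) {
                throw new IllegalArgumentException("Cannot process order: " + orderId);
            }
            return "processed:" + orderId;
        } catch (Exception e) {
            observation.error(e); // record the failure before stopping
            throw new RuntimeException(e);
        } finally {
            observation.stop();
        }
    }
}
```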
This is your first step into building truly observable systems. Mastering this foundation will empower you to debug faster, understand system behavior more deeply, and ultimately, build more resilient software. Next, we'll see how these observations transform into powerful distributed traces.