Day 39 of 60 · Production & continuous

Distributed tracing & observability

Past three services, debugging without traces is folklore. With them, you read the call graph like a story.

Problem

Errors in distributed systems are invisible without correlation.

How it works

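Every request gets a trace ID that travels with it across service boundaries. Each unit of work is recorded as a span carrying structured data: a name, start and end times, attributes, and a link to its parent span. A trace UI assembles the spans into the whole call graph. Traces pair with metrics and logs (the three pillars of observability).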
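The mechanics above can be sketched with the standard library alone. This is a minimal illustration, not the OpenTelemetry API: the helper names (`start_span`, `finish_span`, `outgoing_headers`) and the two services are hypothetical. The header layout follows the W3C `traceparent` format (version, trace ID, parent span ID, flags).

```python
import json
import secrets
import time
from typing import Optional


def make_traceparent(trace_id: str, span_id: str, sampled: bool = True) -> str:
    """Build a traceparent header value: version-traceid-parentid-flags."""
    return f"00-{trace_id}-{span_id}-{'01' if sampled else '00'}"


def parse_traceparent(header: str) -> tuple:
    """Return (trace_id, parent_span_id) from a traceparent header value."""
    _version, trace_id, parent_id, _flags = header.split("-")
    return trace_id, parent_id


def start_span(service: str, name: str, incoming: Optional[dict] = None) -> dict:
    """Continue the caller's trace if a header is present, else start a new one."""
    headers = incoming or {}
    if "traceparent" in headers:
        trace_id, parent_id = parse_traceparent(headers["traceparent"])
    else:
        trace_id, parent_id = secrets.token_hex(16), None  # 128-bit trace ID
    return {
        "trace_id": trace_id,
        "span_id": secrets.token_hex(8),  # 64-bit span ID
        "parent_id": parent_id,
        "service": service,
        "name": name,
        "start_ns": time.time_ns(),
    }


def finish_span(span: dict, **attributes) -> str:
    """Close the span and emit it as one structured JSON line."""
    span["end_ns"] = time.time_ns()
    span["attributes"] = attributes
    line = json.dumps(span, sort_keys=True)
    print(line)
    return line


def outgoing_headers(span: dict) -> dict:
    """Headers to attach to a downstream call so the trace ID propagates."""
    return {"traceparent": make_traceparent(span["trace_id"], span["span_id"])}


# Simulated request: a hypothetical "checkout" service calls "payment".
root = start_span("checkout", "POST /checkout")
child = start_span("payment", "charge_card", outgoing_headers(root))
finish_span(child, amount_cents=1999)
finish_span(root, http_status=200)
```

Both emitted lines share one `trace_id`, and the payment span's `parent_id` equals the checkout span's `span_id`. That parent link is what lets a trace UI rebuild the call graph. In practice an OpenTelemetry SDK would handle ID generation, propagation, and export.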

What it catches

Cross-service latency, dependency cascades, hot spots. Required for debugging anything past three services.
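To make "hot spots" concrete, here is a small sketch of one common analysis: compute each span's self time (its duration minus the time spent in its direct children) and pick the largest. The span data, service names, and field names are invented for illustration, and the calculation assumes sibling spans run sequentially rather than in parallel.

```python
from collections import defaultdict

# Hypothetical spans from one trace; times are milliseconds from trace start.
SPANS = [
    {"span_id": "a", "parent_id": None, "service": "gateway", "name": "GET /order", "start": 0, "end": 480},
    {"span_id": "b", "parent_id": "a", "service": "orders", "name": "load_order", "start": 10, "end": 460},
    {"span_id": "c", "parent_id": "b", "service": "inventory", "name": "check_stock", "start": 20, "end": 60},
    {"span_id": "d", "parent_id": "b", "service": "pricing", "name": "quote", "start": 70, "end": 440},
]


def self_times(spans):
    """Map span_id -> own duration minus the durations of direct children.

    Assumes children of the same parent do not overlap in time.
    """
    child_total = defaultdict(int)
    for s in spans:
        if s["parent_id"] is not None:
            child_total[s["parent_id"]] += s["end"] - s["start"]
    return {
        s["span_id"]: (s["end"] - s["start"]) - child_total[s["span_id"]]
        for s in spans
    }


def hot_spot(spans):
    """Return the span with the largest self time."""
    times = self_times(spans)
    return max(spans, key=lambda s: times[s["span_id"]])


worst = hot_spot(SPANS)
print(f"{worst['service']}.{worst['name']}: {self_times(SPANS)[worst['span_id']]} ms self time")
# prints "pricing.quote: 370 ms self time"
```

The gateway span is the longest at 480 ms, but nearly all of that time is spent waiting on downstream calls. Self time points at the pricing service as the place to look.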

Tools

OpenTelemetry (OSS) · Jaeger (OSS) · Tempo (OSS) · Honeycomb (SaaS)

Verdict by project size

Project size	Verdict
Small	Optional
Medium	Recommended
Large	Must
Extra-large	Must

Cost

Project size	Setup	Maint / mo	Tool / mo	CI / run
Small <10k LOC	1d	1h	$0	n/a
Medium 10–100k LOC	3d	5h	$200	n/a
Large 100k–1M LOC	15d	30h	$3k	n/a
Extra-large >1M LOC	60d	150h	$20k	n/a

Setup = engineer-days to first useful run · Maint = engineer-hours / month at steady state · Tool = out-of-pocket $ / month · CI = minutes added (or saved) per pipeline run
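The columns use different units, so a rough first-year total makes the rows easier to compare. The sketch below copies the setup, maintenance, and tool figures from the table. The hourly rate and the 8-hour engineer-day are assumptions chosen purely for illustration; substitute your own.

```python
# Rows copied from the cost table: (setup days, maint hours / month, tool $ / month).
COSTS = {
    "Small": (1, 1, 0),
    "Medium": (3, 5, 200),
    "Large": (15, 30, 3_000),
    "Extra-large": (60, 150, 20_000),
}

HOURLY_RATE = 100  # assumption: loaded engineer cost in $ / hour
HOURS_PER_DAY = 8  # assumption: one engineer-day is 8 hours


def first_year_cost(setup_days, maint_hours_per_month, tool_per_month):
    """One-off setup plus 12 months of maintenance and tooling, in dollars."""
    setup = setup_days * HOURS_PER_DAY * HOURLY_RATE
    maint = maint_hours_per_month * 12 * HOURLY_RATE
    tool = tool_per_month * 12
    return setup + maint + tool


for size, row in COSTS.items():
    print(f"{size}: ${first_year_cost(*row):,}")
```

Under these assumptions, engineer time dominates the small and medium rows, while tool spend becomes the largest single component at extra-large scale.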

Lifecycle & ownership

When in lifecycle

Release · Operate · Observe

Continuous in prod · Always-on, observing real traffic.

Who owns it

SRE / DevOps / Platform

CI/CD, observability, reliability

Collaborates with: Developer

Reference implementations

OpenTelemetry Demo
Realistic microservice demo instrumented with traces, metrics, and logs.
Jaeger HotROD
Distributed tracing demo app with realistic service interactions.
OpenTelemetry Collector examples
Collector pipelines for exporting traces, metrics, and logs.

Quick check

Distributed tracing becomes effectively required past…
