Day 50 of 60 · Data, ML & infrastructure
Data quality testing
Pipelines fail silently. Dashboards lie. Decisions go wrong. Declarative expectations on tables and columns gate writes before the lie reaches the boardroom.
Problem
Pipelines break silently: null counts spike, schemas drift, freshness lapses, and downstream dashboards lie.
How it works
Declare expectations on tables, columns, and metrics, and run them on every pipeline execution. When an expectation fails, block the write, page on-call, or both.
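As a rough illustration of the idea, the sketch below declares expectations as plain data and refuses to write a batch that fails any of them. It uses only the Python standard library and is not the API of Great Expectations, dbt, or Soda. Every name in it (`Expectation`, `gated_write`, `ORDERS_EXPECTATIONS`, the `order_id` column) is hypothetical.

```python
from dataclasses import dataclass
from typing import Callable

Row = dict[str, object]


@dataclass(frozen=True)
class Expectation:
    """A named rule; `check` returns True when the batch passes."""
    name: str
    check: Callable[[list[Row]], bool]


class ExpectationFailed(Exception):
    """Raised to block a write when one or more expectations fail."""


def not_null(column: str) -> Expectation:
    # Passes only if no row has a missing or None value in `column`.
    return Expectation(
        name=f"{column} is never null",
        check=lambda rows: all(r.get(column) is not None for r in rows),
    )


def row_count_at_least(minimum: int) -> Expectation:
    # Guards against a silently empty or truncated batch.
    return Expectation(
        name=f"row count >= {minimum}",
        check=lambda rows: len(rows) >= minimum,
    )


def failed_expectations(rows: list[Row], expectations: list[Expectation]) -> list[str]:
    """Return the names of every expectation the batch violates."""
    return [e.name for e in expectations if not e.check(rows)]


def gated_write(rows: list[Row], expectations: list[Expectation],
                write: Callable[[list[Row]], None]) -> None:
    """Run all expectations first; call `write` only if none fail."""
    failures = failed_expectations(rows, expectations)
    if failures:
        raise ExpectationFailed("; ".join(failures))
    write(rows)


# Hypothetical expectations for an illustrative orders table.
ORDERS_EXPECTATIONS = [not_null("order_id"), row_count_at_least(1)]
```

In a real pipeline, `write` would be the step that commits the batch to the warehouse, and the `ExpectationFailed` handler is where an on-call page would be triggered.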
What it catches
Schema drift, null spikes, freshness violations, distribution shifts in source data, broken joins, and silent pipeline failures. Necessary whenever business decisions depend on the data.
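Three of these failure modes can be sketched with standard-library Python. This is a minimal illustration, not how any of the tools listed here implement their checks. The function names, the 0.05 tolerance, and the baseline null rate are assumptions chosen for the example.

```python
from datetime import datetime, timedelta

Row = dict[str, object]


def schema_drift(rows: list[Row], expected_columns: set[str]) -> dict[str, set[str]]:
    """Compare the columns seen in the batch against the expected set."""
    seen: set[str] = set().union(*(r.keys() for r in rows))
    return {
        "missing": expected_columns - seen,      # expected but absent everywhere
        "unexpected": seen - expected_columns,   # present but never declared
    }


def null_rate(rows: list[Row], column: str) -> float:
    """Fraction of rows where `column` is missing or None."""
    if not rows:
        return 0.0
    return sum(1 for r in rows if r.get(column) is None) / len(rows)


def null_spike(rows: list[Row], column: str, baseline_rate: float,
               tolerance: float = 0.05) -> bool:
    """True when the null rate exceeds the baseline by more than `tolerance`."""
    return null_rate(rows, column) > baseline_rate + tolerance


def is_stale(latest_event: datetime, now: datetime, max_age: timedelta) -> bool:
    """True when the newest record is older than the freshness budget."""
    return now - latest_event > max_age
```

The baseline null rate would normally come from recent history of the same column. It is passed in explicitly here to keep the sketch self-contained.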
Tools
Great Expectations (OSS) · dbt tests (OSS) · Soda Core (OSS) · Monte Carlo (SaaS)
Verdict by project size
| Project size | Verdict |
|---|---|
| Small | Skip |
| Medium | Recommended |
| Large | Must |
| Extra-large | Must |
Cost
| Project size | Setup | Maint / mo | Tool / mo | CI / run |
|---|---|---|---|---|
| Small <10k LOC | 4h | 1h | $0 | +1m |
| Medium 10–100k LOC | 2d | 5h | $0 | +3m |
| Large 100k–1M LOC | 10d | 25h | $1k | +8m |
| Extra-large >1M LOC | 30d | 80h | $10k | +15m |
Setup = engineer time to first useful run (h = hours, d = days) · Maint = engineer-hours per month at steady state · Tool = out-of-pocket $ per month · CI = minutes added (or saved) per pipeline run
Lifecycle & ownership
When in lifecycle
Build · Operate
Per pull request · Runs in CI on every PR; gates merge.
Who owns it
Data Engineer
Pipelines, schemas, lineage
Collaborates with: SRE / DevOps / Platform, Security / AppSec
Reference implementations
- Great Expectations introduction: Declarative data expectations for tables, columns, and pipelines.
- dbt tests: Data tests embedded directly in transformation projects.
- Soda Core examples: Data quality checks for schema, freshness, and distribution drift.