Day 50 of 60 · Data, ML & infrastructure

Data quality testing

Pipelines fail silently. Dashboards lie. Decisions go wrong. Declarative expectations on tables and columns gate the writes, before the lie reaches the boardroom.

Problem

Pipelines break silently: null counts spike, schemas drift, freshness lapses, and downstream dashboards lie.

How it works

Declarative expectations on tables, columns, and metrics. They run on every pipeline execution. When an expectation fails, they block writes or page on-call.
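A minimal sketch of that idea, using only the Python standard library rather than any of the tools listed in this lesson. Every name here (`Expectation`, `not_null`, `max_null_fraction`, `row_count_at_least`, `gated_write`, `ExpectationFailed`) is hypothetical and chosen for illustration. Rows are assumed to be plain dictionaries, and the write step is assumed to be any callable that accepts the rows.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Row = Dict[str, object]


@dataclass(frozen=True)
class Expectation:
    """A named, declarative rule evaluated against a batch of rows."""
    name: str
    check: Callable[[List[Row]], bool]


def not_null(column: str) -> Expectation:
    # Passes only if every row has a non-None value in the column.
    return Expectation(
        name=f"{column} is never null",
        check=lambda rows: all(r.get(column) is not None for r in rows),
    )


def max_null_fraction(column: str, limit: float) -> Expectation:
    # Passes if the share of None values in the column is at most `limit`.
    def check(rows: List[Row]) -> bool:
        if not rows:
            return True
        nulls = sum(1 for r in rows if r.get(column) is None)
        return nulls / len(rows) <= limit

    return Expectation(f"{column} null fraction <= {limit}", check)


def row_count_at_least(minimum: int) -> Expectation:
    # Guards against an upstream step that silently produced too few rows.
    return Expectation(
        name=f"row count >= {minimum}",
        check=lambda rows: len(rows) >= minimum,
    )


class ExpectationFailed(Exception):
    """Raised instead of writing when any expectation fails."""


def failed_expectations(rows: List[Row], expectations: List[Expectation]) -> List[str]:
    # Evaluate every rule so the error reports all failures, not just the first.
    return [e.name for e in expectations if not e.check(rows)]


def gated_write(
    rows: List[Row],
    expectations: List[Expectation],
    write: Callable[[List[Row]], None],
) -> None:
    # The write happens only if every expectation passes.
    failures = failed_expectations(rows, expectations)
    if failures:
        raise ExpectationFailed("; ".join(failures))
    write(rows)
```

In a real pipeline the `ExpectationFailed` handler would be the place to page on-call. The point of the sketch is only that the rules are data, and the write sits behind them.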

What it catches

Schema drift, null spikes, freshness violations, distribution shifts in source data, broken joins, silent pipeline failures. Necessary if business decisions consume the data.
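Three of these failure modes can be sketched with the standard library alone. The helper names (`schema_drift`, `null_fraction`, `is_stale`) are hypothetical, rows are assumed to be dictionaries, and any thresholds are placeholders that a team would set from its own history.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict, List, Set

Row = Dict[str, object]


def schema_drift(expected_columns: Set[str], rows: List[Row]) -> Dict[str, Set[str]]:
    # Compare the columns actually seen in the batch with the declared set.
    observed: Set[str] = set().union(*(row.keys() for row in rows))
    return {
        "missing": expected_columns - observed,
        "unexpected": observed - expected_columns,
    }


def null_fraction(rows: List[Row], column: str) -> float:
    # Share of rows where the column is absent or None; 0.0 for an empty batch.
    if not rows:
        return 0.0
    return sum(1 for r in rows if r.get(column) is None) / len(rows)


def is_stale(latest_event: datetime, max_age: timedelta, now: datetime) -> bool:
    # Freshness violation: the newest record is older than the allowed age.
    return now - latest_event > max_age


rows = [
    {"id": 1, "email": "a@example.com"},
    {"id": 2, "email": None},
    {"id": 3, "email": None, "debug": True},
    {"id": 4, "email": "b@example.com"},
]

drift = schema_drift({"id", "email", "created_at"}, rows)
email_nulls = null_fraction(rows, "email")
stale = is_stale(
    latest_event=datetime(2024, 1, 1, 0, 0, tzinfo=timezone.utc),
    max_age=timedelta(hours=6),
    now=datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc),
)
```

Here `drift` reports `created_at` as missing and `debug` as unexpected, `email_nulls` is 0.5, and `stale` is true because the newest record is twelve hours old against a six-hour limit.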

Tools

Great Expectations · OSS
dbt tests · OSS
Soda Core · OSS
Monte Carlo · SaaS

Verdict by project size

Small · Skip
Medium · Recommended
Large · Must
Extra-large · Must

Cost

Project size              Setup   Maint / mo   Tool / mo   CI / run
Small (<10k LOC)          4h      1h           $0          +1m
Medium (10–100k LOC)      2d      5h           $0          +3m
Large (100k–1M LOC)       10d     25h          $1k         +8m
Extra-large (>1M LOC)     30d     80h          $10k        +15m
Setup = engineer time to first useful run (h = hours, d = days) · Maint = engineer-hours / month at steady state · Tool = out-of-pocket $ / month · CI = minutes added (or saved) per pipeline run

Lifecycle & ownership

When in lifecycle
Build Operate
Per pull request · Runs in CI on every PR; gates merge.
Who owns it
Data Engineer
Pipelines, schemas, lineage
Collaborates with: SRE / DevOps / Platform, Security / AppSec
