Day 56 of 60 · AI-system specific

Output guardrails (AI safety)

Evals catch what you can enumerate. Guardrails catch what you can't. Used in series with, never in place of, the eval suite.

ProblemModels produce off-policy, unsafe, or PII-leaking output that can't be caught by an offline eval.

How it works

Inline classifiers and rule engines that screen output before it returns to the user. Block, redact, or rewrite on policy violation. Used in series with, not in place of, evals.

What it catches

Prompt-injection success, jailbreak outputs, PII leakage, brand-safety violations, off-topic generation, toxic content.

Tools

Llama Guard · OSS NeMo Guardrails (NVIDIA) · OSS Guardrails AI · OSS Lakera Guard · SaaS

Verdict by project size

Small

Opt

Medium

Rec

Large

Must

Extra-large

Must

Cost

Project size	Setup	Maint / mo	Tool / mo	CI / run
Small <10k LOC	4h	1h	$0	+1m
Medium 10–100k LOC	2d	4h	$100	+2m
Large 100k–1M LOC	8d	20h	$1k	+5m
Extra-large >1M LOC	25d	80h	$10k	+10m

Setup = engineer-days to first useful run · Maint = engineer-hours / month at steady state · Tool = out-of-pocket $ / month · CI = minutes added (or saved) per pipeline run

Lifecycle & ownership

When in lifecycle

Test Operate Observe

Per merge · Runs after merge to main; nightly heavy jobs.

Who owns it

ML / AI Engineer

Models, evals, drift, guardrails

Collaborates with: Developer, Security / AppSec

Reference implementations

NeMo Guardrails examples
Runnable guardrail configurations for conversational AI systems.
Llama Guard recipes
Safety classification examples for prompt and response filtering.
Guardrails AI hub
Reusable validators for safety, structure, PII, and policy enforcement.

Quick check

Output guardrails are used in series with, not in place of, what?

One question. Pick the best answer. Your streak is saved locally on this device.

Save the lesson

Download SVG ↓

Screenshot for a 1:1, drop it in Slack, or download the SVG.

All 60 days →