Day 56 of 60 · AI-system specific
Output guardrails (AI safety)
Evals catch what you can enumerate. Guardrails catch what you can't. Use guardrails in series with the eval suite, never in place of it.
Problem
Models produce off-policy, unsafe, or PII-leaking output that an offline eval can't catch.
How it works
Inline classifiers and rule engines screen model output before it returns to the user. On a policy violation, they block, redact, or rewrite the output.
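As a rough illustration of the rule-engine half of this idea, the sketch below screens a model response, blocks it if it contains a disallowed phrase, and redacts simple PII patterns otherwise. Everything in it is an assumption made for illustration: the names `screen_output` and `Verdict`, the two regexes, the blocked phrases, and the refusal message are hypothetical and are not taken from any of the tools listed in this lesson. Real deployments would need far broader rules and usually a classifier alongside them.

```python
import re
from dataclasses import dataclass

# Hypothetical rules for illustration only; real policies need broader coverage.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
SSN_RE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
BLOCKED_PHRASES = ("ignore previous instructions", "system prompt:")
REFUSAL = "Sorry, I can't help with that."


@dataclass
class Verdict:
    action: str         # "allow", "redact", or "block"
    text: str           # what is safe to return to the user
    reasons: list[str]  # which rules fired, for logging


def screen_output(model_output: str) -> Verdict:
    """Screen one model response before it is returned to the user."""
    lowered = model_output.lower()

    # Block: any disallowed phrase replaces the whole response.
    for phrase in BLOCKED_PHRASES:
        if phrase in lowered:
            return Verdict("block", REFUSAL, [f"blocked_phrase:{phrase}"])

    # Redact: replace each PII match with a placeholder and record the rule.
    reasons: list[str] = []
    text = model_output
    for name, pattern in (("email", EMAIL_RE), ("ssn", SSN_RE)):
        text, count = pattern.subn(f"[REDACTED_{name.upper()}]", text)
        if count:
            reasons.append(f"pii:{name}")

    return Verdict("redact" if reasons else "allow", text, reasons)
```

Calling `screen_output("Contact jane.doe@example.com or 123-45-6789.")` returns a `redact` verdict whose text has both values replaced by placeholders. Returning the reasons alongside the action keeps the guardrail observable, which matters because a guardrail that fires silently is hard to tune.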
What it catches
Prompt-injection success, jailbreak outputs, PII leakage, brand-safety violations, off-topic generation, toxic content.
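When several detectors fire on the same response, something has to decide which action wins. The sketch below shows one possible way to do that: map each category from the list above to an action and take the strictest one. The category keys, the category-to-action mapping, the `Action` ordering, and the fail-closed default for unknown categories are all assumptions chosen for illustration, not a recommended policy.

```python
from enum import IntEnum
from typing import Iterable


class Action(IntEnum):
    # Ordered from least to most strict so max() picks the strictest.
    ALLOW = 0
    REWRITE = 1
    REDACT = 2
    BLOCK = 3


# Hypothetical policy table; each team would set its own mapping.
POLICY = {
    "prompt_injection": Action.BLOCK,
    "jailbreak": Action.BLOCK,
    "pii": Action.REDACT,
    "brand_safety": Action.REWRITE,
    "off_topic": Action.REWRITE,
    "toxicity": Action.BLOCK,
}


def resolve(categories: Iterable[str]) -> Action:
    """Return the strictest action for the detected categories.

    Unknown categories fail closed (BLOCK); no detections means ALLOW.
    """
    return max(
        (POLICY.get(category, Action.BLOCK) for category in categories),
        default=Action.ALLOW,
    )
```

For example, `resolve(["pii", "off_topic"])` yields `Action.REDACT`, while adding `"jailbreak"` to the list escalates the result to `Action.BLOCK`.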
Tools
Llama Guard (OSS) · NeMo Guardrails (NVIDIA, OSS) · Guardrails AI (OSS) · Lakera Guard (SaaS)
Verdict by project size
| Project size | Verdict |
|---|---|
| Small | Optional |
| Medium | Recommended |
| Large | Must |
| Extra-large | Must |
Cost
| Project size | Setup | Maint / mo | Tool / mo | CI / run |
|---|---|---|---|---|
| Small <10k LOC | 4h | 1h | $0 | +1m |
| Medium 10–100k LOC | 2d | 4h | $100 | +2m |
| Large 100k–1M LOC | 8d | 20h | $1k | +5m |
| Extra-large >1M LOC | 25d | 80h | $10k | +10m |
Setup = engineer time (hours or days) to first useful run
Maint = engineer-hours / month at steady state
Tool = out-of-pocket $ / month
CI = minutes added (or saved) per pipeline run
Lifecycle & ownership
When in lifecycle
Test · Operate · Observe
Per merge: runs after merge to main, with heavy jobs nightly.
Who owns it
ML / AI Engineer
Models, evals, drift, guardrails
Collaborates with: Developer, Security / AppSec
Reference implementations
- NeMo Guardrails examples: Runnable guardrail configurations for conversational AI systems.
- Llama Guard recipes: Safety classification examples for prompt and response filtering.
- Guardrails AI hub: Reusable validators for safety, structure, PII, and policy enforcement.
Quick check
Output guardrails are used in series with, not in place of, what?