2418 min read · 230 XP

Capstone: red-team & report

Break your own system, then standardize the fix.

Attack one of your AI features the way an adversary would, document what worked, and turn it into a reusable remediation checklist. This is how you earn security's trust and build a real guardrail standard.

Key ideas

1
Done means: a written report of attempted attacks, what succeeded, severity, and concrete remediations — plus a reusable checklist.
2
Cover the OWASP LLM basics: prompt injection (direct & indirect), data exfiltration, insecure output handling, excessive agency.
3
Test indirect injection via retrieved content / tool outputs, not just the user box.
4
Convert each finding into a guardrail and an eval check, then re-test.

Run the exercise

Define targets: data exfiltration, unauthorized actions, policy bypass.
Try direct & indirect prompt injection, jailbreaks, and tool abuse.
Attempt to leak system prompt / hidden data / another user's data.
Record severity and reproduction steps for each success.
Add guardrails + eval checks; re-test to confirm closure.

Write it up

Findings table: attack, result, severity, remediation, status.
A reusable AI red-team checklist for future features.
Share with security to co-own the standard.

Watch

▶How to red-team an LLM applicationFind it on YouTube →

Do the work

0/5 · 0%

Test yourself

Question 1 / 2

Beyond the user input box, where must you test for prompt injection?

27 chapters · progress saves automatically