AI security & red-teaming
Non-negotiable at an insurer. Guardrails first, evangelism second.
AI adds new attack surfaces on top of normal appsec. One PII-in-a-prompt incident can set a whole program back. Know the OWASP LLM risks, defend against prompt injection, and red-team your own systems before someone else does.
Key ideas
- 1
Learn the OWASP Top 10 for LLM Applications β prompt injection, sensitive-data disclosure, insecure output handling, supply-chain, excessive agency, and more.
- 2
Prompt injection (direct & indirect): untrusted text β including retrieved docs and tool outputs β can hijack the model. Never trust model output as a command.
- 3
Protect data: keep PII/secrets out of prompts, logs and training; enforce least-privilege on tools and retrieval; redact at the gateway.
- 4
Constrain 'agency': the more actions an agent can take, the bigger the blast radius β scope tools, require approval for irreversible actions.
- 5
Red-team proactively: try to break your own system, document findings, and turn them into standards. Partner with security as an ally, early.
Top risks to internalize
- Indirect prompt injection via RAG content or tool results.
- Sensitive data leakage through prompts, logs, or over-broad retrieval.
- Insecure output handling (model output used in SQL, shell, HTML β injection).
- Excessive agency / over-permissioned tools.
- Supply chain: untrusted models, datasets and plugins.
Run a red-team exercise
- Define targets (data exfiltration, unauthorized actions, jailbreak bypass of policy).
- Attack: injection payloads in inputs/docs, role-play jailbreaks, tool abuse.
- Record what worked; add guardrails + eval checks; re-test.
- Publish a remediation checklist as a reusable standard.
Watch
Do the work
0/5 Β· 0%Test yourself
What is indirect prompt injection?
27 chapters Β· progress saves automatically