← marwandiallo.comlabs

Prompt Injection Lab

The XSS of the LLM era. An attacker plants instructions inside data an agent ingests — a webpage, a ticket, a resume, a README — and the agent treats those instructions as if they came from its operator. The defenses look a lot like the ones we already use for HTML.

Zero LLM calls. This lab is a deterministic simulator. It demonstrates the attack patterns and defense primitives without you needing an API key, without me paying for tokens, and with results you can reproduce in a code review.

Read in order

  1. Simulator — see the attacks land (or not) on identical agents that differ only in their prompt structure.
  2. Patterns — the full catalog of what to look for in untrusted content.
  3. Defenses — what to actually ship.

Why this lab is backendless

The point isn't to run an LLM. It's to teach the failure mode. A deterministic simulator beats a flaky live demo because the lesson is reproducible: every visitor sees the same attack land the same way. If you want to stress-test a real agent against these patterns, the corpus in lib/prompt-injection.ts is a starting point — paste each sample into your model of choice and compare its behavior to the hardened simulator.

Pairs with

The on-behalf-of confusion sample (#6 in the simulator) is the same gap covered in the Agent identity page. An injected instruction is only catastrophic if the agent has the privilege to execute it. Workload identity, scoped delegation, and per-action confirmation are how you contain the blast radius.