AI Red Team & Security
Attack. Assess. Patch.
Systematically test your LLMs and AI agents against adversarial attacks before attackers find them. Closed-loop security testing: Attack → Detect → Defend → Verify.
Each head of the Hydra represents an agent in the closed-loop security testing pipeline.
Generates adversarial prompts from 165+ curated attack patterns including jailbreaks, prompt injections, and extraction attacks.
Scores responses 0-10 for safety, detects secret leakage, and determines if a breach occurred with detailed analysis.
Auto-generates hardened system prompts when breaches occur, implementing specific countermeasures against detected attack vectors.
Re-tests defenses to prove they work, closing the loop. Only marks as secure when the same attack no longer succeeds.
Get up and running in under 5 minutes.
# Clone and install
git clone https://github.com/sherifkozman/the-red-council
cd the-red-council && pip install -e ".[dev]"
# Seed the attack knowledge base
python -m scripts.seed_kb
# Start the backend
uvicorn src.api.main:app --port 8000
# In another terminal, start the frontend
cd frontend && pnpm install && pnpm dev
# Visit http://localhost:3000