Benchmarking Self-Hosted LLMs for Offensive Security

ID: 86061c7f-a5eb-590d-8a61-55060413097b

STIX ID: report--86061c7f-a5eb-590d-8a61-55060413097b


Date Published: 2026-04-14

Date Updated: 2026-05-01

...

## Executive summary

This report benchmarks six locally hosted LLMs against eight exploitation challenges on OWASP Juice Shop. Each model was given only two tools: an HTTP request tool and an encoding tool. The models solved single-step exploits (SQLi authentication bypass, JWT alg:none, simple LFI, IDOR) with high pass rates. They failed on complex multi-step extraction, on algorithm-confusion JWT attacks, and on challenges that required protocols the tool/harness could not express. The author concludes that tooling, success-check design, and memory/context management are the main limiting factors, rather than a complete absence of offensive knowledge in the models.
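The benchmark gives each model only an HTTP request tool and an encoding tool. The report does not describe the encoding tool's interface, so the following is a minimal sketch of what such a tool might look like. It assumes a single function that dispatches on an operation name. The name `encoding_tool` and the operation strings are hypothetical and do not come from the author's harness. Unpadded base64url is included because JWT header and payload segments use it (RFC 7515).

```python
import base64
import urllib.parse


def encoding_tool(operation: str, data: str) -> str:
    """Hypothetical agent-facing encode/decode tool (illustrative only)."""
    raw = data.encode("utf-8")
    if operation == "base64_encode":
        return base64.b64encode(raw).decode("ascii")
    if operation == "base64_decode":
        return base64.b64decode(raw).decode("utf-8")
    if operation == "base64url_encode":
        # JWT segments use base64url with the '=' padding stripped.
        return base64.urlsafe_b64encode(raw).decode("ascii").rstrip("=")
    if operation == "base64url_decode":
        # Restore the padding that JWT-style encoding removes.
        padded = data + "=" * (-len(data) % 4)
        return base64.urlsafe_b64decode(padded).decode("utf-8")
    if operation == "url_encode":
        # safe="" also percent-encodes '/' so the result fits in any URL part.
        return urllib.parse.quote(data, safe="")
    if operation == "url_decode":
        return urllib.parse.unquote(data)
    if operation == "hex_encode":
        return raw.hex()
    if operation == "hex_decode":
        return bytes.fromhex(data).decode("utf-8")
    raise ValueError(f"unsupported operation: {operation}")


if __name__ == "__main__":
    # Decode a JWT header segment, as a model might while inspecting a token.
    print(encoding_tool("base64url_decode", "eyJhbGciOiJub25lIiwidHlwIjoiSldUIn0"))
```

A stateless, single-call interface like this covers one-shot transformations such as reading or rewriting a token segment. It has no notion of a multi-message exchange, which is one way a harness can fail to express the protocols some challenges require.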
