
23 Apr 2026

Industry:Technology & Communications

5 min reading time

AI Security Testing Cut 2,000 Findings Down to 48 Real Exploits


AI security often fails for a familiar reason: teams inherit thousands of findings but still don’t know which ones represent real risk. Scanning tools generate volume, but they rarely provide the context or proof needed to make confident decisions.

At Chaleit, we approach AI security through a secure-by-design lens: we examine how systems are built, how components interact, how they behave when AI introduces autonomous decision-making across those components, and whether security assumptions hold up under attack. To support this approach, we developed a context-rich AI security capability that combines local LLM analysis, automated exploitation, and hands-on security engineering.

To show what this looks like in practice, we applied this method to DVWA (Damn Vulnerable Web Application), a widely used security benchmarking environment.

The objective was straightforward: determine whether context-aware analysis and exploitation could replace weeks of manual triage with faster, defensible results.

AI security challenges

Most security programs start with scanning tools that generate volume rather than clarity.

“In a field dominated by AI-based threats, we need to focus on what actually matters, what has context within the organisation and leads to real, material improvement,” explains Dan Haagman, CEO of Chaleit.  

In this benchmark, traditional SAST scanners flagged more than 2,000 potential issues across 319 files of PHP, JavaScript, and HTML. In a real organisation, this typically leads to:

  • 3–4 weeks of manual triage

  • additional time verifying exploitability

  • the majority of findings proving non-actionable

Security teams spend more time managing findings than reducing risk. As AI is introduced, these gaps are amplified, accelerating how quickly weak assumptions are exploited across systems.

Chaleit approach

Chaleit uses a context-aware AI security analysis and exploitation capability designed to identify, validate, and prioritise real risk in complex systems, restructuring the workflow around proof rather than assumptions. Here’s what this means in practice:

Context-first analysis

The full codebase was analysed using private, local LLMs with architectural context: understanding data flows, component interactions, business logic, and how decisions are chained, delegated, and executed, particularly where AI components influence outcomes. This reduced 2,000 scanner findings to 249 contextually relevant issues.

Automated exploitation

Our team then attempted real exploitation, specifically targeting the assumptions these systems rely on to behave safely under real-world conditions. Of 55 critical/high-severity issues, 48 were successfully exploited, each with working proof-of-concept code and visual evidence.

Expert validation

Chaleit security engineers reviewed and confirmed the exploits, validated the business impact, and delivered prioritised remediation guidance aligned with material risk.
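The three stages above form a simple filtering pipeline: context-aware triage, exploitation attempts, then human validation of what remains. The sketch below illustrates the shape of that pipeline only; the data class, function names, and the pluggable `has_context` and `exploit` callbacks are illustrative stand-ins, not Chaleit's actual tooling.

```python
from dataclasses import dataclass

@dataclass
class Finding:
    file: str
    rule: str
    severity: str                       # e.g. "critical", "high", "medium", "low"
    contextually_relevant: bool = False
    exploited: bool = False

def triage(findings, has_context):
    """Stage 1: keep only findings that matter given architectural context.
    `has_context` stands in for the local-LLM analysis described above."""
    kept = [f for f in findings if has_context(f)]
    for f in kept:
        f.contextually_relevant = True
    return kept

def attempt_exploits(findings, exploit):
    """Stage 2: try to prove each critical/high finding with a working exploit.
    `exploit` is a placeholder returning True when a proof-of-concept succeeds."""
    targets = [f for f in findings if f.severity in ("critical", "high")]
    confirmed = []
    for f in targets:
        if exploit(f):
            f.exploited = True
            confirmed.append(f)
    return targets, confirmed
```

In the benchmark, stage 1 cut 2,000 scanner findings to 249 contextually relevant issues, and stage 2 confirmed 48 of the 55 critical/high items before engineers reviewed the results.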

Results

The outcome was a fundamentally clearer AI security picture:

  • 98% noise reduction — From 2,000 potential issues to 48 confirmed exploits.

  • 0% false positives on critical findings — Every reported critical issue was demonstrably exploitable.

  • 87% exploitation success rate — 48 of 55 critical/high findings exploited automatically.

  • 100% OWASP coverage — All 19 OWASP vulnerability categories identified and tested.

  • 60% time savings — Validation cycles reduced from 5–6 weeks to 2–3 weeks.
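The headline percentages follow directly from the raw counts reported above; a quick arithmetic check (purely illustrative):

```python
# Raw counts from the DVWA benchmark described in this article.
scanner_findings = 2000
critical_high = 55
confirmed_exploits = 48

# Share of original scanner output eliminated before remediation work begins.
noise_reduction = 1 - confirmed_exploits / scanner_findings   # 0.976 -> ~98%

# Share of critical/high findings proven with a working exploit.
exploitation_rate = confirmed_exploits / critical_high        # 0.8727 -> ~87%

print(f"Noise reduction: {noise_reduction:.0%}")      # Noise reduction: 98%
print(f"Exploitation rate: {exploitation_rate:.0%}")  # Exploitation rate: 87%
```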

As Dan Haagman summarises:

“We shouldn’t be creating volumes of findings that sit in isolation. They need context, and they need to lead to real, material security uplift; otherwise, they're not actionable.” 

What this means for security teams

These results translate directly into how security work gets done day-to-day:

  1. Clearer priorities. Teams focus on a short list of proven risks instead of thousands of theoretical findings.

  2. Faster delivery. Security validation no longer stalls releases for weeks. Decisions are made with evidence, not assumptions.

  3. Higher confidence. Every critical issue includes working exploits and visual proof — removing “is this real?” debates.

  4. Better return on effort. Security investment goes into fixing what actually matters, not managing noise.

In our experience, most organisations are not struggling because they lack tools. They struggle because they are building on insecure foundations, and their security work produces volume without clarity.

In practice, this means validating whether systems operate within their intended boundaries, even as AI is layered on top and begins to influence decisions and interactions.

Chaleit’s approach is designed to focus on understanding context, validating risk, and helping teams act on what matters. The result is stronger decisions and security improvements that hold up in the real world.

Explore AI security

Get clear evidence on where your AI systems break and what to fix first.

Talk to an expert

About this article

Industries:

  • Technology & Communications

Service Areas:

  • AI Security

