How We Test AI: LLM & GenAI Security Methodology at Anvil Secure
By Anvil Secure on May 27, 2026
By George Damiris
Overview
The company's methodology for testing LLM and GenAI services draws on hands-on experience testing AI agents and models across multiple platforms and providers, combined with published industry frameworks: the OWASP Top 10 for Large Language Models and the OWASP LLM Security Verification Standard. Testing combines manual adversarial testing, semi-automated tooling, and proprietary research tooling developed internally.
The assessment scope covers the customer's deployed model configuration and custom software stack, not the underlying cloud infrastructure managed by the provider. Within the shared responsibility model, this covers the application layer, agent architecture, integrations, and AI-specific attack surfaces.
The company's methodology is focused on the following high-level areas:
Data: Protecting sensitive data used for training and inference, ensuring it's anonymized and compliant with regulations like GDPR.
Model: Securing the specific AI model in use, protecting it from attacks like adversarial inputs or model poisoning.
Access: Managing access controls to ensure only authorized users and applications can access the AI model and its data. Additionally, ensuring access controls are implemented to prevent the AI model from being exploited to bypass the intended authorization.
Applications: Securing the applications and agents built on top of the AI platform, as well as how they are configured and used.
Approach and Scope Definition
Because each system has unique capabilities, integrations, and risk exposure, our approach is never generic or checklist-driven. Each project is tailored to the architecture, trust boundaries, and business risks of your deployment.
Rather than testing the model in isolation, we evaluate the entire AI execution chain: inputs, logic, permissions, and system impact. This ensures we discover systemic weaknesses, not just surface-level vulnerabilities.
Our objective is simple: identify how your AI could be manipulated, measure the real business impact, and provide clear, architecture-aligned remediation guidance that strengthens both security and operational resilience.
Scope is never assumed. Prior to testing, we perform a structured threat modeling exercise tailored to the target system. This produces the test case inventory used throughout the engagement. We map your AI ecosystem end-to-end:
Identify Business Purpose and Operational Context<br>We define the intended business function of the system, the level of autonomy granted to the agent, the user roles interacting with the system, and the critical workflows it supports.
Identify the Model in Use<br>We document the model provider, model family, and version used by the service. Model capabilities, context limits, safety mechanisms, and update cadence influence the system's security posture.
Map Architecture and Components<br>We document the system architecture including orchestration layers, RAG pipelines, memory stores, vector databases, tool integrations, APIs, plugins, MCP integrations, external services, and human-in-the-loop oversight points.
Identify MCP Integrations and Capabilities<br>We identify connected MCP servers and the tools or services they expose to the agent. We document what operations these tools allow and what systems they interact with.
Identify Trust Boundaries<br>We determine where data or control crosses security domains, such as user input channels, external content sources, MCP servers, inter-agent communication, and interactions with internal systems.
Classify Assets and Sensitive Data<br>We identify sensitive assets accessible to the system, including credentials, API tokens, system prompts, proprietary knowledge bases, personal data, and other regulated information. We evaluate the potential operational, financial, or reputational impact if vulnerabilities were exploited.
Analyze Agent Capabilities and Permissions<br>We evaluate what the agent can read, write, execute, or trigger through internal tools, MCP services, APIs, and integrated platforms.
Enumerate AI-Specific Attack Vectors<br>We identify potential attack paths specific to LLM and agentic architectures, including prompt injection, jailbreak attempts, malicious MCP tool usage, RAG poisoning, memory manipulation, autonomous goal hijacking, tool misuse, and context leakage.
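As a rough illustration of how the mapping steps above could feed a test case inventory, the sketch below pairs each trust boundary with the attack vectors that plausibly apply to it. This is not Anvil Secure's tooling: the class names, the `channel_kind` tags, the `TC-` identifier format, and the sample entries are all hypothetical and chosen only for the example.

```python
from dataclasses import dataclass
from itertools import count


@dataclass(frozen=True)
class TrustBoundary:
    name: str          # where data or control crosses a security domain
    channel_kind: str  # hypothetical tag, e.g. "prompt", "document", "mcp"


@dataclass(frozen=True)
class AttackVector:
    name: str
    applies_to: frozenset  # channel kinds this vector is relevant for


@dataclass(frozen=True)
class PlannedTest:
    case_id: str
    boundary: str
    vector: str


def build_inventory(boundaries, vectors):
    """Pair every trust boundary with each attack vector that applies to it."""
    ids = count(1)
    cases = []
    for boundary in boundaries:
        for vector in vectors:
            if boundary.channel_kind in vector.applies_to:
                case_id = f"TC-{next(ids):03d}"
                cases.append(PlannedTest(case_id, boundary.name, vector.name))
    return cases


# Sample data, assumed for illustration only.
boundaries = [
    TrustBoundary("user chat prompt", "prompt"),
    TrustBoundary("RAG knowledge base", "document"),
    TrustBoundary("MCP server tools", "mcp"),
]
vectors = [
    AttackVector("prompt injection", frozenset({"prompt", "document", "mcp"})),
    AttackVector("RAG poisoning", frozenset({"document"})),
    AttackVector("malicious MCP tool usage", frozenset({"mcp"})),
]

inventory = build_inventory(boundaries, vectors)
for case in inventory:
    print(case.case_id, case.boundary, "->", case.vector)
```

In a real engagement the pairing would be driven by the threat model rather than a simple tag match, and each entry would likely also carry the affected assets and the agent permissions in play.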
Test Case Development
Based on the threats we identified, we develop targeted test scenarios to evaluate whether the system can be manipulated or exploited in practice. To design these scenarios, we analyze:
Accepted Input Channels<br>We identify all input vectors accepted by the system, including user prompts, documents, web content, APIs, structured inputs, and other external data sources.
Interactions with other Systems<br>We...