AI Agents & Systems — Audit Criteria v1.0

Mercury Security Whitepaper | 2025

Introduction

Artificial Intelligence (AI) agents are increasingly used in enterprise environments for customer service, internal knowledge retrieval, workflow automation, and even social media management. As adoption accelerates, so do concerns about compliance, governance, and security. Poorly controlled AI deployments can lead to reputational damage, regulatory penalties, or even systemic business disruption. To avoid these outcomes, organizations need clear and actionable audit criteria that can be applied across AI agents and adjacent systems.

The goal of this document is to provide a practical framework for auditing AI agents, written in accessible terms for teams at all levels of technical and governance expertise. The focus is on “how to” implement controls, gather evidence, and demonstrate readiness to internal and external stakeholders. This whitepaper grounds recommendations in widely recognized governance frameworks including the European Union Artificial Intelligence Act (AI Act), the National Institute of Standards and Technology’s AI Risk Management Framework (NIST AI RMF), the General Data Protection Regulation (GDPR), and ISO/IEC 42001 on AI management systems (European Union, 2024; NIST, 2023; European Union, 2016; ISO, 2023).

Principles for AI Agent Auditing

AI agent auditing begins with principles that provide the rationale for every control. Transparency means that users must be able to see when AI is used, and ideally understand why a system provided a given response. If a source is cited, the citation must be real, verifiable, and unaltered. If no source exists, the system should explicitly state so rather than hallucinating information (Brundage et al., 2020). Accountability means that human oversight must be available and clearly logged so that any system decision can be traced to an escalation path. Proportionality ensures that the strictness of controls matches the impact of the system; for example, a healthcare AI requires stronger safeguards than a basic internal chatbot (European Union, 2024).

Data minimization, derived from GDPR Article 5, requires that only necessary data be processed and retained, with a clear deletion schedule (European Union, 2016). Finally, change control ensures that system updates are logged, versioned, and reversible. This prevents silent degradation or compliance regressions during rapid iterations (ISO, 2023).

Governance Controls in Practice

Auditing is not about checkboxes but about verifying that meaningful controls exist and function. Purpose declarations should be documented in plain language, including the explicit scope of the AI system and examples of what falls outside its intended use. Consent and notices must be visible and meaningful to the end user, avoiding manipulative design practices or “dark patterns” (NIST, 2023).

Transparency requires not only the display of sources but also reliable refusal patterns. For example, an AI receptionist should safely decline financial advice, while escalating to a human agent when encountering high risk or unfamiliar queries. Escalation must be tested regularly in sandbox environments to ensure the handoff works and logs capture both the trigger and the outcome.

Identity and access management should align with least privilege principles. This means integrating AI systems with enterprise identity providers, enforcing Single Sign-On (SSO), and restricting connectors or APIs to the minimal scope needed. Logs must be tamper-evident and protected against deletion. The best practice is to use write-once, read-many (WORM) storage or blockchain-based immutability checks for integrity (Kaur & Chana, 2022).

Retention policies must be formally documented, specifying the maximum time logs and conversations are held. These policies should be tested with deletion requests to ensure actual data removal, supporting compliance with GDPR’s right to erasure (European Union, 2016). Change management controls include versioning of models and configuration, documented approval workflows, and rollback capabilities when problems arise. Monitoring should not be a one-time exercise but a recurring process of sampling, drift analysis, and refusal pattern testing.

Evidence Gathering

A credible audit is impossible without evidence. An effective audit pack should include system artifacts such as the documented purpose, user notices, and consent text. Configuration details, including allow/deny lists and redaction rules, should be exported and signed off. Access records should demonstrate that roles are defined and that scopes are enforced.

Logs are a central component of evidence. They must include prompts, sources, responses, timestamps, and user roles. However, logs must also be redacted to remove personally identifiable information (PII) while maintaining integrity for compliance review. Testing outputs, such as results of bias and safety scenarios, refusal consistency rates, and error handling, should also be included. Change logs and rollback demonstrations prove lifecycle governance, while deletion confirmations prove compliance with retention policies.

Hosting assurance must also be documented. This includes identifying the provider, the region where data is stored, and whether a Data Processing Agreement (DPA) is in place. Many regulatory regimes now require proof of regional hosting for sensitive data, particularly in the EU (European Union, 2024).

Acceptance Testing

Testing criteria ensure that audit claims can be validated in practice. Transparency can be evaluated by sampling responses across different queries and confirming that citations are accurate. Guardrails can be tested by injecting sensitive or restricted prompts and observing whether the system declines or escalates appropriately. Human-in-the-loop functionality should be validated by forcing escalation conditions and confirming that logs and timestamps align with expectations.

Access controls can be validated by attempting to retrieve restricted knowledge under an unauthorized role. Logging integrity should be tested by verifying hash values or immutability records. Retention policies should be tested by requesting export and deletion of conversation data, verifying that deletion occurs within the documented timeframe. Monitoring cadence can be confirmed by verifying the presence of weekly or monthly sampling reports that have been reviewed and approved by designated personnel (NIST, 2023).

Roles and Responsibilities

Clear responsibility assignments are critical. Product or data owners are accountable for purpose declarations, connector configurations, and lifecycle approvals. Compliance and governance teams must review and align policies with applicable frameworks. Security and IT teams are responsible for access, identity, and log integrity. Support or operations teams should manage escalation queues and service-level agreements. Finally, independent auditors such as Mercury Security provide validation, evidence pack review, and board-level reporting (ISO, 2023).

Framework Alignment

The EU AI Act emphasizes transparency, logging, and post-market monitoring as central requirements (European Union, 2024). The NIST AI RMF structures audit activity into governance, mapping, measurement, and management functions (NIST, 2023). GDPR requires lawful basis for processing, minimization, deletion, and safeguards around automated decision-making under Article 22 (European Union, 2016). ISO/IEC 42001 provides a management system structure for AI, focusing on policy, controls, measurement, and continuous improvement (ISO, 2023).

Aligning AI agent auditing with these frameworks ensures credibility and defensibility.

Hosting and Assurance

Organizations often rely on third-party hosting providers. To maintain compliance, these providers must be pre-vetted, with DPAs in place, and options for regional hosting available. Documentation should demonstrate that data is transmitted over encrypted channels and stored in compliance with applicable jurisdictional requirements. Customers should be able to request provider and region details under NDA, ensuring transparency without exposing sensitive operational infrastructure (Brundage et al., 2020).

Audit Outcomes

Audit results should be expressed in three categories. A “pass with notes” means minor remediations are needed but the system may proceed. A “conditional pass” means critical gaps exist and must be remediated before production. A “fail” means fundamental governance controls are absent, and the system is not fit for deployment. This classification provides boards and regulators with clarity while giving product teams actionable direction.

Maintenance

Auditing is not a one-time exercise but an ongoing process. Quarterly reviews should examine configurations, logs, and change histories. Monthly sampling should test refusal and consistency. Any material system update should trigger a focused re-audit of transparency, guardrails, redaction, logging, and deletion functions. Continuous maintenance ensures the audit remains current and credible over time (ISO, 2023).

Conclusion

AI agents present significant opportunities but also governance risks. Without audit criteria, organizations risk deploying systems that are opaque, unaccountable, and non-compliant. By following the principles, controls, and evidence methods outlined here, organizations can ensure their AI systems are transparent, accountable, and auditable. This framework provides a defensible foundation that aligns with regulatory requirements and instills trust among users, stakeholders, and regulators.

References

Brundage, M., Avin, S., Wang, J., Belfield, H., Krueger, G., Hadfield, G., … & Anderljung, M. (2020). Toward trustworthy AI development: Mechanisms for supporting verifiable claims. arXiv preprint arXiv:2004.07213.

European Union. (2016). Regulation (EU) 2016/679 of the European Parliament and of the Council (General Data Protection Regulation). Official Journal of the European Union. Retrieved from https://eur-lex.europa.eu/legal-content/EN/TXT/?uri=celex%3A32016R0679

European Union. (2024). Regulation (EU) 2024/1689 of the European Parliament and of the Council laying down harmonised rules on artificial intelligence (AI Act). Official Journal of the European Union. Retrieved from https://eur-lex.europa.eu

ISO. (2023). ISO/IEC 42001:2023 Information technology — Artificial intelligence — Management system. International Organization for Standardization.

Kaur, H., & Chana, I. (2022). Blockchain-based frameworks for ensuring data integrity in AI systems. Journal of Cloud Computing, 11(1), 45–63. https://doi.org/10.1186/s13677-022-00301-9

National Institute of Standards and Technology. (2023). AI Risk Management Framework (NIST AI RMF 1.0). Gaithersburg, MD: NIST.

linkedin facebook pinterest youtube rss twitter instagram facebook-blank rss-blank linkedin-blank pinterest youtube twitter instagram