In conversations about AI governance, the focus often turns to tools. Vendors promise dashboards that scan models, platforms that automate bias testing, or templates that produce neat compliance reports. Tools matter. Yet recent research suggests that current audit tooling leaves critical gaps unaddressed, particularly when audits move from theory into practice.

Ojewale et al. (2024) conducted one of the most comprehensive studies of this space, reviewing nearly four hundred tools and interviewing practitioners. Their conclusion was stark: most tools do not support the full range of needs auditors face. In particular, few help discover real-world harms, involve affected stakeholders, or monitor incidents continuously. Tools are designed for narrow tasks, not for the complexity of socio-technical systems.

This disconnect is not abstract. When the Financial Reporting Council (2023) reviewed the use of AI in financial audits, it found firms were deploying AI-enabled systems without clear evaluation of how those systems influenced audit quality. Tools were providing efficiency but not necessarily assurance. Similar themes emerge in industry surveys. PwC’s 2024 Responsible AI Survey found that while many companies were experimenting with AI risk management software, fewer than half were confident that these tools captured regulatory expectations in a defensible way (PwC, 2024).

For those of us working in governance, this should resonate. Many of us have trialed tools that promised fairness analysis but ignored drift monitoring, or systems that captured model metadata without producing evidence in formats regulators recognize. These partial solutions risk creating false confidence.

The challenge is not simply choosing better tools but rethinking what an AI audit infrastructure should provide. One layer involves automated metrics. Systems that capture bias scores, accuracy drift, or data lineage are necessary. But they cannot be sufficient. A second layer requires participatory mechanisms. Audits must involve perspectives from stakeholders who experience the consequences of AI systems, especially marginalized groups. Without this, audits risk focusing on technical neatness rather than lived impact.
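To ground the automated layer before moving on, here is a rough sketch of the kind of metrics it produces: a demographic parity difference as a simple bias score and a population stability index (PSI) as a drift signal. The function names, synthetic data, and choices such as the bin count are illustrative assumptions, not the output or API of any particular audit tool.

```python
import numpy as np

def demographic_parity_difference(y_pred, group):
    """Absolute gap in positive-prediction rates between two groups (0 = parity)."""
    return abs(y_pred[group == 0].mean() - y_pred[group == 1].mean())

def population_stability_index(expected, actual, bins=10):
    """PSI between a reference score distribution and a production sample."""
    edges = np.histogram_bin_edges(expected, bins=bins)
    exp_pct = np.histogram(expected, bins=edges)[0] / len(expected)
    act_pct = np.histogram(actual, bins=edges)[0] / len(actual)
    # Floor the proportions to avoid division by zero and log(0).
    exp_pct = np.clip(exp_pct, 1e-6, None)
    act_pct = np.clip(act_pct, 1e-6, None)
    return float(np.sum((act_pct - exp_pct) * np.log(act_pct / exp_pct)))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    y_pred = rng.integers(0, 2, size=1_000)        # binary decisions
    group = rng.integers(0, 2, size=1_000)         # protected attribute (0/1)
    ref_scores = rng.normal(0.0, 1.0, size=5_000)  # training-time score distribution
    live_scores = rng.normal(0.3, 1.1, size=5_000) # shifted production distribution

    print("Demographic parity difference:", demographic_parity_difference(y_pred, group))
    print("PSI (drift):", population_stability_index(ref_scores, live_scores))
```

Even this toy version shows the limitation: the numbers say nothing about who is harmed when drift occurs, which is precisely why the participatory layer matters.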

A third layer is continuous oversight. Static reports do not capture the dynamic risks of AI. Incidents emerge in production, long after testing has concluded. Audit infrastructure must therefore integrate monitoring and escalation, ensuring that evidence of harms is recorded and linked to accountability chains. Research from Barth et al. (2024) on ethics-based auditing within AstraZeneca reinforces this need. Their year-long study found that governance only became credible when incident processes and communication pathways were established across the organization. Tools alone could not achieve this; they had to be embedded in organizational structures and culture.
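To make the idea of linking evidence to accountability chains concrete, here is a minimal sketch of an incident record with a severity-based escalation rule. The field names, severity levels, and owner roles are assumptions for illustration; in practice such records would live inside an organization's existing risk and ticketing systems rather than a standalone script.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

# Illustrative escalation map: severity -> accountable role (assumed, not prescriptive).
ESCALATION_CHAIN = {
    "low": "model_owner",
    "medium": "ai_risk_committee",
    "high": "chief_risk_officer",
}

@dataclass
class IncidentRecord:
    """Evidence of a production harm, tied to a system and an accountable owner."""
    system_id: str
    description: str
    severity: str  # "low" | "medium" | "high"
    detected_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))
    evidence_refs: list[str] = field(default_factory=list)  # links to logs, metrics, reports

    def escalate(self) -> str:
        """Return the role accountable for this incident under the assumed chain."""
        return ESCALATION_CHAIN.get(self.severity, "ai_risk_committee")

if __name__ == "__main__":
    incident = IncidentRecord(
        system_id="credit-scoring-v3",
        description="Approval-rate gap between groups widened after retraining",
        severity="high",
        evidence_refs=["metrics/2024-06-01/parity.json"],
    )
    print(f"Escalate to: {incident.escalate()} (detected {incident.detected_at.isoformat()})")
```

The point is not the data structure itself but the linkage it enforces: every monitoring signal, stakeholder complaint, or metric alert resolves to a named owner with the authority to act.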

Mitigating these gaps requires deliberate design. Organizations should begin by demanding transparency from vendors. If a tool produces bias metrics, what assumptions underlie them? Which regulatory frameworks are they mapped to? Next, audit leaders should require independent validation of tools. Just as we would not rely on financial controls without testing, AI audit tooling should itself be subject to assurance. Finally, organizations should treat tools as components of a broader governance system rather than as stand-alone fixes. Evidence generation, stakeholder involvement, and incident monitoring must all connect if audits are to satisfy regulators and protect citizens.
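One way to make "tooling subject to assurance" operational is to recompute a sample of a vendor's reported metrics independently and flag discrepancies, much as financial auditors re-perform controls. The sketch below assumes a hypothetical vendor report format and tolerance; it illustrates the pattern and does not reference any real product's API.

```python
def validate_vendor_metric(vendor_value: float, recomputed_value: float,
                           tolerance: float = 0.01) -> bool:
    """Check a vendor-reported metric against an independent recomputation."""
    return abs(vendor_value - recomputed_value) <= tolerance

# Hypothetical vendor report: metric name -> value claimed by the tool.
vendor_report = {"demographic_parity_difference": 0.042}

# Independent recomputation on the same evaluation data (e.g., with in-house code).
our_values = {"demographic_parity_difference": 0.071}

for metric, claimed in vendor_report.items():
    ok = validate_vendor_metric(claimed, our_values[metric])
    status = "OK" if ok else "DISCREPANCY: check assumptions, data slices, and mappings"
    print(f"{metric}: vendor={claimed:.3f}, ours={our_values[metric]:.3f} -> {status}")
```

A disagreement here does not necessarily mean the vendor is wrong; it means the assumptions behind the metric need to be surfaced and documented, which is the transparency regulators increasingly expect.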

The open question for us as peers is how to build this infrastructure in practice. Which combinations of automated metrics, participatory review, and incident escalation have you seen work? Are there examples where audit teams successfully embedded tools into enterprise risk systems without creating silos?

Our field advances when we exchange not only tools but also lessons about how they succeed or fail in the real world. Tools will remain central to AI governance, but without layered approaches, they risk giving us polished dashboards and little substance. The task ahead is to turn tooling into infrastructure that supports trust, accountability, and resilience.


References

Barth, S., Wanner, D., Zimmermann, H., & Andreeva, J. (2024). Operationalising AI governance through ethics-based auditing: An industry case study. arXiv. https://arxiv.org/abs/2407.06232

Financial Reporting Council. (2023). The use of technology in the audit of financial statements. FRC. https://www.frc.org.uk

Ojewale, V., Steed, R., Vecchione, B., Birhane, A., & Raji, I. D. (2024). Towards AI accountability infrastructure: Gaps and opportunities in AI audit tooling. arXiv. https://arxiv.org/abs/2402.17861

PwC. (2024). Responsible AI survey: How US companies are managing risk in an era of generative AI. PwC. https://www.pwc.com/us/en/tech-effect/ai-analytics/responsible-ai-survey.html
