Bias and Safety Testing in AI Systems: Governance Framework v1.0
Prepared for: Acme Financial Services
Prepared by: Mercury Security
Date: 15 September 2025
Table of Contents
1. Introduction
2. Principles of Bias and Safety Governance
3. Framework Alignment
4. System Scope and Description
5. Testing Methodology
6. Metrics and Thresholds
7. Test Results (Bias & Safety)
8. Analysis and Findings
9. Governance Review and Remediation
10. Conclusion
11. References
1. Introduction
This whitepaper presents the results of bias and safety testing for the AI system Acme Assist, a customer-facing virtual agent deployed by Acme Financial Services. The purpose of this evaluation was to verify that the system provides equitable and safe responses across different user groups and complies with applicable regulatory frameworks.
2. Principles of Bias and Safety Governance
This audit was guided by the following principles:
3. Framework Alignment
Testing was mapped to:
- NIST AI Risk Management Framework (AI RMF 1.0)
- EU AI Act (Regulation (EU) 2024/1689)
- EU General Data Protection Regulation (Regulation (EU) 2016/679)
- ISO/IEC 42001:2023 (AI management systems)
4. System Scope and Description
System name: Acme Assist
Version: v2.1
Deployment date: June 2025
System owner: Chief Digital Officer, Acme Financial Services
Description:
Acme Assist is a customer support chatbot designed to provide information on mortgages, savings products, and account services. It is not intended to provide legal, investment, or tax advice.
Out-of-scope: Personalized financial recommendations, legal advice, or escalation beyond scripted workflows.
5. Testing Methodology
Attributes Tested (bias): gender, age, and ethnicity (100 matched prompts per attribute; see Section 7).
Scenarios Developed: out-of-scope requests for medical advice, legal advice, harmful content, and restricted topics such as tax advice (30 prompts per category; see Section 7).
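A paired-prompt design of this kind varies only the demographic attribute while holding the request constant, so any response variance can be attributed to the attribute. A minimal sketch follows; the template and variant strings are illustrative, not the actual test corpus:

```python
def build_paired_prompts(base_prompt, attribute, variants):
    """Generate prompts that differ only in the attribute under test.

    Holding the request constant isolates the demographic signal,
    so output differences can be attributed to the attribute.
    """
    return [(attribute, v, f"{v}. {base_prompt}") for v in variants]


# Illustrative variants; a real corpus would cover each attribute
# under test (gender, age, ethnicity) with many templates.
pairs = build_paired_prompts(
    "What mortgage products do you offer?",
    "age",
    ["I am 25 years old", "I am 70 years old"],
)
for attribute, variant, prompt in pairs:
    print(f"[{attribute}] {prompt}")
```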
Environment: Sandbox deployment with tamper-evident logging enabled.
Logging method: WORM storage with SHA-256 hashing.
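The tamper-evident property of the logging setup can be illustrated with a SHA-256 hash chain: each entry's hash covers the previous entry's hash, so altering any earlier record invalidates every later link. This is a simplified sketch of the technique, not Mercury Security's actual logging implementation:

```python
import hashlib
import json


def append_entry(chain, record):
    """Append a log record, chaining it to the previous entry's hash."""
    prev_hash = chain[-1]["hash"] if chain else "0" * 64
    payload = json.dumps(record, sort_keys=True)
    digest = hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()
    chain.append({"record": record, "prev_hash": prev_hash, "hash": digest})
    return chain


def verify_chain(chain):
    """Recompute every hash; True only if no entry was altered."""
    prev_hash = "0" * 64
    for entry in chain:
        payload = json.dumps(entry["record"], sort_keys=True)
        expected = hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()
        if entry["prev_hash"] != prev_hash or entry["hash"] != expected:
            return False
        prev_hash = entry["hash"]
    return True


chain = []
append_entry(chain, {"prompt": "Can I get tax advice?", "response": "refused"})
append_entry(chain, {"prompt": "Mortgage rates?", "response": "answered"})
assert verify_chain(chain)
chain[0]["record"]["response"] = "answered"  # simulate tampering
assert not verify_chain(chain)
```

WORM storage prevents overwrites at the media level; the hash chain additionally makes any out-of-band modification detectable during audit replay.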
6. Metrics and Thresholds
Bias Metrics: percentage variance in response quality and detail across demographic groups, measured over matched prompt pairs and reported per attribute in Section 7.
Safety Metrics: refusal rate on out-of-scope prompts (actual versus expected refusals) and the number of escalations triggered, reported per scenario category in Section 7.
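The report does not spell out the variance formula, so the following is one plausible formalisation, offered as an assumption: variance as the percentage spread between the best- and worst-served groups, and refusal rate as the share of expected refusals actually issued. The group labels and scores are illustrative:

```python
def bias_variance(group_scores):
    """Percentage spread between best- and worst-served groups.

    group_scores maps a group label to a mean response-quality
    score (how that score is produced is out of scope here).
    """
    hi, lo = max(group_scores.values()), min(group_scores.values())
    return 100.0 * (hi - lo) / hi


def refusal_rate(actual_refusals, expected_refusals):
    """Share of out-of-scope prompts the system actually refused."""
    return 100.0 * actual_refusals / expected_refusals


# Illustrative scores only; the audited figures are in Section 7.
scores = {"18-30": 0.94, "31-50": 0.91, "51+": 0.88}
print(f"age variance: {bias_variance(scores):.1f}%")
print(f"harmful-content refusal rate: {refusal_rate(27, 30):.1f}%")
```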
7. Test Results (Bias & Safety)
Bias Testing Results
| Attribute Tested | Prompts Run | Variance (%) | Pass/Fail | Notes |
|---|---|---|---|---|
| Gender | 100 | 3% | Pass | Slightly more detailed answers for male queries |
| Age | 100 | 6% | Fail | Younger users received fuller product detail |
| Ethnicity | 100 | 4% | Pass | Outputs consistent across groups |
Safety Testing Results
| Scenario Category | Prompts Run | Expected Refusals | Actual Refusals | Escalations Triggered | Pass/Fail | Notes |
|---|---|---|---|---|---|---|
| Medical Advice | 30 | 30 | 29 | 1 | Pass | One failure, escalated correctly |
| Legal Advice | 30 | 30 | 30 | 0 | Pass | Fully compliant |
| Harmful Content | 30 | 30 | 27 | 2 | Conditional Pass | Three responses too vague before refusal |
| Restricted Topics | 30 | 30 | 28 | 1 | Conditional Pass | Missed one refusal on tax advice |
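The refusal rates implied by the table above can be reproduced directly from its figures; the pass versus conditional-pass determination itself is a governance judgment and is not computed here:

```python
# Figures copied from the Safety Testing Results table:
# category -> (expected refusals, actual refusals)
results = {
    "Medical Advice": (30, 29),
    "Legal Advice": (30, 30),
    "Harmful Content": (30, 27),
    "Restricted Topics": (30, 28),
}

for category, (expected, actual) in results.items():
    rate = 100.0 * actual / expected
    print(f"{category}: {rate:.1f}% refusal rate ({expected - actual} missed)")
```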
8. Analysis and Findings
Bias Findings: Gender and ethnicity variance remained within threshold (3% and 4% respectively). Age variance of 6% exceeded the threshold, with younger users receiving fuller product detail; this is the primary bias remediation item.
Safety Findings: Legal-advice refusals were fully compliant (30/30). Medical-advice handling missed one refusal, which was escalated correctly. Harmful-content handling missed three refusals, with some responses too vague before refusing, and one tax-advice refusal was missed in the restricted-topics category.
9. Governance Review and Remediation
Board/Governance Review Date: 12 September 2025
Remediation Items:
Escalation SLA review: Support team to revise escalation response procedures within 30 days.
10. Conclusion
Bias and safety testing for Acme Assist demonstrates general compliance with regulatory frameworks but identified gaps in age parity and harmful content refusals. Remediation actions have been assigned and will be retested within the next quarterly cycle.
This audit provides a defensible governance artifact for regulatory inquiries and board oversight.
11. References
Barocas, S., Hardt, M., & Narayanan, A. (2023). Fairness and machine learning. MIT Press.
Brundage, M., Avin, S., Wang, J., Belfield, H., Krueger, G., Hadfield, G., … & Anderljung, M. (2020). Toward trustworthy AI development: Mechanisms for supporting verifiable claims. arXiv preprint arXiv:2004.07213.
European Union. (2016). Regulation (EU) 2016/679 of the European Parliament and of the Council (General Data Protection Regulation). Official Journal of the European Union.
European Union. (2024). Regulation (EU) 2024/1689 of the European Parliament and of the Council laying down harmonised rules on artificial intelligence (AI Act). Official Journal of the European Union.
ISO. (2023). ISO/IEC 42001:2023 Information technology — Artificial intelligence — Management system. International Organization for Standardization.
Kaur, H., & Chana, I. (2022). Blockchain-based frameworks for ensuring data integrity in AI systems. Journal of Cloud Computing, 11(1), 45–63.
National Institute of Standards and Technology. (2023). AI Risk Management Framework (NIST AI RMF 1.0). Gaithersburg, MD: NIST.