Artificial intelligence is increasingly used in software testing for test generation, regression prioritization, repair, and defect analysis. Most studies, however, optimize for local gains such as speed, coverage, or prompt quality. From a software quality perspective, that leaves an important gap: regulated platforms in banking, healthcare, insurance, and public services require testing workflows that improve quality while also preserving traceability, evidence retention, human oversight, and defensible release decisions. This paper develops TRACE-TEST, an evidence-linked framework for software quality assurance in AI-augmented testing. It combines a targeted integrative review of recent empirical testing studies and current governance sources with a design-science artefact proposal. The review reveals two simultaneous realities: the technical potential of LLM-based testing is rising quickly, while industrial adoption, standardized evidence practices, and post-deployment feedback loops remain immature. TRACE-TEST addresses that gap by linking regulations and controls, requirements, risk classification, AI-generated artefacts, reviewer actions, execution evidence, monitoring signals, and release sign-off into a single quality-assurance chain. The paper makes four contributions. First, it identifies five research gaps that matter most for high-assurance software quality. Second, it synthesizes recent empirical evidence on coverage, mutation effectiveness, traceability, practitioner oversight, and software evolution. Third, it specifies a minimum evidence object and a governed release-assurance workflow. Fourth, it proposes an evaluation model that combines technical testing metrics with quality-assurance metrics such as traceability completeness and evidence readiness. The main contribution is not simply more AI in testing; it is a framework that makes AI-assisted testing more reviewable, more measurable, and better aligned with software quality in regulated environments.
AI-assisted testing is no longer a speculative topic. The empirical literature shows both strong technical promise and important limits. Regulated platforms need a response that goes beyond faster generation. They need workflows that remain reviewable, traceable, measurable, and defensible over time.
TRACE-TEST is offered as that response. By connecting controls, requirements, risk classification, AI-generated artefacts, reviewer actions, execution evidence, monitoring signals, and release sign-off, the framework turns AI-augmented testing into an evidence-linked software quality assurance process. The paper’s central argument is therefore straightforward. In regulated software, the value of AI-augmented testing should be judged not only by what it automates, but by how well it supports high-quality and audit-ready release assurance.
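To make the evidence chain concrete, the minimum evidence object and the traceability-completeness metric could be sketched roughly as follows. This is an illustrative sketch only: the record type, its field names (EvidenceRecord, requirement_id, signed_off, and so on), and the metric definition are hypothetical assumptions, not the framework's normative specification.

```python
from dataclasses import dataclass, field

# Hypothetical sketch of a "minimum evidence object" linking the artefacts
# named by the framework; all field names here are illustrative only.
@dataclass
class EvidenceRecord:
    requirement_id: str        # requirement under test
    control_ids: list          # regulations/controls it maps to
    risk_class: str            # e.g. "high", "medium", "low"
    artefact_id: str           # AI-generated test artefact
    reviewer_action: str       # e.g. "approved", "revised", "rejected"
    execution_result: str      # e.g. "pass", "fail"
    monitoring_signals: list = field(default_factory=list)
    signed_off: bool = False   # release sign-off recorded

def traceability_completeness(requirements, records):
    """Fraction of requirements covered by at least one signed-off record
    (one plausible reading of the metric; the paper may define it differently)."""
    if not requirements:
        return 1.0
    covered = {r.requirement_id for r in records if r.signed_off}
    return len(covered & set(requirements)) / len(requirements)
```

For example, with two requirements and one signed-off record for REQ-1, the metric would report 0.5, flagging the untraced requirement before release.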
How to Cite This Paper
Rajeew Vishvakarma (2026). TRACE-TEST: An Evidence-Linked Software Quality Assurance Framework for AI-Augmented Testing in Regulated Platforms. International Journal of Computer Techniques, 13(2). ISSN: 2394-2231.