Research on the Construction and Application of Automated Framework for Large-scale AI Server Testing Process

Authors

  • Ren Xingcheng, Quanta Manufacturing Nashville LLC, TN 37086, USA

Keywords:

Automated Framework, Large-scale AI Server, Server Testing, Knowledge Graph, Reinforcement Learning

Abstract

To address the challenges of efficiency, coverage and root cause localization in large-scale AI server testing, this paper proposes a hierarchical, dynamic automated testing framework. The framework consists of a resource-aware layer, an intelligent execution layer and a knowledge-driven layer, which communicate through a unified API gateway and a message queue. The resource-aware layer uses reinforcement learning to build and deploy test environments within seconds. The intelligent execution layer combines combinatorial testing with fuzz testing and covers 99.99% of hardware failure modes. The knowledge-driven layer constructs a testing knowledge graph that supports automatic root cause reasoning and the optimization of testing strategies. Experiments were conducted on a heterogeneous cluster of 32 nodes. Compared with traditional manual testing and basic automation scripts, the framework shortens the average environment construction time from 120 minutes and 25 minutes, respectively, to 0.5 minutes, reduces the total test execution time from 72 hours and 48 hours, respectively, to 15.5 hours, and significantly reduces the number of manual interventions. It achieves 100% coverage of predefined faults and finds 25 additional unknown faults, and the mean time to root cause localization is shortened from several hours to minutes. The framework outperforms the traditional methods in efficiency, coverage and intelligence, and supports the high-quality delivery and stable operation and maintenance of large-scale AI infrastructure.
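The timing figures above can be turned into speedup factors and percentage reductions with simple arithmetic. The following is a minimal sketch that uses only the numbers quoted in the abstract; the dictionary layout and the names `BASELINES`, `FRAMEWORK`, `speedup`, `reduction_pct` and `summarize` are illustrative assumptions and do not come from the paper.

```python
# Timing figures quoted in the abstract. The structure and names here are
# illustrative assumptions, not part of the paper's framework.
BASELINES = {
    "manual testing": {"env_build_min": 120.0, "test_exec_h": 72.0},
    "basic automation scripts": {"env_build_min": 25.0, "test_exec_h": 48.0},
}
FRAMEWORK = {"env_build_min": 0.5, "test_exec_h": 15.5}


def speedup(baseline: float, improved: float) -> float:
    """Return how many times shorter the improved duration is."""
    return baseline / improved


def reduction_pct(baseline: float, improved: float) -> float:
    """Return the relative reduction of the duration, in percent."""
    return 100.0 * (baseline - improved) / baseline


def summarize() -> dict:
    """Map each baseline and metric to (speedup factor, percent reduction)."""
    summary = {}
    for name, figures in BASELINES.items():
        summary[name] = {
            metric: (
                round(speedup(value, FRAMEWORK[metric]), 2),
                round(reduction_pct(value, FRAMEWORK[metric]), 1),
            )
            for metric, value in figures.items()
        }
    return summary


if __name__ == "__main__":
    for name, metrics in summarize().items():
        for metric, (factor, pct) in metrics.items():
            print(f"{name:>26} | {metric:<13} | {factor:>6.2f}x | {pct:>5.1f}% shorter")
```

Under these figures, environment construction is 240 times faster than manual testing and 50 times faster than basic automation scripts, while total test execution is roughly 4.6 times and 3.1 times faster, respectively.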

 

Published

2026-03-02

Section

Research Articles

How to Cite

Research on the Construction and Application of Automated Framework for Large-scale AI Server Testing Process. (2026). International Journal of Computer Science and Engineering, 1(02), 55-61. https://iakjournals.org/index.php/iakj/article/view/17