MACEval: A Multi-Agent Continual Evaluation Network for Large Models

2026-01-30Code Available0· sign in to hype

Zijian Chen, Yuze Sun, Yuan Tian, Wenjun Zhang, Guangtao Zhai

Code Available — Be the first to reproduce this paper.

Code

github.com/zijianchen98/maceval
OfficialIn paper★ 6

Abstract

Hundreds of benchmarks dedicated to evaluating large models have been presented over the past few years. However, most of them remain closed-ended and are prone to overfitting due to the potential data contamination. Moreover, the increasing scale and scope of current benchmarks with transient metrics, as well as the heavily human-dependent curation procedure, pose significant challenges for timely maintenance and adaptation. In this paper, we introduce MACEval, a Multi-Agent Continual Evaluation network for dynamic evaluation of large models, and define new metrics to quantify performance longitudinally. MACEval employs an interactive and autonomous evaluation mode, utilizing role assignment, in-process data generation, and evaluation routing through a cascaded agent network. Extensive experiments on 23 large models demonstrate the effectiveness of MACEval, which also lightens the evaluation process and reduces a considerable amount of overhead. We hope that MACEval can broaden future directions of large model evaluation. Project page: https://github.com/zijianchen98/MACEval.

MACEval: A Multi-Agent Continual Evaluation Network for Large Models

Code

Abstract

Reproductions