TechImage-Bench: Rubric-Based Evaluation for Technical Image Generation
Minheng Ni, Zhengyuan Yang, Yaowen Zhang, Linjie Li, Chung-Ching Lin, Kevin Lin, Zhendong Wang, Xiaofei Wang, Shujie Liu, Lei Zhang, Wangmeng Zuo, Lijuan Wang
Code (official): github.com/kodenii/proimage-bench
Abstract
We study technical image generation, where a model must synthesize information-dense, scientifically precise illustrations from detailed descriptions rather than merely produce visually plausible pictures. To quantify progress, we introduce TechImage-Bench, a rubric-based benchmark that targets biology schematics, engineering/patent drawings, and general technical illustrations. For 654 figures collected from real textbooks and technical reports, we construct detailed image instructions and a hierarchy of rubrics that decompose correctness into 6,076 criteria and 44,131 binary checks. Rubrics are derived from the surrounding text and reference figures using large multimodal models (LMMs), and are evaluated by an automated LMM-based judge with a principled penalty scheme that aggregates sub-question outcomes into interpretable criterion scores. We benchmark several representative text-to-image models on TechImage-Bench and find that, despite strong open-domain performance, the best base model reaches only 0.801 rubric accuracy and 0.576 criterion score overall, revealing substantial gaps in fine-grained scientific fidelity. Finally, we show that the same rubrics provide actionable supervision: feeding failed checks back into an editing model for iterative refinement boosts a strong generator from 0.660 to 0.865 in rubric accuracy and from 0.382 to 0.697 in criterion score. TechImage-Bench thus offers both a rigorous diagnostic for technical image generation and a scalable signal for improving specification-faithful scientific illustrations.
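To make the two reported metrics concrete, the following is a minimal sketch of how binary check outcomes might be aggregated. The abstract does not spell out the penalty scheme, so the rule used here (each failed check subtracts a fixed `penalty` from a criterion's score, floored at zero) is purely an assumption for illustration, as are the names `Criterion`, `rubric_accuracy`, `criterion_score`, `mean_criterion_score`, and `failed_checks`. None of them are taken from the official code.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Criterion:
    """One rubric criterion and the judge's verdicts on its binary checks."""
    name: str
    checks: List[bool]  # True means the check passed


def rubric_accuracy(criteria: List[Criterion]) -> float:
    """Fraction of binary checks passed, pooled over all criteria."""
    total = sum(len(c.checks) for c in criteria)
    passed = sum(sum(c.checks) for c in criteria)
    return passed / total if total else 0.0


def criterion_score(criterion: Criterion, penalty: float = 0.5) -> float:
    """ASSUMED penalty rule: lose `penalty` per failed check, floored at 0."""
    failed = len(criterion.checks) - sum(criterion.checks)
    return max(0.0, 1.0 - penalty * failed)


def mean_criterion_score(criteria: List[Criterion], penalty: float = 0.5) -> float:
    """Average of per-criterion scores under the assumed penalty rule."""
    if not criteria:
        return 0.0
    return sum(criterion_score(c, penalty) for c in criteria) / len(criteria)


def failed_checks(criteria: List[Criterion]) -> Dict[str, List[int]]:
    """Indices of failed checks per criterion, usable as refinement feedback."""
    return {
        c.name: [i for i, ok in enumerate(c.checks) if not ok]
        for c in criteria
        if not all(c.checks)
    }


# Hypothetical judge output for one generated figure.
example = [
    Criterion("labels", [True, True, False, True]),
    Criterion("layout", [True, True]),
    Criterion("arrows", [False, False, True]),
]

print(round(rubric_accuracy(example), 3))
print(mean_criterion_score(example))
print(failed_checks(example))
```

Under any penalty rule of this kind, a criterion with a few failed checks loses more than its share of passed checks would suggest, which is one plausible reading of why the criterion score (0.576) sits well below the rubric accuracy (0.801) for the best base model. The dictionary returned by `failed_checks` mirrors the kind of signal the abstract describes feeding back into an editing model.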