Benchmarking

Papers

Showing 2151–2160 of 5548 papers

Title	Date	Tasks	Status	Hype
Benchmarking Reasoning Robustness in Large Language Models	Mar 6, 2025	Benchmarking, Math	Unverified	0
Assumed Identities: Quantifying Gender Bias in Machine Translation of Gender-Ambiguous Occupational Terms	Mar 6, 2025	Benchmarking, Machine Translation	Unverified	0
InfoSEM: A Deep Generative Model with Informative Priors for Gene Regulatory Network Inference	Mar 6, 2025	Benchmarking	Unverified	0
Know Thy Judge: On the Robustness Meta-Evaluation of LLM Safety Judges	Mar 6, 2025	Benchmarking, Language Modeling	Unverified	0
Benchmarking Dynamic SLO Compliance in Distributed Computing Continuum Systems	Mar 5, 2025	Benchmarking, CPU	Code Available	0
Towards Universal Learning-based Model for Cardiac Image Reconstruction: Summary of the CMRxRecon2024 Challenge	Mar 5, 2025	Benchmarking, Image Reconstruction	Unverified	0
AttackSeqBench: Benchmarking Large Language Models' Understanding of Sequential Patterns in Cyber Attacks	Mar 5, 2025	Benchmarking, graph construction	Code Available	0
GNNMerge: Merging of GNN Models Without Accessing Training Data	Mar 5, 2025	Benchmarking, Computational Efficiency	Code Available	0
Technical report of a DMD-based Characterization Method for Vision Sensors	Mar 4, 2025	Benchmarking, Dataset Generation	Unverified	0
Evaluation of Architectural Synthesis Using Generative AI	Mar 4, 2025	Benchmarking	Unverified	0

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified