SOTAVerified

Benchmarking

Papers

Showing 1011-1020 of 5548 papers

Title | Status | Hype
FaceBench: A Multi-View Multi-Level Facial Attribute VQA Dataset for Benchmarking Face Perception MLLMs | Code | 1
Don't be Contradicted with Anything! CI-ToD: Towards Benchmarking Consistency for Task-oriented Dialogue System | Code | 1
Application-Oriented Benchmarking of Quantum Generative Learning Using QUARK | Code | 1
Benchmarking Geospatial Question Answering Engines using the Dataset GeoQuestions1089 | Code | 1
Fantastic Questions and Where to Find Them: FairytaleQA -- An Authentic Dataset for Narrative Comprehension | Code | 1
FedAIoT: A Federated Learning Benchmark for Artificial Intelligence of Things | Code | 1
Down with the Hierarchy: The 'H' in HNSW Stands for "Hubs" | Code | 1
Do We Need Another Explainable AI Method? Toward Unifying Post-hoc XAI Evaluation Methods into an Interactive and Multi-dimensional Benchmark | Code | 1
Benchmarking Object Detectors under Real-World Distribution Shifts in Satellite Imagery | Code | 1
AI Agents That Matter | Code | 1

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 |  | Unverified