SOTAVerified

Benchmarking

Papers

Showing 226-250 of 5548 papers

Title | Status | Hype
Learning Transferable Visual Models From Natural Language Supervision | Code | 2
Envisioning Beyond the Pixels: Benchmarking Reasoning-Informed Visual Editing | Code | 2
EQ-Bench: An Emotional Intelligence Benchmark for Large Language Models | Code | 2
EvalGIM: A Library for Evaluating Generative Image Models | Code | 2
Advances in APPFL: A Comprehensive and Extensible Federated Learning Framework | Code | 2
FairMedFM: Fairness Benchmarking for Medical Imaging Foundation Models | Code | 2
LoRA-XS: Low-Rank Adaptation with Extremely Small Number of Parameters | Code | 2
A Call to Reflect on Evaluation Practices for Age Estimation: Comparative Analysis of the State-of-the-Art and a Unified Benchmark | Code | 2
LtU-ILI: An All-in-One Framework for Implicit Inference in Astrophysics and Cosmology | Code | 2
AutoTrust: Benchmarking Trustworthiness in Large Vision Language Models for Autonomous Driving | Code | 2
AutoPenBench: Benchmarking Generative Agents for Penetration Testing | Code | 2
EffiBench: Benchmarking the Efficiency of Automatically Generated Code | Code | 2
EasyTPP: Towards Open Benchmarking Temporal Point Processes | Code | 2
LLM-Based Multi-Agent Systems are Scalable Graph Generative Models | Code | 2
Bag of Tricks: Benchmarking of Jailbreak Attacks on LLMs | Code | 2
DreamBench++: A Human-Aligned Benchmark for Personalized Image Generation | Code | 2
State-specific protein-ligand complex structure prediction with a multi-scale deep generative model | Code | 2
Fast Vision Transformers with HiLo Attention | Code | 2
Desbordante: from benchmarking suite to high-performance science-intensive data profiler (preprint) | Code | 2
A Content-Driven Micro-Video Recommendation Dataset at Scale | Code | 2
MMLongBench-Doc: Benchmarking Long-context Document Understanding with Visualizations | Code | 2
Deep Visual Geo-localization Benchmark | Code | 2
A large annotated medical image dataset for the development and evaluation of segmentation algorithms | Code | 2
Datasets and Benchmarks for Offline Safe Reinforcement Learning | Code | 2
Decouple and Track: Benchmarking and Improving Video Diffusion Transformers for Motion Transfer | Code | 2

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 | | Unverified