SOTAVerified

Benchmarking

Papers

Showing 71-80 of 5548 papers

| Title | Status | Hype |
| --- | --- | --- |
| Prithvi-EO-2.0: A Versatile Multi-Temporal Foundation Model for Earth Observation Applications | Code | 3 |
| Caravan MultiMet: Extending Caravan with Multiple Weather Nowcasts and Forecasts | Code | 3 |
| General Geospatial Inference with a Population Dynamics Foundation Model | Code | 3 |
| Benchmarking Multimodal Retrieval Augmented Generation with Dynamic VQA Dataset and Self-adaptive Planning Agent | Code | 3 |
| XRDSLAM: A Flexible and Modular Framework for Deep Learning based SLAM | Code | 3 |
| AndroidLab: Training and Systematic Benchmarking of Android Autonomous Agents | Code | 3 |
| OGBench: Benchmarking Offline Goal-Conditioned RL | Code | 3 |
| Robust Watermarking Using Generative Priors Against Image Editing: From Benchmarking to Advances | Code | 3 |
| VoiceBench: Benchmarking LLM-Based Voice Assistants | Code | 3 |
| LongMemEval: Benchmarking Chat Assistants on Long-Term Interactive Memory | Code | 3 |

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | GPT-4 Turbo | ACC | 0.56 | | Unverified |