SOTAVerified

Benchmarking

Papers

Showing 71–80 of 5548 papers

Title	Date	Tasks	Status	Hype
Prithvi-EO-2.0: A Versatile Multi-Temporal Foundation Model for Earth Observation Applications	Dec 3, 2024	Benchmarking, Disaster Response	Code Available	3
Caravan MultiMet: Extending Caravan with Multiple Weather Nowcasts and Forecasts	Nov 14, 2024	Benchmarking	Code Available	3
General Geospatial Inference with a Population Dynamics Foundation Model	Nov 11, 2024	Benchmarking, Graph Neural Network	Code Available	3
Benchmarking Multimodal Retrieval Augmented Generation with Dynamic VQA Dataset and Self-adaptive Planning Agent	Nov 5, 2024	Benchmarking, Hallucination	Code Available	3
XRDSLAM: A Flexible and Modular Framework for Deep Learning based SLAM	Oct 31, 2024	3DGS, Benchmarking	Code Available	3
AndroidLab: Training and Systematic Benchmarking of Android Autonomous Agents	Oct 31, 2024	Benchmarking	Code Available	3
OGBench: Benchmarking Offline Goal-Conditioned RL	Oct 26, 2024	Benchmarking, reinforcement-learning	Code Available	3
Robust Watermarking Using Generative Priors Against Image Editing: From Benchmarking to Advances	Oct 24, 2024	Benchmarking, Image to Video Generation	Code Available	3
VoiceBench: Benchmarking LLM-Based Voice Assistants	Oct 22, 2024	Automatic Speech Recognition, Automatic Speech Recognition (ASR)	Code Available	3
LongMemEval: Benchmarking Chat Assistants on Long-Term Interactive Memory	Oct 14, 2024	Benchmarking, Large Language Model	Code Available	3

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified