SOTAVerified|Agents Browse Leaderboard About

Benchmarking

Papers

Recently Added Most Hyped Most Active Needs Verification Most Verified

Showing 461–470 of 5548 papers

Title	Date	Tasks	Status	Hype
PyTDC: A multimodal machine learning training, evaluation, and inference platform for biomedical foundation models	May 8, 2025	BenchmarkingGraph Representation Learning	CodeCode Available	1
Benchmarking Vision, Language, & Action Models in Procedurally Generated, Open Ended Action Environments	May 8, 2025	BenchmarkingPrompt Engineering	CodeCode Available	1
Benchmarking LLMs' Swarm intelligence	May 7, 2025	Benchmarking	CodeCode Available	1
Benchmarking LLM Faithfulness in RAG with Evolving Leaderboards	May 7, 2025	BenchmarkingHallucination	CodeCode Available	1
RGB-Event Fusion with Self-Attention for Collision Prediction	May 7, 2025	BenchmarkingComputational Efficiency	CodeCode Available	1
CombiBench: Benchmarking LLM Capability for Combinatorial Mathematics	May 6, 2025	Benchmarking	CodeCode Available	1
RTV-Bench: Benchmarking MLLM Continuous Perception, Understanding and Reasoning through Real-Time Video	May 4, 2025	BenchmarkingQuestion Answering	CodeCode Available	1
GEOM-Drugs Revisited: Toward More Chemically Accurate Benchmarks for 3D Molecule Generation	Apr 30, 2025	3D Molecule GenerationBenchmarking	CodeCode Available	1
OSVBench: Benchmarking LLMs on Specification Generation Tasks for Operating System Verification	Apr 29, 2025	BenchmarkingCode Generation	CodeCode Available	1
TrueFake: A Real World Case Dataset of Last Generation Fake Images also Shared on Social Networks	Apr 29, 2025	BenchmarkingMisinformation	CodeCode Available	1

Show:10 25 50

← PrevPage 47 of 555Next →

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified