Benchmarking

Papers

Showing 1941–1950 of 5548 papers

Title	Date	Tasks	Status	Hype
UniGen: A Unified Framework for Textual Dataset Generation Using Large Language Models	Jun 27, 2024	Attribute, Benchmarking	Code Available	2
Quantum-tunnelling deep neural network for optical illusion recognition	Jun 26, 2024	Autonomous Vehicles, Benchmarking	Unverified	0
Evaluating and Benchmarking Foundation Models for Earth Observation and Geospatial AI	Jun 26, 2024	Benchmarking, Crop Type Mapping	Unverified	0
XLD: A Cross-Lane Dataset for Benchmarking Novel Driving View Synthesis	Jun 26, 2024	Autonomous Driving, Benchmarking	Unverified	0
GenRL: Multimodal-foundation world models for generalization in embodied agents	Jun 26, 2024	Benchmarking, Reinforcement Learning (RL)	Code Available	2
MathOdyssey: Benchmarking Mathematical Problem-Solving Skills in Large Language Models Using Odyssey Math Data	Jun 26, 2024	Benchmarking, Math	Code Available	2
RAGBench: Explainable Benchmark for Retrieval-Augmented Generation Systems	Jun 25, 2024	Benchmarking, RAG	Unverified	0
Evaluating the Efficacy of Foundational Models: Advancing Benchmarking Practices to Enhance Fine-Tuning Decision-Making	Jun 25, 2024	Benchmarking, Decision Making	Unverified	0
Depth-Driven Geometric Prompt Learning for Laparoscopic Liver Landmark Detection	Jun 25, 2024	Benchmarking, Prompt Learning	Code Available	1
SoK: Membership Inference Attacks on LLMs are Rushing Nowhere (and How to Fix It)	Jun 25, 2024	Benchmarking, Experimental Design	Code Available	1

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified