Benchmarking

Papers

Showing 1271–1280 of 5548 papers

Title	Date	Tasks	Status	Hype
Agentic-HLS: An agentic reasoning based high-level synthesis system using large language models (AI for EDA workshop 2024)	Dec 2, 2024	Benchmarking, High-Level Synthesis	Code Available	0
TextClass Benchmark: A Continuous Elo Rating of LLMs in Social Sciences	Nov 30, 2024	Benchmarking, Classification	Code Available	0
Circumventing shortcuts in audio-visual deepfake detection datasets with unsupervised learning	Nov 29, 2024	Benchmarking, DeepFake Detection	Code Available	1
One-Shot Real-to-Sim via End-to-End Differentiable Simulation and Rendering	Nov 29, 2024	Benchmarking, Object	Unverified	0
Truth or Mirage? Towards End-to-End Factuality Evaluation with LLM-Oasis	Nov 29, 2024	Benchmarking, Claim Verification	Code Available	1
Perception Test 2024: Challenge Summary and a Novel Hour-Long VideoQA Benchmark	Nov 29, 2024	Benchmarking, Grounded Video Question Answering	Unverified	0
OpenQDC: Open Quantum Data Commons	Nov 29, 2024	Benchmarking	Code Available	2
λ: A Benchmark for Data-Efficiency in Long-Horizon Indoor Mobile Manipulation Robotics	Nov 28, 2024	Benchmarking, Diversity	Unverified	0
GEOBench-VLM: Benchmarking Vision-Language Models for Geospatial Tasks	Nov 28, 2024	Benchmarking, Object Counting	Code Available	2
Consolidating and Developing Benchmarking Datasets for the Nepali Natural Language Understanding Tasks	Nov 28, 2024	Benchmarking, Natural Language Inference	Unverified	0

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified