Benchmarking

Papers

Showing 2391–2400 of 5548 papers

Title	Date	Tasks	Status	Hype
Segmenting Maxillofacial Structures in CBCT Volumes	Jan 1, 2025	Anatomy, Benchmarking	Unverified	0
Sketchtopia: A Dataset and Foundational Agents for Benchmarking Asynchronous Multimodal Communication with Iconic Feedback	Jan 1, 2025	Benchmarking	Unverified	0
InterAct: Advancing Large-Scale Versatile 3D Human-Object Interaction Generation	Jan 1, 2025	Benchmarking, Human-Object Interaction Detection	Unverified	0
SVLTA: Benchmarking Vision-Language Temporal Alignment via Synthetic Video Situation	Jan 1, 2025	Benchmarking, Diagnostic	Unverified	0
On the Utility of Equivariance and Symmetry Breaking in Deep Learning Architectures on Point Clouds	Jan 1, 2025	Benchmarking	Unverified	0
Geometry Matters: Benchmarking Scientific ML Approaches for Flow Prediction around Complex Geometries	Dec 31, 2024	Benchmarking, Out-of-Distribution Generalization	Unverified	0
A review of faithfulness metrics for hallucination assessment in Large Language Models	Dec 31, 2024	Benchmarking, Hallucination	Unverified	0
AraSTEM: A Native Arabic Multiple Choice Question Benchmark for Evaluating LLMs Knowledge In STEM Subjects	Dec 31, 2024	Benchmarking, Multiple-choice	Unverified	0
Measuring Large Language Models Capacity to Annotate Journalistic Sourcing	Dec 30, 2024	Benchmarking, Ethics	Unverified	0
SecBench: A Comprehensive Multi-Dimensional Benchmarking Dataset for LLMs in Cybersecurity	Dec 30, 2024	Benchmarking, Code Generation	Unverified	0

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified