SOTAVerified|Agents Browse Leaderboard About

Benchmarking

Papers

Recently Added Most Hyped Most Active Needs Verification Most Verified

Showing 931–940 of 5548 papers

Title	Date	Tasks	Status	Hype
Defining and Evaluating Visual Language Models' Basic Spatial Abilities: A Perspective from Psychometrics	Feb 17, 2025	BenchmarkingDiagnostic	—Unverified	0
Ansatz-free Hamiltonian learning with Heisenberg-limited scaling	Feb 17, 2025	Benchmarking	—Unverified	0
JExplore: Design Space Exploration Tool for Nvidia Jetson Boards	Feb 16, 2025	BenchmarkingGPU	CodeCode Available	0
TituLLMs: A Family of Bangla LLMs with Comprehensive Benchmarking	Feb 16, 2025	Benchmarking	—Unverified	0
Can't See the Forest for the Trees: Benchmarking Multimodal Safety Awareness for Multimodal LLMs	Feb 16, 2025	Benchmarking	—Unverified	0
Yesil o1 Pro: Evidence-Based AI Model for Health and Benchmarking in Clinical Decision Support	Feb 15, 2025	BenchmarkingEpidemiology	—Unverified	0
User Profile with Large Language Models: Construction, Updating, and Benchmarking	Feb 15, 2025	BenchmarkingProfile Generation	—Unverified	0
Generalized Attention Flow: Feature Attribution for Transformer Models via Maximum Flow	Feb 14, 2025	Benchmarking	—Unverified	0
MIR-Bench: Can Your LLM Recognize Complicated Patterns via Many-Shot In-Context Reasoning?	Feb 14, 2025	BenchmarkingIn-Context Learning	—Unverified	0
LaRA: Benchmarking Retrieval-Augmented Generation and Long-Context LLMs - No Silver Bullet for LC or RAG Routing	Feb 14, 2025	BenchmarkingRAG	CodeCode Available	0

Show:10 25 50

← PrevPage 94 of 555Next →

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified