Benchmarking

Papers

Showing 3121–3130 of 5548 papers

Title	Date	Tasks	Status	Hype	Score
Intuitive or Dependent? Investigating LLMs' Behavior Style to Conflicting Prompts	Sep 29, 2023	Benchmarking, Decision Making	Unverified	0	0
InverseBench: Benchmarking Plug-and-Play Diffusion Priors for Inverse Problems in Physical Sciences	Mar 14, 2025	Benchmarking, Image Restoration	Unverified	0	0
A Framework for Benchmarking and Aligning Task-Planning Safety in LLM-Based Embodied Agents	Apr 20, 2025	Benchmarking, Task Planning	Unverified	0	0
Investigating Deep-Learning NLP for Automating the Extraction of Oncology Efficacy Endpoints from Scientific Literature	Nov 3, 2023	Benchmarking	Unverified	0	0
Investigating Energy Efficiency and Performance Trade-offs in LLM Inference Across Tasks and DVFS Settings	Jan 14, 2025	Benchmarking, Question Answering	Unverified	0	0
The Russian practice of applying cluster approach in regional development	Jun 8, 2021	Benchmarking	Unverified	0	0
Investigating the Robustness and Properties of Detection Transformers (DETR) Toward Difficult Images	Oct 12, 2023	Benchmarking, Decoder	Unverified	0	0
Benchmarking Robustness of Adaptation Methods on Pre-trained Vision-Language Models	Jun 3, 2023	Benchmarking	Unverified	0	0
Investigating the Vision Transformer Model for Image Retrieval Tasks	Jan 11, 2021	Benchmarking, Image Retrieval	Unverified	0	0
Benchmarking Robustness in Neural Radiance Fields	Jan 10, 2023	Benchmarking, Camera Calibration	Unverified	0	0

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified