SOTAVerified|Agents Browse Leaderboard About Blog

Benchmarking

Papers

Recently Added Most Hyped Most Active Needs Verification Most Verified

Showing 1501–1510 of 5548 papers

Title	Date	Tasks	Status	Hype
When Graph meets Multimodal: Benchmarking on Multimodal Attributed Graphs Learning	Oct 11, 2024	AttributeBenchmarking	CodeCode Available	1
Test-driven Software Experimentation with LASSO: an LLM Prompt Benchmarking Example	Oct 11, 2024	BenchmarkingCode Generation	—Unverified	0
Can we hop in general? A discussion of benchmark selection and design using the Hopper environment	Oct 11, 2024	BenchmarkingReinforcement Learning (RL)	—Unverified	0
uto\!L: Autonomous Evaluation of LLMs for Truth Maintenance and Reasoning Tasks	Oct 11, 2024	BenchmarkingLanguage Modeling	—Unverified	0
Guidelines for Fine-grained Sentence-level Arabic Readability Annotation	Oct 11, 2024	BenchmarkingSentence	—Unverified	0
Cross-Modal Bidirectional Interaction Model for Referring Remote Sensing Image Segmentation	Oct 11, 2024	BenchmarkingImage Segmentation	CodeCode Available	1
TRIAGE: Ethical Benchmarking of AI Models Through Mass Casualty Simulations	Oct 10, 2024	BenchmarkingDecision Making	CodeCode Available	0
Identifying Money Laundering Subgraphs on the Blockchain	Oct 10, 2024	Benchmarking	CodeCode Available	0
COMPL-AI Framework: A Technical Interpretation and LLM Benchmarking Suite for the EU Artificial Intelligence Act	Oct 10, 2024	BenchmarkingFairness	CodeCode Available	2
Benchmarking Agentic Workflow Generation	Oct 10, 2024	Benchmarking	CodeCode Available	2

Show:10 25 50

← PrevPage 151 of 555Next →

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified