Benchmarking

Papers

Showing 671–680 of 5548 papers

Title	Date	Tasks	Status	Hype
Benchmark on Drug Target Interaction Modeling from a Structure Perspective	Jul 4, 2024	Benchmarking, Drug Discovery	Code Available	1
GraCoRe: Benchmarking Graph Comprehension and Complex Reasoning in Large Language Models	Jul 3, 2024	Benchmarking	Code Available	1
Comics Datasets Framework: Mix of Comics datasets for detection benchmarking	Jul 3, 2024	Benchmarking, Object	Code Available	1
Emotion and Intent Joint Understanding in Multimodal Conversation: A Benchmarking Dataset	Jul 3, 2024	Benchmarking, Diversity	Code Available	1
Occlusion-Aware Seamless Segmentation	Jul 2, 2024	Benchmarking, Domain Adaptation	Code Available	1
FineSurE: Fine-grained Summarization Evaluation using LLMs	Jul 1, 2024	Benchmarking, Hallucination	Code Available	1
Overcoming Common Flaws in the Evaluation of Selective Classification Systems	Jul 1, 2024	Benchmarking, Classification	Code Available	1
Mobile-Bench: An Evaluation Benchmark for LLM-based Mobile Agents	Jul 1, 2024	Benchmarking	Code Available	1
AI Agents That Matter	Jul 1, 2024	Benchmarking	Code Available	1
GraphArena: Benchmarking Large Language Models on Graph Computational Problems	Jun 29, 2024	Benchmarking, Hallucination	Code Available	1

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified