SOTAVerified|Agents Browse Leaderboard About Blog

Benchmarking

Papers

Recently Added Most Hyped Most Active Needs Verification Most Verified

Showing 1571–1580 of 5548 papers

Title	Date	Tasks	Status	Hype
shapiq: Shapley Interactions for Machine Learning	Oct 2, 2024	BenchmarkingData Valuation	CodeCode Available	4
The Labyrinth of Links: Navigating the Associative Maze of Multi-modal LLMs	Oct 2, 2024	BenchmarkingHallucination	—Unverified	0
Deep learning for action spotting in association football videos	Oct 2, 2024	Action SpottingBenchmarking	—Unverified	0
Benchmarking Large Language Models for Conversational Question Answering in Multi-instructional Documents	Oct 1, 2024	BenchmarkingConversational Question Answering	—Unverified	0
FMBench: Benchmarking Fairness in Multimodal Large Language Models on Medical Tasks	Oct 1, 2024	BenchmarkingFairness	—Unverified	0
CXPMRG-Bench: Pre-training and Benchmarking for X-ray Medical Report Generation on CheXpert Plus Dataset	Oct 1, 2024	BenchmarkingContrastive Learning	—Unverified	0
Exploring QUIC Dynamics: A Large-Scale Dataset for Encrypted Traffic Analysis	Sep 30, 2024	BenchmarkingIntrusion Detection	CodeCode Available	1
ImmersePro: End-to-End Stereo Video Synthesis Via Implicit Disparity Learning	Sep 30, 2024	BenchmarkingDisparity Estimation	CodeCode Available	0
Benchmarking Adaptive Intelligence and Computer Vision on Human-Robot Collaboration	Sep 30, 2024	BenchmarkingIntent Detection	—Unverified	0
Q-Bench-Video: Benchmarking the Video Quality Understanding of LMMs	Sep 30, 2024	BenchmarkingMultiple-choice	—Unverified	0

Show:10 25 50

← PrevPage 158 of 555Next →

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified