SOTAVerified

Benchmarking

Papers

Showing 591–600 of 5548 papers

Title	Date	Tasks	Status	Hype
SPICEPilot: Navigating SPICE Code Generation and Simulation with AI Guidance	Oct 27, 2024	Benchmarking, Code Generation	Code Available	1
AgentSense: Benchmarking Social Intelligence of Language Agents through Interactive Scenarios	Oct 25, 2024	Benchmarking, Diversity	Code Available	1
Benchmarking Multi-Scene Fire and Smoke Detection	Oct 22, 2024	Benchmarking	Code Available	1
Comprehensive benchmarking of large language models for RNA secondary structure prediction	Oct 21, 2024	Benchmarking	Code Available	1
MultiChartQA: Benchmarking Vision-Language Models on Multi-Chart Problems	Oct 18, 2024	Benchmarking, Question Answering	Code Available	1
Benchmarking Deep Reinforcement Learning for Navigation in Denied Sensor Environments	Oct 18, 2024	Autonomous Navigation, Benchmarking	Code Available	1
Benchmarking Transcriptomics Foundation Models for Perturbation Analysis : one PCA still rules them all	Oct 17, 2024	All, Benchmarking	Code Available	1
WorldMedQA-V: a multilingual, multimodal medical examination dataset for multimodal language models evaluation	Oct 16, 2024	Benchmarking, Fairness	Code Available	1
RClicks: Realistic Click Simulation for Benchmarking Interactive Segmentation	Oct 15, 2024	Benchmarking, Interactive Segmentation	Code Available	1
TemporalBench: Benchmarking Fine-grained Temporal Understanding for Multimodal Video Models	Oct 14, 2024	2k, Benchmarking	Code Available	1

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified