SOTAVerified|Agents Browse Leaderboard About Blog

Benchmarking

Papers

Recently Added Most Hyped Most Active Needs Verification Most Verified

Showing 2691–2700 of 5548 papers

Title	Date	Tasks	Status	Hype
SPOC: Imitating Shortest Paths in Simulation Enables Effective Navigation and Manipulation in the Real World	Dec 5, 2023	BenchmarkingDiversity	—Unverified	0
BenchLMM: Benchmarking Cross-style Visual Capability of Large Multimodal Models	Dec 5, 2023	BenchmarkingVisual Question Answering	CodeCode Available	1
BEDD: The MineRL BASALT Evaluation and Demonstrations Dataset for Training and Benchmarking Agents that Solve Fuzzy Tasks	Dec 5, 2023	BenchmarkingMinecraft	CodeCode Available	1
Let the LLMs Talk: Simulating Human-to-Human Conversational QA via Zero-Shot LLM-to-LLM Interactions	Dec 5, 2023	BenchmarkingConversational Question Answering	CodeCode Available	1
Contrastive Learning-Based Spectral Knowledge Distillation for Multi-Modality and Missing Modality Scenarios in Semantic Segmentation	Dec 4, 2023	BenchmarkingContrastive Learning	—Unverified	0
BenchMARL: Benchmarking Multi-Agent Reinforcement Learning	Dec 3, 2023	BenchmarkingMulti-agent Reinforcement Learning	—Unverified	0
An Empirical Study of Automated Mislabel Detection in Real World Vision Datasets	Dec 2, 2023	Benchmarking	—Unverified	0
Evetac: An Event-based Optical Tactile Sensor for Robotic Manipulation	Dec 2, 2023	Benchmarking	—Unverified	0
Analyzing the Impact of Fake News on the Anticipated Outcome of the 2024 Election Ahead of Time	Dec 1, 2023	ArticlesBenchmarking	—Unverified	0
Identifying patterns and recommendations of and for sustainable open data initiatives: a benchmarking-driven analysis of open government data initiatives among European countries	Dec 1, 2023	Benchmarking	—Unverified	0

Show:10 25 50

← PrevPage 270 of 555Next →

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified