SOTAVerified

Benchmarking

Papers

Showing 2691-2700 of 5548 papers

| Title | Status | Hype |
| --- | --- | --- |
| SPOC: Imitating Shortest Paths in Simulation Enables Effective Navigation and Manipulation in the Real World | | 0 |
| BenchLMM: Benchmarking Cross-style Visual Capability of Large Multimodal Models | Code | 1 |
| BEDD: The MineRL BASALT Evaluation and Demonstrations Dataset for Training and Benchmarking Agents that Solve Fuzzy Tasks | Code | 1 |
| Let the LLMs Talk: Simulating Human-to-Human Conversational QA via Zero-Shot LLM-to-LLM Interactions | Code | 1 |
| Contrastive Learning-Based Spectral Knowledge Distillation for Multi-Modality and Missing Modality Scenarios in Semantic Segmentation | | 0 |
| BenchMARL: Benchmarking Multi-Agent Reinforcement Learning | | 0 |
| An Empirical Study of Automated Mislabel Detection in Real World Vision Datasets | | 0 |
| Evetac: An Event-based Optical Tactile Sensor for Robotic Manipulation | | 0 |
| Analyzing the Impact of Fake News on the Anticipated Outcome of the 2024 Election Ahead of Time | | 0 |
| Identifying patterns and recommendations of and for sustainable open data initiatives: a benchmarking-driven analysis of open government data initiatives among European countries | | 0 |

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | GPT-4 Turbo | ACC | 0.56 | | Unverified |