SOTAVerified

Benchmarking

Papers

Showing 531-540 of 5548 papers

Title | Status | Hype
An Extended Benchmarking of Multi-Agent Reinforcement Learning Algorithms in Complex Fully Cooperative Tasks | Code | 1
Large Language Models for Multi-Robot Systems: A Survey | Code | 1
PICBench: Benchmarking LLMs for Photonic Integrated Circuits Design | Code | 1
MM-IQ: Benchmarking Human-Like Abstraction and Reasoning in Multimodal Models | Code | 1
HateBench: Benchmarking Hate Speech Detectors on LLM-Generated Content and Hate Campaigns | Code | 1
Enhancing Biomedical Relation Extraction with Directionality | Code | 1
InsQABench: Benchmarking Chinese Insurance Domain Question Answering with Large Language Models | Code | 1
Multimodal LLMs Can Reason about Aesthetics in Zero-Shot | Code | 1
ToMATO: Verbalizing the Mental States of Role-Playing LLMs for Benchmarking Theory of Mind | Code | 1
TimberVision: A Multi-Task Dataset and Framework for Log-Component Segmentation and Tracking in Autonomous Forestry Operations | Code | 1

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 | | Unverified