Benchmarking

Papers

Showing 2871–2880 of 5548 papers

Title	Date	Tasks	Status	Hype
A Benchmarking Protocol for SAR Colorization: From Regression to Deep Learning Approaches	Oct 12, 2023	Benchmarking, Colorization	Unverified	0
Investigating the Robustness and Properties of Detection Transformers (DETR) Toward Difficult Images	Oct 12, 2023	Benchmarking, Decoder	Unverified	0
Who Said That? Benchmarking Social Media AI Detection	Oct 12, 2023	Benchmarking, Misinformation	Unverified	0
Towards Evaluating Generalist Agents: An Automated Benchmark in Open World	Oct 12, 2023	Benchmarking, Diversity	Code Available	1
Octopus: Embodied Vision-Language Programmer from Environmental Feedback	Oct 12, 2023	Benchmarking, Code Generation	Code Available	2
CRITERIA: a New Benchmarking Paradigm for Evaluating Trajectory Prediction Models for Autonomous Driving	Oct 11, 2023	Autonomous Driving, Benchmarking	Code Available	3
Deep Reinforcement Learning for Autonomous Cyber Defence: A Survey	Oct 11, 2023	Benchmarking, Deep Reinforcement Learning	Unverified	0
FedSym: Unleashing the Power of Entropy for Benchmarking the Algorithms for Federated Learning	Oct 11, 2023	Benchmarking, Diversity	Unverified	0
Transformers for Green Semantic Communication: Less Energy, More Semantics	Oct 11, 2023	Benchmarking, CPU	Code Available	0
Hypergraph Neural Networks through the Lens of Message Passing: A Common Perspective to Homophily and Architecture Design	Oct 11, 2023	Benchmarking, Representation Learning	Unverified	0

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified