SOTAVerified

Benchmarking

Papers

Showing 611–620 of 5548 papers

Title	Date	Tasks	Status	Hype
DivScene: Benchmarking LVLMs for Object Navigation with Diverse Scenes and Objects	Oct 3, 2024	Benchmarking, Imitation Learning	Code Available	1
LLM-Pilot: Characterize and Optimize Performance of your LLM Inference Services	Oct 3, 2024	Benchmarking, GPU	Code Available	1
StringLLM: Understanding the String Processing Capability of Large Language Models	Oct 2, 2024	Benchmarking	Code Available	1
MONICA: Benchmarking on Long-tailed Medical Image Classification	Oct 2, 2024	Benchmarking, Classification	Code Available	1
MedQA-CS: Benchmarking Large Language Models Clinical Skills Using an AI-SCE Framework	Oct 2, 2024	Benchmarking, Instruction Following	Code Available	1
Exploring QUIC Dynamics: A Large-Scale Dataset for Encrypted Traffic Analysis	Sep 30, 2024	Benchmarking, Intrusion Detection	Code Available	1
ARLBench: Flexible and Efficient Benchmarking for Hyperparameter Optimization in Reinforcement Learning	Sep 27, 2024	AutoML, Benchmarking	Code Available	1
MALPOLON: A Framework for Deep Species Distribution Modeling	Sep 26, 2024	Benchmarking, GPU	Code Available	1
HazeSpace2M: A Dataset for Haze Aware Single Image Dehazing	Sep 25, 2024	Benchmarking, Image Dehazing	Code Available	1
RMCBench: Benchmarking Large Language Models' Resistance to Malicious Code	Sep 23, 2024	Benchmarking, Code Generation	Code Available	1

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified