SOTAVerified

Benchmarking

Papers

Showing 461-470 of 5548 papers

Title | Status | Hype
PyTDC: A multimodal machine learning training, evaluation, and inference platform for biomedical foundation models | Code | 1
Benchmarking Vision, Language, & Action Models in Procedurally Generated, Open Ended Action Environments | Code | 1
Benchmarking LLMs' Swarm intelligence | Code | 1
Benchmarking LLM Faithfulness in RAG with Evolving Leaderboards | Code | 1
RGB-Event Fusion with Self-Attention for Collision Prediction | Code | 1
CombiBench: Benchmarking LLM Capability for Combinatorial Mathematics | Code | 1
RTV-Bench: Benchmarking MLLM Continuous Perception, Understanding and Reasoning through Real-Time Video | Code | 1
GEOM-Drugs Revisited: Toward More Chemically Accurate Benchmarks for 3D Molecule Generation | Code | 1
OSVBench: Benchmarking LLMs on Specification Generation Tasks for Operating System Verification | Code | 1
TrueFake: A Real World Case Dataset of Last Generation Fake Images also Shared on Social Networks | Code | 1

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 | | Unverified