SOTAVerified

Benchmarking

Papers

Showing 1121-1130 of 5548 papers

Title | Status | Hype
CODEMENV: Benchmarking Large Language Models on Code Migration | Code | 1
Curious Hierarchical Actor-Critic Reinforcement Learning | Code | 1
Are Vision Language Models Ready for Clinical Diagnosis? A 3D Medical Benchmark for Tumor-centric Visual Question Answering | Code | 1
D2S: Document-to-Slide Generation Via Query-Based Text Summarization | Code | 1
Benchmarking Robustness of Multimodal Image-Text Models under Distribution Shift | Code | 1
Benchmarking LLMs' Swarm intelligence | Code | 1
Benchmarking Large Language Models on Answering and Explaining Challenging Medical Questions | Code | 1
Benchmarking Multimodal Variational Autoencoders: CdSprites+ Dataset and Toolkit | Code | 1
Benchmarking Multi-Scene Fire and Smoke Detection | Code | 1
Benchmarking Large Language Models on Controllable Generation under Diversified Instructions | Code | 1

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 | - | Unverified