SOTAVerified

Benchmarking

Papers

Showing 1101-1110 of 5548 papers

Title | Status | Hype
Benchmarking Large Vision-Language Models via Directed Scene Graph for Comprehensive Image Captioning | Code | 1
Benchmarking Relief-Based Feature Selection Methods for Bioinformatics Data Mining | Code | 1
Benchmarking Robustness of Multimodal Image-Text Models under Distribution Shift | Code | 1
Benchmarking Large Language Models on Answering and Explaining Challenging Medical Questions | Code | 1
Generating a Doppelganger Graph: Resembling but Distinct | Code | 1
Benchmarking Retrieval-Augmented Multimomal Generation for Document Question Answering | Code | 1
Benchmarking Segmentation Models with Mask-Preserved Attribute Editing | Code | 1
Benchmarking Robustness of 3D Object Detection to Common Corruptions | Code | 1
Benchmarking Robustness of Machine Reading Comprehension Models | Code | 1
Benchmarking Large Language Models on Controllable Generation under Diversified Instructions | Code | 1

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 | - | Unverified