SOTAVerified

Benchmarking

Papers

Showing 1076–1100 of 5548 papers

Title | Status | Hype
German's Next Language Model | Code | 1
Benchmarking Robustness of 3D Object Detection to Common Corruptions | Code | 1
Benchmarking Retrieval-Augmented Multimomal Generation for Document Question Answering | Code | 1
Generalizable deep learning for photoplethysmography-based blood pressure estimation -- A Benchmarking Study | Code | 1
A Review and Efficient Implementation of Scene Graph Generation Metrics | Code | 1
GenBench: A Benchmarking Suite for Systematic Evaluation of Genomic Foundation Models | Code | 1
Benchmarking Relief-Based Feature Selection Methods for Bioinformatics Data Mining | Code | 1
2.5D Visual Relationship Detection | Code | 1
General Binding Affinity Guidance for Diffusion Models in Structure-Based Drug Design | Code | 1
Generating a Doppelganger Graph: Resembling but Distinct | Code | 1
GeSS: Benchmarking Geometric Deep Learning under Scientific Applications with Distribution Shifts | Code | 1
Benchmarking Recommendation, Classification, and Tracing Based on Hugging Face Knowledge Graph | Code | 1
GEMv2: Multilingual NLG Benchmarking in a Single Line of Code | Code | 1
GAMA: a General Automated Machine learning Assistant | Code | 1
GastroVision: A Multi-class Endoscopy Image Dataset for Computer Aided Gastrointestinal Disease Detection | Code | 1
G4SATBench: Benchmarking and Advancing SAT Solving with Graph Neural Networks | Code | 1
Benchmarking Quantized Neural Networks on FPGAs with FINN | Code | 1
GADBench: Revisiting and Benchmarking Supervised Graph Anomaly Detection | Code | 1
GCondenser: Benchmarking Graph Condensation | Code | 1
Benchmarking emergency department triage prediction models with machine learning and large public electronic health records | Code | 1
FTNet: Feature Transverse Network for Thermal Image Semantic Segmentation | Code | 1
Benchmarking Large Language Models on CMExam -- A Comprehensive Chinese Medical Exam Dataset | Code | 1
Benchmarking Large Multimodal Models against Common Corruptions | Code | 1
African or European Swallow? Benchmarking Large Vision-Language Models for Fine-Grained Object Classification | Code | 1
FullFront: Benchmarking MLLMs Across the Full Front-End Engineering Workflow | Code | 1

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 | | Unverified