SOTAVerified

Benchmarking

Papers

Showing 751–775 of 5548 papers

Title | Status | Hype
Does BERT Learn as Humans Perceive? Understanding Linguistic Styles through Lexica | Code | 1
Benchmarking structure-based three-dimensional molecular generative models using GenBench3D: ligand conformation quality matters | Code | 1
A Systematic Benchmarking Analysis of Transfer Learning for Medical Image Analysis | Code | 1
BeHonest: Benchmarking Honesty in Large Language Models | Code | 1
Digital Typhoon: Long-term Satellite Image Dataset for the Spatio-Temporal Modeling of Tropical Cyclones | Code | 1
Benchmarking Llama2, Mistral, Gemma and GPT for Factuality, Toxicity, Bias and Propensity for Hallucinations | Code | 1
A Comprehensive Overview of Large Language Models | Code | 1
DIG In: Evaluating Disparities in Image Generations with Indicators for Geographic Diversity | Code | 1
Bench4KE: Benchmarking Automated Competency Question Generation | Code | 1
Benchmarking the Abilities of Large Language Models for RDF Knowledge Graph Creation and Comprehension: How Well Do LLMs Speak Turtle? | Code | 1
A multi-schematic classifier-independent oversampling approach for imbalanced datasets | Code | 1
Benchmarking the Combinatorial Generalizability of Complex Query Answering on Knowledge Graphs | Code | 1
AirSim Drone Racing Lab | Code | 1
Bencher: Simple and Reproducible Benchmarking for Black-Box Optimization | Code | 1
A SWAT-based Reinforcement Learning Framework for Crop Management | Code | 1
BenchLMM: Benchmarking Cross-style Visual Capability of Large Multimodal Models | Code | 1
Benchmarking MRI Reconstruction Neural Networks on Large Public Datasets | Code | 1
Benchmarking Large Multimodal Models against Common Corruptions | Code | 1
Benchmarking Large Vision-Language Models via Directed Scene Graph for Comprehensive Image Captioning | Code | 1
Benchmarking the Robustness of Spatial-Temporal Models Against Corruptions | Code | 1
Disentangled Feature Representation for Few-shot Image Classification | Code | 1
ForgeryNet: A Versatile Benchmark for Comprehensive Forgery Analysis | Code | 1
Does your model understand genes? A benchmark of gene properties for biological and text models | Code | 1
Event-Free Moving Object Segmentation from Moving Ego Vehicle | Code | 1
ENRICH: Multi-purposE dataset for beNchmaRking In Computer vision and pHotogrammetry | Code | 1

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 | | Unverified