SOTAVerified

Benchmarking

Papers

Showing 801-850 of 5548 papers

Title | Status | Hype
CodeIF: Benchmarking the Instruction-Following Capabilities of Large Language Models for Code Generation | Code | 1
CODEBench: A Neural Architecture and Hardware Accelerator Co-Design Framework | Code | 1
A Comprehensive Overview of Large Language Models | Code | 1
GEMv2: Multilingual NLG Benchmarking in a Single Line of Code | Code | 1
Benchmarking AI scientists in omics data-driven biological research | Code | 1
CODEMENV: Benchmarking Large Language Models on Code Migration | Code | 1
A Dataset for Answering Time-Sensitive Questions | Code | 1
Benchmarking Algorithms for Federated Domain Generalization | Code | 1
Benchmarking Algorithms for Submodular Optimization Problems Using IOHProfiler | Code | 1
Generating a Doppelganger Graph: Resembling but Distinct | Code | 1
A Benchmarking Study of Embedding-based Entity Alignment for Knowledge Graphs | Code | 1
Generative Evaluation of Complex Reasoning in Large Language Models | Code | 1
Benchmarking and Analysis of Unsupervised Object Segmentation from Real-world Single Images | Code | 1
Benchmarking and Analyzing 3D-aware Image Synthesis with a Modularized Codebase | Code | 1
Benchmarking and Analyzing 3D Human Pose and Shape Estimation Beyond Algorithms | Code | 1
A Benchmarking Study of Kolmogorov-Arnold Networks on Tabular Data | Code | 1
AirSim Drone Racing Lab | Code | 1
A SWAT-based Reinforcement Learning Framework for Crop Management | Code | 1
Benchmarking and Analyzing Robust Point Cloud Recognition: Bag of Tricks for Defending Adversarial Examples | Code | 1
Geoclidean: Few-Shot Generalization in Euclidean Geometry | Code | 1
COCO: The Large Scale Black-Box Optimization Benchmarking (bbob-largescale) Test Suite | Code | 1
GLGENN: A Novel Parameter-Light Equivariant Neural Networks Architecture Based on Clifford Geometric Algebras | Code | 1
Benchmarking Deep Learning Interpretability in Time Series Predictions | Code | 1
Benchmarking and Defending Against Indirect Prompt Injection Attacks on Large Language Models | Code | 1
Codabench: Flexible, Easy-to-Use and Reproducible Benchmarking Platform | Code | 1
CodeReef: an open platform for portable MLOps, reusable automation actions and reproducible benchmarking | Code | 1
Grad DFT: a software library for machine learning enhanced density functional theory | Code | 1
GraphArena: Benchmarking Large Language Models on Graph Computational Problems | Code | 1
Graph Neural Network-Based Anomaly Detection for River Network Systems | Code | 1
Benchmarking and Explaining Large Language Model-based Code Generation: A Causality-Centric Approach | Code | 1
CommonPower: A Framework for Safe Data-Driven Smart Grid Control | Code | 1
Replication in Visual Diffusion Models: A Survey and Outlook | Code | 1
DFGC 2021: A DeepFake Game Competition | Code | 1
ClimART: A Benchmark Dataset for Emulating Atmospheric Radiative Transfer in Weather and Climate Models | Code | 1
Benchmarking and Improving Large Vision-Language Models for Fundamental Visual Graph Understanding and Reasoning | Code | 1
Benchmarking Generated Poses: How Rational is Structure-based Drug Design with Generative Models? | Code | 1
4D Panoptic LiDAR Segmentation | Code | 1
Clinical Prompt Learning with Frozen Language Models | Code | 1
Large Scale MRI Collection and Segmentation of Cirrhotic Liver | Code | 1
Benchmarking of DL Libraries and Models on Mobile Devices | Code | 1
Towards Heterogeneous Long-tailed Learning: Benchmarking, Metrics, and Toolbox | Code | 1
Circumventing shortcuts in audio-visual deepfake detection datasets with unsupervised learning | Code | 1
A BFS-Tree of Ranking References for Unsupervised Manifold Learning | Code | 1
Benchmarking and Survey of Explanation Methods for Black Box Models | Code | 1
An Empirical Study into Clustering of Unseen Datasets with Self-Supervised Encoders | Code | 1
Benchmarking Geospatial Question Answering Engines using the Dataset GeoQuestions1089 | Code | 1
ClearPose: Large-scale Transparent Object Dataset and Benchmark | Code | 1
CLoG: Benchmarking Continual Learning of Image Generation Models | Code | 1
A Survey on Graph Counterfactual Explanations: Definitions, Methods, Evaluation, and Research Challenges | Code | 1
AIPerf: Automated machine learning as an AI-HPC benchmark | Code | 1

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 | | Unverified