SOTAVerified

Benchmarking

Papers

Showing 926–950 of 5548 papers

Title | Status | Hype
Towards quantitative precision for ECG analysis: Leveraging state space models, self-supervision and patient metadata | Code | 1
MLLM-DataEngine: An Iterative Refinement Approach for MLLM | Code | 1
LLMRec: Benchmarking Large Language Models on Recommendation Task | Code | 1
VI-Net: Boosting Category-level 6D Object Pose Estimation via Learning Decoupled Rotations on the Spherical Representations | Code | 1
Benchmarking Neural Network Generalization for Grammar Induction | Code | 1
Benchmarking Generated Poses: How Rational is Structure-based Drug Design with Generative Models? | Code | 1
DIG In: Evaluating Disparities in Image Generations with Indicators for Geographic Diversity | Code | 1
A Comparative Visual Analytics Framework for Evaluating Evolutionary Processes in Multi-objective Optimization | Code | 1
LLMeBench: A Flexible Framework for Accelerating LLMs Benchmarking | Code | 1
Application-Oriented Benchmarking of Quantum Generative Learning Using QUARK | Code | 1
XFlow: Benchmarking Flow Behaviors over Graphs | Code | 1
qgym: A Gym for Training and Benchmarking RL-Based Quantum Compilation | Code | 1
Benchmarking and Analyzing Robust Point Cloud Recognition: Bag of Tricks for Defending Adversarial Examples | Code | 1
VG-SSL: Benchmarking Self-supervised Representation Learning Approaches for Visual Geo-localization | Code | 1
Rethinking Uncertainly Missing and Ambiguous Visual Modality in Multi-Modal Entity Alignment | Code | 1
Benchmarking Offline Reinforcement Learning on Real-Robot Hardware | Code | 1
PLANTAIN: Diffusion-inspired Pose Score Minimization for Fast and Accurate Molecular Docking | Code | 1
JoinGym: An Efficient Query Optimization Environment for Reinforcement Learning | Code | 1
SciBench: Evaluating College-Level Scientific Problem-Solving Abilities of Large Language Models | Code | 1
Decoding the Enigma: Benchmarking Humans and AIs on the Many Facets of Working Memory | Code | 1
Examining the Effects of Degree Distribution and Homophily in Graph Learning Models | Code | 1
Efficient Prediction of Peptide Self-assembly through Sequential and Graphical Encoding | Code | 1
Towards Heterogeneous Long-tailed Learning: Benchmarking, Metrics, and Toolbox | Code | 1
GastroVision: A Multi-class Endoscopy Image Dataset for Computer Aided Gastrointestinal Disease Detection | Code | 1
IntelliGraphs: Datasets for Benchmarking Knowledge Graph Generation | Code | 1

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 | | Unverified