SOTAVerified

Benchmarking

Papers

Showing 901-950 of 5548 papers

Title | Status | Hype
FedAIoT: A Federated Learning Benchmark for Artificial Intelligence of Things | Code | 1
Benchmarking Cognitive Biases in Large Language Models as Evaluators | Code | 1
MuSe-GNN: Learning Unified Gene Representation From Multimodal Biological Graph Data | Code | 1
Benchmarking the Abilities of Large Language Models for RDF Knowledge Graph Creation and Comprehension: How Well Do LLMs Speak Turtle? | Code | 1
Revisiting Neural Program Smoothing for Fuzzing | Code | 1
The Trickle-down Impact of Reward (In-)consistency on RLHF | Code | 1
LagrangeBench: A Lagrangian Fluid Mechanics Benchmarking Suite | Code | 1
FORB: A Flat Object Retrieval Benchmark for Universal Image Embedding | Code | 1
OceanBench: The Sea Surface Height Edition | Code | 1
Unified Long-Term Time-Series Forecasting Benchmark | Code | 1
NLPBench: Evaluating Large Language Models on Solving NLP Problems | Code | 1
Node-Aligned Graph-to-Graph (NAG2G): Elevating Template-Free Deep Learning Approaches in Single-Step Retrosynthesis | Code | 1
Benchmarking Local Robustness of High-Accuracy Binary Neural Networks for Enhanced Traffic Sign Recognition | Code | 1
Benchmarking Encoder-Decoder Architectures for Biplanar X-ray to 3D Shape Reconstruction | Code | 1
Grad DFT: a software library for machine learning enhanced density functional theory | Code | 1
Prompt Tuned Embedding Classification for Multi-Label Industry Sector Allocation | Code | 1
An Image Dataset for Benchmarking Recommender Systems with Raw Pixels | Code | 1
Formalizing Multimedia Recommendation through Multimodal Deep Learning | Code | 1
FreeMan: Towards Benchmarking 3D Human Pose Estimation under Real-World Conditions | Code | 1
RecAD: Towards A Unified Library for Recommender Attack and Defense | Code | 1
Evaluation of large language models for discovery of gene set function | Code | 1
A skeletonization algorithm for gradient-based optimization | Code | 1
Benchmarking Autoregressive Conditional Diffusion Models for Turbulent Flow Simulation | Code | 1
Developing a Scalable Benchmark for Assessing Large Language Models in Knowledge Graph Engineering | Code | 1
Benchmarking the Generation of Fact Checking Explanations | Code | 1
Towards quantitative precision for ECG analysis: Leveraging state space models, self-supervision and patient metadata | Code | 1
MLLM-DataEngine: An Iterative Refinement Approach for MLLM | Code | 1
LLMRec: Benchmarking Large Language Models on Recommendation Task | Code | 1
VI-Net: Boosting Category-level 6D Object Pose Estimation via Learning Decoupled Rotations on the Spherical Representations | Code | 1
Benchmarking Neural Network Generalization for Grammar Induction | Code | 1
Benchmarking Generated Poses: How Rational is Structure-based Drug Design with Generative Models? | Code | 1
DIG In: Evaluating Disparities in Image Generations with Indicators for Geographic Diversity | Code | 1
A Comparative Visual Analytics Framework for Evaluating Evolutionary Processes in Multi-objective Optimization | Code | 1
LLMeBench: A Flexible Framework for Accelerating LLMs Benchmarking | Code | 1
Application-Oriented Benchmarking of Quantum Generative Learning Using QUARK | Code | 1
XFlow: Benchmarking Flow Behaviors over Graphs | Code | 1
qgym: A Gym for Training and Benchmarking RL-Based Quantum Compilation | Code | 1
Benchmarking and Analyzing Robust Point Cloud Recognition: Bag of Tricks for Defending Adversarial Examples | Code | 1
VG-SSL: Benchmarking Self-supervised Representation Learning Approaches for Visual Geo-localization | Code | 1
Rethinking Uncertainly Missing and Ambiguous Visual Modality in Multi-Modal Entity Alignment | Code | 1
Benchmarking Offline Reinforcement Learning on Real-Robot Hardware | Code | 1
PLANTAIN: Diffusion-inspired Pose Score Minimization for Fast and Accurate Molecular Docking | Code | 1
JoinGym: An Efficient Query Optimization Environment for Reinforcement Learning | Code | 1
SciBench: Evaluating College-Level Scientific Problem-Solving Abilities of Large Language Models | Code | 1
Decoding the Enigma: Benchmarking Humans and AIs on the Many Facets of Working Memory | Code | 1
Examining the Effects of Degree Distribution and Homophily in Graph Learning Models | Code | 1
Efficient Prediction of Peptide Self-assembly through Sequential and Graphical Encoding | Code | 1
Towards Heterogeneous Long-tailed Learning: Benchmarking, Metrics, and Toolbox | Code | 1
GastroVision: A Multi-class Endoscopy Image Dataset for Computer Aided Gastrointestinal Disease Detection | Code | 1
IntelliGraphs: Datasets for Benchmarking Knowledge Graph Generation | Code | 1
Page 19 of 111

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 | - | Unverified