SOTAVerified

Benchmarking

Papers

Showing 2951–3000 of 5548 papers

| Title | Status | Hype |
| --- | --- | --- |
| Demographic Parity: Mitigating Biases in Real-World Data | | 0 |
| NLPBench: Evaluating Large Language Models on Solving NLP Problems | Code | 1 |
| A Content-Driven Micro-Video Recommendation Dataset at Scale | Code | 2 |
| Unified Long-Term Time-Series Forecasting Benchmark | Code | 1 |
| Node-Aligned Graph-to-Graph (NAG2G): Elevating Template-Free Deep Learning Approaches in Single-Step Retrosynthesis | Code | 1 |
| Advancing The Rate-Distortion-Computation Frontier For Neural Image Compression | | 0 |
| A Toolkit for Reliable Benchmarking and Research in Multi-Objective Reinforcement Learning | Code | 2 |
| Thalamic nuclei segmentation from T_1-weighted MRI: unifying and benchmarking state-of-the-art methods with young and old cohorts | | 0 |
| On quantifying and improving realism of images generated with diffusion | | 0 |
| Optimization Techniques for a Physical Model of Human Vocalisation | | 0 |
| Benchmarking Local Robustness of High-Accuracy Binary Neural Networks for Enhanced Traffic Sign Recognition | Code | 1 |
| Efficient Pauli channel estimation with logarithmic quantum memory | | 0 |
| Machine-assisted quantitizing designs: augmenting humanities and social sciences with artificial intelligence | Code | 0 |
| Categorization and analysis of 14 computational methods for estimating cell potency from single-cell RNA-seq data | | 0 |
| Benchmarking Encoder-Decoder Architectures for Biplanar X-ray to 3D Shape Reconstruction | Code | 1 |
| VisionKG: Unleashing the Power of Visual Datasets via Knowledge Graph | | 0 |
| Grad DFT: a software library for machine learning enhanced density functional theory | Code | 1 |
| Turbulence in Focus: Benchmarking Scaling Behavior of 3D Volumetric Super-Resolution with BLASTNet 2.0 Data | | 0 |
| Domain Adaptation for Arabic Machine Translation: The Case of Financial Texts | | 0 |
| Benchmarking quantized LLaMa-based models on the Brazilian Secondary School Exam | | 0 |
| Prompt Tuned Embedding Classification for Multi-Label Industry Sector Allocation | Code | 1 |
| Multimodal Deep Learning for Scientific Imaging Interpretation | | 0 |
| On the relationship between Benchmarking, Standards and Certification in Robotics and AI | | 0 |
| Towards Effective Disambiguation for Machine Translation with Large Language Models | | 0 |
| An Evaluation of Machine Learning Approaches for Early Diagnosis of Autism Spectrum Disorder | Code | 0 |
| Training neural mapping schemes for satellite altimetry with simulation data | | 0 |
| SHOWMe: Benchmarking Object-agnostic Hand-Object 3D Reconstruction | | 0 |
| The Protein Engineering Tournament: An Open Science Benchmark for Protein Modeling and Design | | 0 |
| Emerging Approaches for THz Array Imaging: A Tutorial Review and Software Tool | | 0 |
| Exploration of TPUs for AI Applications | | 0 |
| Anchor Points: Benchmarking Models with Much Fewer Examples | Code | 0 |
| M3Dsynth: A dataset of medical 3D images with AI-generated local manipulations | Code | 0 |
| Leveraging Contextual Information for Effective Entity Salience Detection | | 0 |
| Benchmarking machine learning models for quantum state classification | | 0 |
| VerilogEval: Evaluating Large Language Models for Verilog Code Generation | Code | 2 |
| So you think you can track? | | 0 |
| Benchmarking Procedural Language Understanding for Low-Resource Languages: A Case Study on Turkish | Code | 0 |
| An Image Dataset for Benchmarking Recommender Systems with Raw Pixels | Code | 1 |
| AmodalSynthDrive: A Synthetic Amodal Perception Dataset for Autonomous Driving | | 0 |
| Unveiling the potential of large language models in generating semantic and cross-language clones | | 0 |
| Formalizing Multimedia Recommendation through Multimodal Deep Learning | Code | 1 |
| FreeMan: Towards Benchmarking 3D Human Pose Estimation under Real-World Conditions | Code | 1 |
| RecAD: Towards A Unified Library for Recommender Attack and Defense | Code | 1 |
| Navigating Out-of-Distribution Electricity Load Forecasting during COVID-19: Benchmarking energy load forecasting models without and with continual learning | Code | 0 |
| DBsurf: A Discrepancy Based Method for Discrete Stochastic Gradient Estimation | | 0 |
| PyGraft: Configurable Generation of Synthetic Schemas and Knowledge Graphs at Your Fingertips | Code | 2 |
| Using representation balancing to learn conditional-average dose responses from clustered data | Code | 0 |
| Better Practices for Domain Adaptation | | 0 |
| Evaluation of large language models for discovery of gene set function | Code | 1 |
| Neural Networks for Fast Optimisation in Model Predictive Control: A Review | | 0 |
Page 60 of 111

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | GPT-4 Turbo | ACC | 0.56 | | Unverified |