SOTAVerified

Benchmarking

Papers

Showing 3651–3700 of 5548 papers

Title | Status | Hype
Towards Effective Disambiguation for Machine Translation with Large Language Models | - | 0
An Evaluation of Machine Learning Approaches for Early Diagnosis of Autism Spectrum Disorder | Code | 0
SHOWMe: Benchmarking Object-agnostic Hand-Object 3D Reconstruction | - | 0
Training neural mapping schemes for satellite altimetry with simulation data | - | 0
The Protein Engineering Tournament: An Open Science Benchmark for Protein Modeling and Design | - | 0
Exploration of TPUs for AI Applications | - | 0
Emerging Approaches for THz Array Imaging: A Tutorial Review and Software Tool | - | 0
Anchor Points: Benchmarking Models with Much Fewer Examples | Code | 0
M3Dsynth: A dataset of medical 3D images with AI-generated local manipulations | Code | 0
Benchmarking machine learning models for quantum state classification | - | 0
Leveraging Contextual Information for Effective Entity Salience Detection | - | 0
So you think you can track? | - | 0
Benchmarking Procedural Language Understanding for Low-Resource Languages: A Case Study on Turkish | Code | 0
Unveiling the potential of large language models in generating semantic and cross-language clones | - | 0
AmodalSynthDrive: A Synthetic Amodal Perception Dataset for Autonomous Driving | - | 0
Navigating Out-of-Distribution Electricity Load Forecasting during COVID-19: Benchmarking energy load forecasting models without and with continual learning | Code | 0
DBsurf: A Discrepancy Based Method for Discrete Stochastic Gradient Estimation | - | 0
Better Practices for Domain Adaptation | - | 0
Using representation balancing to learn conditional-average dose responses from clustered data | Code | 0
Are SNNs Truly Energy-efficient? - A Hardware Perspective | - | 0
Neural Networks for Fast Optimisation in Model Predictive Control: A Review | - | 0
AGIBench: A Multi-granularity, Multimodal, Human-referenced, Auto-scoring Benchmark for Large Language Models | - | 0
A survey on efficient vision transformers: algorithms, techniques, and performance benchmarking | - | 0
Hybrid data driven/thermal simulation model for comfort assessment | - | 0
Transfer Learning between Motor Imagery Datasets using Deep Learning -- Validation of Framework and Comparison of Datasets | Code | 0
FOR-instance: a UAV laser scanning benchmark dataset for semantic and instance segmentation of individual trees | - | 0
Holistic Dynamic Frequency Transformer for Image Fusion and Exposure Correction | - | 0
FederatedScope-LLM: A Comprehensive Package for Fine-tuning Large Language Models in Federated Learning | - | 0
NeMig -- A Bilingual News Collection and Knowledge Graph about Migration | Code | 0
Can humans help BERT gain "confidence"? | - | 0
Benchmarking Robustness and Generalization in Multi-Agent Systems: A Case Study on Neural MMO | - | 0
Benchmarking Multilabel Topic Classification in the Kyrgyz Language | Code | 0
Speech Self-Supervised Representations Benchmarking: a Case for Larger Probing Heads | - | 0
Benchmarking Data Efficiency and Computational Efficiency of Temporal Action Localization Models | - | 0
Beyond Document Page Classification: Design, Datasets, and Challenges | Code | 0
Finding the Perfect Fit: Applying Regression Models to ClimateBench v1.0 | Code | 0
Benchmarking Causal Study to Interpret Large Language Models for Source Code | - | 0
Efficient Benchmarking of Language Models | - | 0
Benchmarking Domain Adaptation for Chemical Processes on the Tennessee Eastman Process | Code | 0
Beyond MD17: the reactive xxMD dataset | Code | 0
Expecting The Unexpected: Towards Broad Out-Of-Distribution Detection | Code | 0
UGSL: A Unified Framework for Benchmarking Graph Structure Learning | - | 0
Measuring the Effect of Causal Disentanglement on the Adversarial Robustness of Neural Network Models | - | 0
Neurological Prognostication of Post-Cardiac-Arrest Coma Patients Using EEG Data: A Dynamic Survival Analysis Framework with Competing Risks | Code | 0
Benchmarking Adversarial Robustness of Compressed Deep Learning Models | - | 0
A Survey on Model Compression for Large Language Models | - | 0
IoT Data Trust Evaluation via Machine Learning | Code | 0
Benchmarking Scalable Epistemic Uncertainty Quantification in Organ Segmentation | Code | 0
Deep Neural Operator Driven Real Time Inference for Nuclear Systems to Enable Digital Twin Solutions | - | 0
Does AI for science need another ImageNet Or totally different benchmarks? A case study of machine learning force fields | - | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 | - | Unverified