SOTAVerified

Benchmarking

Papers

Showing 701–750 of 5548 papers

Title | Status | Hype
DiffuSETS: 12-lead ECG Generation Conditioned on Clinical Text Reports and Patient-Specific Information | Code | 1
AllClear: A Comprehensive Dataset and Benchmark for Cloud Removal in Satellite Imagery | Code | 1
Benchmarking Offline Reinforcement Learning on Real-Robot Hardware | Code | 1
Benchmarking emergency department triage prediction models with machine learning and large public electronic health records | Code | 1
Automatic sleep stage classification with deep residual networks in a mixed-cohort setting | Code | 1
EgoPlan-Bench: Benchmarking Multimodal Large Language Models for Human-Level Planning | Code | 1
Benchmarking: Past, Present and Future | Code | 1
Benchmarking Omni-Vision Representation through the Lens of Visual Realms | Code | 1
DiagnosisArena: Benchmarking Diagnostic Reasoning for Large Language Models | Code | 1
Autonomous Microscopy Experiments through Large Language Model Agents | Code | 1
A Comprehensive Study on Large-Scale Graph Training: Benchmarking and Rethinking | Code | 1
DialogueLLM: Context and Emotion Knowledge-Tuned Large Language Models for Emotion Recognition in Conversations | Code | 1
Emoji Prediction: Extensions and Benchmarking | Code | 1
DIG In: Evaluating Disparities in Image Generations with Indicators for Geographic Diversity | Code | 1
A Ladder of Causal Distances | Code | 1
ATOMMIC: An Advanced Toolbox for Multitask Medical Imaging Consistency to facilitate Artificial Intelligence applications from acquisition to analysis in Magnetic Resonance Imaging | Code | 1
Atom-Level Optical Chemical Structure Recognition with Limited Supervision | Code | 1
End-to-end Emotion-Cause Pair Extraction via Learning to Link | Code | 1
DFGC 2022: The Second DeepFake Game Competition | Code | 1
Digital Typhoon: Long-term Satellite Image Dataset for the Spatio-Temporal Modeling of Tropical Cyclones | Code | 1
BabySLM: language-acquisition-friendly benchmark of self-supervised spoken language models | Code | 1
Entering Real Social World! Benchmarking the Social Intelligence of Large Language Models from a First-person Perspective | Code | 1
dMelodies: A Music Dataset for Disentanglement Learning | Code | 1
Benchmarking Quantized Neural Networks on FPGAs with FINN | Code | 1
Detecting beats in the photoplethysmogram: benchmarking open-source algorithms | Code | 1
Benchmarking Large Language Models on CMExam -- A Comprehensive Chinese Medical Exam Dataset | Code | 1
DetectRL: Benchmarking LLM-Generated Text Detection in Real-World Scenarios | Code | 1
Descending through a Crowded Valley - Benchmarking Deep Learning Optimizers | Code | 1
Descending through a Crowded Valley — Benchmarking Deep Learning Optimizers | Code | 1
EvalCrafter: Benchmarking and Evaluating Large Video Generation Models | Code | 1
Developing a Scalable Benchmark for Assessing Large Language Models in Knowledge Graph Engineering | Code | 1
Benchmarking Relief-Based Feature Selection Methods for Bioinformatics Data Mining | Code | 1
Evaluating Attribution for Graph Neural Networks | Code | 1
Benchmarking Large Language Models on Controllable Generation under Diversified Instructions | Code | 1
A Japanese Dataset for Subjective and Objective Sentiment Polarity Classification in Micro Blog Domain | Code | 1
Benchmarking Large Multimodal Models against Common Corruptions | Code | 1
Geometric Deep Learning for Structure-Based Drug Design: A Survey | Code | 1
A Comprehensive Study of the Robustness for LiDAR-based 3D Object Detectors against Adversarial Attacks | Code | 1
Benchmarking Large Language Models on Answering and Explaining Challenging Medical Questions | Code | 1
Benchmarking Robustness of 3D Object Detection to Common Corruptions | Code | 1
DexArt: Benchmarking Generalizable Dexterous Manipulation with Articulated Objects | Code | 1
EventEA: Benchmarking Entity Alignment for Event-centric Knowledge Graphs | Code | 1
A Systematic Benchmarking Analysis of Transfer Learning for Medical Image Analysis | Code | 1
Benchmarking saliency methods for chest X-ray interpretation | Code | 1
Benchmarking Robustness to Adversarial Image Obfuscations | Code | 1
Beacon, a lightweight deep reinforcement learning benchmark library for flow control | Code | 1
Experimental Validation of Ultrasound Beamforming with End-to-End Deep Learning for Single Plane Wave Imaging | Code | 1
Explainable Benchmarking for Iterative Optimization Heuristics | Code | 1
Benchmarking Spectral Graph Neural Networks: A Comprehensive Study on Effectiveness and Efficiency | Code | 1
Benchmarking Large Language Models for News Summarization | Code | 1

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 | | Unverified