SOTAVerified

Benchmarking

Papers

Showing 726-750 of 5548 papers

| Title | Status | Hype |
| --- | --- | --- |
| Benchmarking Large Language Models on CMExam -- A Comprehensive Chinese Medical Exam Dataset | Code | 1 |
| DetectRL: Benchmarking LLM-Generated Text Detection in Real-World Scenarios | Code | 1 |
| Descending through a Crowded Valley - Benchmarking Deep Learning Optimizers | Code | 1 |
| Descending through a Crowded Valley — Benchmarking Deep Learning Optimizers | Code | 1 |
| EvalCrafter: Benchmarking and Evaluating Large Video Generation Models | Code | 1 |
| Developing a Scalable Benchmark for Assessing Large Language Models in Knowledge Graph Engineering | Code | 1 |
| Benchmarking Relief-Based Feature Selection Methods for Bioinformatics Data Mining | Code | 1 |
| Evaluating Attribution for Graph Neural Networks | Code | 1 |
| Benchmarking Large Language Models on Controllable Generation under Diversified Instructions | Code | 1 |
| A Japanese Dataset for Subjective and Objective Sentiment Polarity Classification in Micro Blog Domain | Code | 1 |
| Benchmarking Large Multimodal Models against Common Corruptions | Code | 1 |
| Geometric Deep Learning for Structure-Based Drug Design: A Survey | Code | 1 |
| A Comprehensive Study of the Robustness for LiDAR-based 3D Object Detectors against Adversarial Attacks | Code | 1 |
| Benchmarking Large Language Models on Answering and Explaining Challenging Medical Questions | Code | 1 |
| Benchmarking Robustness of 3D Object Detection to Common Corruptions | Code | 1 |
| DexArt: Benchmarking Generalizable Dexterous Manipulation with Articulated Objects | Code | 1 |
| EventEA: Benchmarking Entity Alignment for Event-centric Knowledge Graphs | Code | 1 |
| A Systematic Benchmarking Analysis of Transfer Learning for Medical Image Analysis | Code | 1 |
| Benchmarking saliency methods for chest X-ray interpretation | Code | 1 |
| Benchmarking Robustness to Adversarial Image Obfuscations | Code | 1 |
| Beacon, a lightweight deep reinforcement learning benchmark library for flow control | Code | 1 |
| Experimental Validation of Ultrasound Beamforming with End-to-End Deep Learning for Single Plane Wave Imaging | Code | 1 |
| Explainable Benchmarking for Iterative Optimization Heuristics | Code | 1 |
| Benchmarking Spectral Graph Neural Networks: A Comprehensive Study on Effectiveness and Efficiency | Code | 1 |
| Benchmarking Large Language Models for News Summarization | Code | 1 |

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | GPT-4 Turbo | ACC | 0.56 | | Unverified |