SOTAVerified

Benchmarking

Papers

Showing 1201–1225 of 5548 papers

Title | Status | Hype
Benchmarking Large Language Models for Persian: A Preliminary Study Focusing on ChatGPT | Code | 1
Benchmarking Skeleton-based Motion Encoder Models for Clinical Applications: Estimating Parkinson's Disease Severity in Walking Sequences | Code | 1
Benchmarking Object Detectors under Real-World Distribution Shifts in Satellite Imagery | Code | 1
Benchmarking Object Detectors with COCO: A New Path Forward | Code | 1
KO codes: Inventing Nonlinear Encoding and Decoding for Reliable Wireless Communication via Deep-learning | Code | 1
KoLA: Carefully Benchmarking World Knowledge of Large Language Models | Code | 1
Can Language Models Make Fun? A Case Study in Chinese Comical Crosstalk | Code | 1
Can Language Models Employ the Socratic Method? Experiments with Code Debugging | Code | 1
Can LLMs Express Their Uncertainty? An Empirical Evaluation of Confidence Elicitation in LLMs | Code | 1
LabelBench: A Comprehensive Framework for Benchmarking Adaptive Label-Efficient Learning | Code | 1
Working Memory Capacity of ChatGPT: An Empirical Study | Code | 1
Benchmarking Large Language Models for Automated Verilog RTL Code Generation | Code | 1
Benchmarking Spatial Relationships in Text-to-Image Generation | Code | 1
CASTLE: Benchmarking Dataset for Static Code Analyzers and LLMs towards CWE Detection | Code | 1
A Reinforcement Learning Environment for Multi-Service UAV-enabled Wireless Systems | Code | 1
3DYoga90: A Hierarchical Video Dataset for Yoga Pose Understanding | Code | 1
CBench: Towards Better Evaluation of Question Answering Over Knowledge Graphs | Code | 1
CAVIAR: Co-simulation of 6G Communications, 3D Scenarios and AI for Digital Twins | Code | 1
Benchmarking Simulation-Based Inference | Code | 1
GuacaMol: Benchmarking Models for De Novo Molecular Design | Code | 1
Benchmarking Language Models for Code Syntax Understanding | Code | 1
Large Language Models for Multi-Robot Systems: A Survey | Code | 1
TextEE: Benchmark, Reevaluation, Reflections, and Future Challenges in Event Extraction | Code | 1
Chaos as an interpretable benchmark for forecasting and data-driven modelling | Code | 1
Graph Robustness Benchmark: Benchmarking the Adversarial Robustness of Graph Machine Learning | Code | 1

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 | | Unverified