SOTAVerified

Benchmarking

Papers

Showing 5426–5450 of 5548 papers

| Title | Status | Hype |
| --- | --- | --- |
| Transformers for Green Semantic Communication: Less Energy, More Semantics | Code | 0 |
| Benchmarking Data Efficiency in Δ-ML and Multifidelity Models for Quantum Chemistry | Code | 0 |
| ViP: Video Platform for PyTorch | Code | 0 |
| PsOCR: Benchmarking Large Multimodal Models for Optical Character Recognition in Low-resource Pashto Language | Code | 0 |
| Comparative Study Between Distance Measures On Supervised Optimum-Path Forest Classification | Code | 0 |
| Towards Efficient Synchronous Federated Training: A Survey on System Optimization Strategies | Code | 0 |
| Which Model to Trust: Assessing the Influence of Models on the Performance of Reinforcement Learning Algorithms for Continuous Control Tasks | Code | 0 |
| Benchmarking Automated Clinical Language Simplification: Dataset, Algorithm, and Evaluation | Code | 0 |
| Comparative Analysis: Violence Recognition from Videos using Transfer Learning | Code | 0 |
| Multi-Source Knowledge Pruning for Retrieval-Augmented Generation: A Benchmark and Empirical Study | Code | 0 |
| Benchmarking community drug response prediction models: datasets, models, tools, and metrics for cross-dataset generalization analysis | Code | 0 |
| Compact Trilinear Interaction for Visual Question Answering | Code | 0 |
| Benchmarking Classic and Learned Navigation in Complex 3D Environments | Code | 0 |
| An Experimental Evaluation of Imputation Models for Spatial-Temporal Traffic Data | Code | 0 |
| Benchmarking Children's ASR with Supervised and Self-supervised Speech Foundation Models | Code | 0 |
| VisFactor: Benchmarking Fundamental Visual Cognition in Multimodal Large Language Models | Code | 0 |
| ColorGrid: A Multi-Agent Non-Stationary Environment for Goal Inference and Assistance | Code | 0 |
| CODES: Benchmarking Coupled ODE Surrogates | Code | 0 |
| CodeS: Towards Code Model Generalization Under Distribution Shift | Code | 0 |
| Code Ownership in Open-Source AI Software Security | Code | 0 |
| Benchmarking ChatGPT on Algorithmic Reasoning | Code | 0 |
| COCO: Performance Assessment | Code | 0 |
| Who Evaluates the Evaluations? Objectively Scoring Text-to-Image Prompt Coherence Metrics with T2IScoreScore (TS2) | Code | 0 |
| Benchmarking ChatGPT-4 on ACR Radiation Oncology In-Training (TXIT) Exam and Red Journal Gray Zone Cases: Potentials and Challenges for AI-Assisted Medical Education and Decision Making in Radiation Oncology | Code | 0 |
| Traveling Across Languages: Benchmarking Cross-Lingual Consistency in Multimodal LLMs | Code | 0 |

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | GPT-4 Turbo | ACC | 0.56 | | Unverified |