SOTAVerified

Benchmarking

Papers

Showing 826-850 of 5548 papers

Title | Status | Hype
EduBench: A Comprehensive Benchmarking Dataset for Evaluating Large Language Models in Diverse Educational Scenarios | Code | 1
CAB: Comprehensive Attention Benchmarking on Long Sequence Modeling | Code | 1
ByzFL: Research Framework for Robust Federated Learning | Code | 1
Benchmarking of DL Libraries and Models on Mobile Devices | Code | 1
Benchmarking and Explaining Large Language Model-based Code Generation: A Causality-Centric Approach | Code | 1
Benchmarking Meta-embeddings: What Works and What Does Not | Code | 1
EgoNormia: Benchmarking Physical Social Norm Understanding | Code | 1
A Survey on Graph Counterfactual Explanations: Definitions, Methods, Evaluation, and Research Challenges | Code | 1
COSMOS: Catching Out-of-Context Misinformation with Self-Supervised Learning | Code | 1
AIPerf: Automated machine learning as an AI-HPC benchmark | Code | 1
Can Language Models Make Fun? A Case Study in Chinese Comical Crosstalk | Code | 1
Benchmarking machine learning models on multi-centre eICU critical care dataset | Code | 1
Can language agents be alternatives to PPO? A Preliminary Empirical Study On OpenAI Gym | Code | 1
Benchmarking Low-Shot Robustness to Natural Distribution Shifts | Code | 1
CASTLE: Benchmarking Dataset for Static Code Analyzers and LLMs towards CWE Detection | Code | 1
Improving and Benchmarking Offline Reinforcement Learning Algorithms | Code | 1
IMUPoser: Full-Body Pose Estimation using IMUs in Phones, Watches, and Earbuds | Code | 1
4DBInfer: A 4D Benchmarking Toolbox for Graph-Centric Predictive Modeling on Relational DBs | Code | 1
Benchmarking and Survey of Explanation Methods for Black Box Models | Code | 1
An Empirical Study into Clustering of Unseen Datasets with Self-Supervised Encoders | Code | 1
ECRECer: Enzyme Commission Number Recommendation and Benchmarking based on Multiagent Dual-core Learning | Code | 1
Benchmarking Local Robustness of High-Accuracy Binary Neural Networks for Enhanced Traffic Sign Recognition | Code | 1
AI in Lung Health: Benchmarking Detection and Diagnostic Models Across Multiple CT Scan Datasets | Code | 1
CattleFace-RGBT: RGB-T Cattle Facial Landmark Benchmark | Code | 1
Benchmarking Meaning Representations in Neural Semantic Parsing | Code | 1
Page 34 of 222

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 | | Unverified