SOTAVerified

Benchmarking

Papers

Showing 801-850 of 5548 papers

Title | Status | Hype
A SWAT-based Reinforcement Learning Framework for Crop Management | Code | 1
Benchmarking Multi-modal Semantic Segmentation under Sensor Failures: Missing and Noisy Modality Robustness | Code | 1
Benchmarks for Deep Off-Policy Evaluation | Code | 1
Bongard-HOI: Benchmarking Few-Shot Visual Reasoning for Human-Object Interactions | Code | 1
Recent Advances on Neural Network Pruning at Initialization | Code | 1
Boosting Neural Image Compression for Machines Using Latent Space Masking | Code | 1
Enhancing Biomedical Relation Extraction with Directionality | Code | 1
Benchmarking Algorithms for Federated Domain Generalization | Code | 1
Benchmarking Algorithms for Submodular Optimization Problems Using IOHProfiler | Code | 1
BRIDGE: Benchmarking Large Language Models for Understanding Real-world Clinical Practice Text | Code | 1
Evaluating Adversarial Attacks on ImageNet: A Reality Check on Misclassification Classes | Code | 1
Federated Learning Under Intermittent Client Availability and Time-Varying Communication Constraints | Code | 1
Benchmarking and Analysis of Unsupervised Object Segmentation from Real-world Single Images | Code | 1
Benchmarking and Analyzing 3D-aware Image Synthesis with a Modularized Codebase | Code | 1
Benchmarking and Analyzing 3D Human Pose and Shape Estimation Beyond Algorithms | Code | 1
A Benchmarking Study of Kolmogorov-Arnold Networks on Tabular Data | Code | 1
Benchmarking and scaling of deep learning models for land cover image classification | Code | 1
Benchmarking and Analyzing Point Cloud Classification under Corruptions | Code | 1
Benchmarking and Analyzing Robust Point Cloud Recognition: Bag of Tricks for Defending Adversarial Examples | Code | 1
4D Panoptic LiDAR Segmentation | Code | 1
Efficient Prediction of Peptide Self-assembly through Sequential and Graphical Encoding | Code | 1
Ego-Body Pose Estimation via Ego-Head Pose Estimation | Code | 1
Benchmarking Micro-action Recognition: Dataset, Methods, and Applications | Code | 1
Benchmarking and Defending Against Indirect Prompt Injection Attacks on Large Language Models | Code | 1
A Closer Look at Mortality Risk Prediction from Electrocardiograms | Code | 1
EduBench: A Comprehensive Benchmarking Dataset for Evaluating Large Language Models in Diverse Educational Scenarios | Code | 1
CAB: Comprehensive Attention Benchmarking on Long Sequence Modeling | Code | 1
ByzFL: Research Framework for Robust Federated Learning | Code | 1
Benchmarking of DL Libraries and Models on Mobile Devices | Code | 1
Benchmarking and Explaining Large Language Model-based Code Generation: A Causality-Centric Approach | Code | 1
Benchmarking Meta-embeddings: What Works and What Does Not | Code | 1
EgoNormia: Benchmarking Physical Social Norm Understanding | Code | 1
A Survey on Graph Counterfactual Explanations: Definitions, Methods, Evaluation, and Research Challenges | Code | 1
COSMOS: Catching Out-of-Context Misinformation with Self-Supervised Learning | Code | 1
AIPerf: Automated machine learning as an AI-HPC benchmark | Code | 1
Can Language Models Make Fun? A Case Study in Chinese Comical Crosstalk | Code | 1
Benchmarking machine learning models on multi-centre eICU critical care dataset | Code | 1
Can language agents be alternatives to PPO? A Preliminary Empirical Study On OpenAI Gym | Code | 1
Benchmarking Low-Shot Robustness to Natural Distribution Shifts | Code | 1
CASTLE: Benchmarking Dataset for Static Code Analyzers and LLMs towards CWE Detection | Code | 1
Improving and Benchmarking Offline Reinforcement Learning Algorithms | Code | 1
IMUPoser: Full-Body Pose Estimation using IMUs in Phones, Watches, and Earbuds | Code | 1
4DBInfer: A 4D Benchmarking Toolbox for Graph-Centric Predictive Modeling on Relational DBs | Code | 1
Benchmarking and Survey of Explanation Methods for Black Box Models | Code | 1
An Empirical Study into Clustering of Unseen Datasets with Self-Supervised Encoders | Code | 1
ECRECer: Enzyme Commission Number Recommendation and Benchmarking based on Multiagent Dual-core Learning | Code | 1
Benchmarking Local Robustness of High-Accuracy Binary Neural Networks for Enhanced Traffic Sign Recognition | Code | 1
AI in Lung Health: Benchmarking Detection and Diagnostic Models Across Multiple CT Scan Datasets | Code | 1
CattleFace-RGBT: RGB-T Cattle Facial Landmark Benchmark | Code | 1
Benchmarking Meaning Representations in Neural Semantic Parsing | Code | 1
Page 17 of 111

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 |  | Unverified