SOTAVerified

Benchmarking

Papers

Showing 1251–1300 of 5548 papers

| Title | Status | Hype |
|---|---|---|
| Benchmarking Transcriptomics Foundation Models for Perturbation Analysis : one PCA still rules them all | Code | 1 |
| ImageNet-E: Benchmarking Neural Network Robustness via Attribute Editing | Code | 1 |
| Implicit Multi-Spectral Transformer: An Lightweight and Effective Visible to Infrared Image Translation Model | Code | 1 |
| Benchmarking emergency department triage prediction models with machine learning and large public electronic health records | Code | 1 |
| SoK: Membership Inference Attacks on LLMs are Rushing Nowhere (and How to Fix It) | Code | 1 |
| CompanyKG: A Large-Scale Heterogeneous Graph for Company Similarity Quantification | Code | 1 |
| Benchmarking Multi-Agent Deep Reinforcement Learning Algorithms in Cooperative Tasks | Code | 1 |
| Benchmarking Language Models for Code Syntax Understanding | Code | 1 |
| TextEE: Benchmark, Reevaluation, Reflections, and Future Challenges in Event Extraction | Code | 1 |
| Illuminating Darkness: Enhancing Real-world Low-light Scenes with Smartphone Images | Code | 1 |
| A Survey on Graph Counterfactual Explanations: Definitions, Methods, Evaluation, and Research Challenges | Code | 1 |
| MEGA: Multilingual Evaluation of Generative AI | Code | 1 |
| Benchmarking Language Model Creativity: A Case Study on Code Generation | Code | 1 |
| Benchmarking the Spectrum of Agent Capabilities | Code | 1 |
| Benchmarking of DL Libraries and Models on Mobile Devices | Code | 1 |
| MetaFormer and CNN Hybrid Model for Polyp Image Segmentation | Code | 1 |
| Meta-Surrogate Benchmarking for Hyperparameter Optimization | Code | 1 |
| Benchmarking Quantized Neural Networks on FPGAs with FINN | Code | 1 |
| Image Colorization: A Survey and Dataset | Code | 1 |
| MGTBench: Benchmarking Machine-Generated Text Detection | Code | 1 |
| IDToolkit: A Toolkit for Benchmarking and Developing Inverse Design Algorithms in Nanophotonics | Code | 1 |
| Benchmarking the Robustness of Spatial-Temporal Models Against Corruptions | Code | 1 |
| Benchmarking Knowledge Boundary for Large Language Models: A Different Perspective on Model Evaluation | Code | 1 |
| MIMII DG: Sound Dataset for Malfunctioning Industrial Machine Investigation and Inspection for Domain Generalization Task | Code | 1 |
| Benchmarking the Robustness of Temporal Action Detection Models Against Temporal Corruptions | Code | 1 |
| Contemporary Symbolic Regression Methods and their Relative Performance | Code | 1 |
| Benchmarking Recommendation, Classification, and Tracing Based on Hugging Face Knowledge Graph | Code | 1 |
| minicons: Enabling Flexible Behavioral and Representational Analyses of Transformer Language Models | Code | 1 |
| ILIAS: Instance-Level Image retrieval At Scale | Code | 1 |
| Image Matching across Wide Baselines: From Paper to Practice | Code | 1 |
| Benchmarking Relief-Based Feature Selection Methods for Bioinformatics Data Mining | Code | 1 |
| Benchmarking the Robustness of Deep Neural Networks to Common Corruptions in Digital Pathology | Code | 1 |
| Benchmarking the Performance of Bayesian Optimization across Multiple Experimental Materials Science Domains | Code | 1 |
| iAMPCN: a deep-learning approach for identifying antimicrobial peptides and their functional activities | Code | 1 |
| AirSim Drone Racing Lab | Code | 1 |
| A framework for benchmarking clustering algorithms | Code | 1 |
| ICU-Sepsis: A Benchmark MDP Built from Real Medical Data | Code | 1 |
| A Comprehensive Overview of Large Language Models | Code | 1 |
| CovDocker: Benchmarking Covalent Drug Design with Tasks, Datasets, and Solutions | Code | 1 |
| Benchmarking Retrieval-Augmented Multimomal Generation for Document Question Answering | Code | 1 |
| Benchmarking the Generation of Fact Checking Explanations | Code | 1 |
| Arctique: An artificial histopathological dataset unifying realism and controllability for uncertainty quantification | Code | 1 |
| A Systematic Benchmarking Analysis of Transfer Learning for Medical Image Analysis | Code | 1 |
| Benchmarking Vision, Language, & Action Models on Robotic Learning Tasks | Code | 1 |
| Benchmarking the Robustness of LiDAR-Camera Fusion for 3D Object Detection | Code | 1 |
| A framework for benchmarking class-out-of-distribution detection and its application to ImageNet | Code | 1 |
| Benchmarking TinyML Systems: Challenges and Direction | Code | 1 |
| Geometric Deep Learning for Structure-Based Drug Design: A Survey | Code | 1 |
| A Japanese Dataset for Subjective and Objective Sentiment Polarity Classification in Micro Blog Domain | Code | 1 |
| iDNA-ABF: multi-scale deep biological language learning model for the interpretable prediction of DNA methylations | Code | 1 |

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | GPT-4 Turbo | ACC | 0.56 | | Unverified |