SOTAVerified

Benchmarking

Papers

Showing 376-400 of 5548 papers

Title | Status | Hype
ADATIME: A Benchmarking Suite for Domain Adaptation on Time Series Data | Code | 2
Benchmarking Robustness of 3D Point Cloud Recognition Against Common Corruptions | Code | 2
AiTLAS: Artificial Intelligence Toolbox for Earth Observation | Code | 2
Investigating Tradeoffs in Real-World Video Super-Resolution | Code | 2
Multitask Prompted Training Enables Zero-Shot Task Generalization | Code | 2
MetaDrive: Composing Diverse Driving Scenarios for Generalizable Reinforcement Learning | Code | 2
Panoptic nuScenes: A Large-Scale Benchmark for LiDAR Panoptic Segmentation and Tracking | Code | 2
BEIR: A Heterogenous Benchmark for Zero-shot Evaluation of Information Retrieval Models | Code | 2
Learning to Fly -- a Gym Environment with PyBullet Physics for Reinforcement Learning of Multi-agent Quadcopter Control | Code | 2
Learning Transferable Visual Models From Natural Language Supervision | Code | 2
Evaluating Large-Vocabulary Object Detectors: The Devil is in the Details | Code | 2
PyHealth: A Python Library for Health Predictive Models | Code | 2
TadGAN: Time Series Anomaly Detection Using Generative Adversarial Networks | Code | 2
Searching for a Search Method: Benchmarking Search Algorithms for Generating NLP Adversarial Examples | Code | 2
Bringing Light Into the Dark: A Large-scale Evaluation of Knowledge Graph Embedding Models Under a Unified Framework | Code | 2
Benchmarking Graph Neural Networks | Code | 2
Benchmarking Zero-shot Text Classification: Datasets, Evaluation and Entailment Approach | Code | 2
Habitat: A Platform for Embodied AI Research | Code | 2
Benchmarking Neural Network Robustness to Common Corruptions and Perturbations | Code | 2
A large annotated medical image dataset for the development and evaluation of segmentation algorithms | Code | 2
Benchmarking Deep Reinforcement Learning for Continuous Control | Code | 2
LLMThinkBench: Towards Basic Math Reasoning and Overthinking in Large Language Models | Code | 1
Latent Thermodynamic Flows: Unified Representation Learning and Generative Modeling of Temperature-Dependent Behaviors from Limited Data | Code | 1
CovDocker: Benchmarking Covalent Drug Design with Tasks, Datasets, and Solutions | Code | 1
WattsOnAI: Measuring, Analyzing, and Visualizing Energy and Carbon Footprint of AI Workloads | Code | 1

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 | | Unverified