SOTAVerified

Benchmarking

Papers

Showing 4601-4650 of 5548 papers

Title | Status | Hype
The eBible Corpus: Data and Model Benchmarks for Bible Translation for Low-Resource Languages | Code | 0
LoopDB: A Loop Closure Dataset for Large Scale Simultaneous Localization and Mapping | Code | 0
Bilingual BSARD: Extending Statutory Article Retrieval to Dutch | Code | 0
Hyperparameter-Free Losses for Model-Based Monocular Reconstruction | Code | 0
Lost in Benchmarks? Rethinking Large Language Model Benchmarking with Item Response Theory | Code | 0
Hyperopt-Sklearn: Automatic Hyperparameter Configuration for Scikit-Learn | Code | 0
Hyperbolic Benchmarking Unveils Network Topology-Feature Relationship in GNN Performance | Code | 0
Bias Reduction via Cooperative Bargaining in Synthetic Graph Dataset Generation | Code | 0
Low Complexity Hybrid Beamforming for mmWave Full-Duplex Integrated Access and Backhaul | Code | 0
Bias Analysis and Mitigation in the Evaluation of Authorship Verification | Code | 0
Beyond Supervised vs. Unsupervised: Representative Benchmarking and Analysis of Image Representation Learning | Code | 0
Balancing policy constraint and ensemble size in uncertainty-based offline reinforcement learning | Code | 0
AnaloBench: Benchmarking the Identification of Abstract and Long-context Analogies | Code | 0
Hybrid Random Features | Code | 0
Beyond Slow Signs in High-fidelity Model Extraction | Code | 0
Hybrid Machine Learning Models of Classifying Residential Requests for Smart Dispatching | Code | 0
BaDLAD: A Large Multi-Domain Bengali Document Layout Analysis Dataset | Code | 0
HuSc3D: Human Sculpture dataset for 3D object reconstruction | Code | 0
LVLM-Compress-Bench: Benchmarking the Broader Impact of Large Vision-Language Model Compression | Code | 0
HSSBench: Benchmarking Humanities and Social Sciences Ability for Multimodal Large Language Models | Code | 0
Beyond Optimism: Exploration With Partially Observable Rewards | Code | 0
M3Dsynth: A dataset of medical 3D images with AI-generated local manipulations | Code | 0
M4Fog: A Global Multi-Regional, Multi-Modal, and Multi-Stage Dataset for Marine Fog Detection and Forecasting to Bridge Ocean and Atmosphere | Code | 0
The Elusive Pursuit of Reproducing PATE-GAN: Benchmarking, Auditing, Debugging | Code | 0
Back to Basics: Benchmarking Canonical Evolution Strategies for Playing Atari | Code | 0
Machine-assisted quantitizing designs: augmenting humanities and social sciences with artificial intelligence | Code | 0
Beyond Marginal Uncertainty: How Accurately can Bayesian Regression Models Estimate Posterior Predictive Correlations? | Code | 0
Machine learning classification of non-Markovian noise disturbing quantum dynamics | Code | 0
Machine Learning Automation Toolbox (MLaut) | Code | 0
3D fluorescence microscopy data synthesis for segmentation and benchmarking | Code | 0
Machine Learning Cryptanalysis of a Quantum Random Number Generator | Code | 0
Visual-RAG: Benchmarking Text-to-Image Retrieval Augmented Generation for Visual Knowledge Intensive Queries | Code | 0
Visual-Inertial SLAM for Unstructured Outdoor Environments: Benchmarking the Benefits and Computational Costs of Loop Closing | Code | 0
Machine-learning for photoplethysmography analysis: Benchmarking feature, image, and signal-based approaches | Code | 0
Beyond Document Page Classification: Design, Datasets, and Challenges | Code | 0
HR-VILAGE-3K3M: A Human Respiratory Viral Immunization Longitudinal Gene Expression Dataset for Systems Immunity | Code | 0
VizNet: Towards A Large-Scale Visualization Learning and Benchmarking Repository | Code | 0
HRNET: AI on Edge for mask detection and social distancing | Code | 0
HRIBench: Benchmarking Vision-Language Models for Real-Time Human Perception in Human-Robot Interaction | Code | 0
How to Manage Tiny Machine Learning at Scale: An Industrial Perspective | Code | 0
AMQA: An Adversarial Dataset for Benchmarking Bias of LLMs in Medicine and Healthcare | Code | 0
Towards Segment Anything Model (SAM) for Medical Image Segmentation: A Survey | Code | 0
How Far Are We from Optimal Reasoning Efficiency? | Code | 0
Magnetic Resonance Imaging Feature-Based Subtyping and Model Ensemble for Enhanced Brain Tumor Segmentation | Code | 0
Mahalanobis k-NN: A Statistical Lens for Robust Point-Cloud Registrations | Code | 0
Beyond Atomic Geometry Representations in Materials Science: A Human-in-the-Loop Multimodal Framework | Code | 0
Beyond Accuracy: A Consolidated Tool for Visual Question Answering Benchmarking | Code | 0
Malliavin-Mancino estimators implemented with non-uniform fast Fourier transforms | Code | 0
HopaDIFF: Holistic-Partial Aware Fourier Conditioned Diffusion for Referring Human Action Segmentation in Multi-Person Scenarios | Code | 0
HOEG: A New Approach for Object-Centric Predictive Process Monitoring | Code | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 | - | Unverified