SOTAVerified

Benchmarking

Papers

Showing 4601-4625 of 5548 papers

Title | Status | Hype
The eBible Corpus: Data and Model Benchmarks for Bible Translation for Low-Resource Languages | Code | 0
LoopDB: A Loop Closure Dataset for Large Scale Simultaneous Localization and Mapping | Code | 0
Bilingual BSARD: Extending Statutory Article Retrieval to Dutch | Code | 0
Hyperparameter-Free Losses for Model-Based Monocular Reconstruction | Code | 0
Lost in Benchmarks? Rethinking Large Language Model Benchmarking with Item Response Theory | Code | 0
Hyperopt-Sklearn: Automatic Hyperparameter Configuration for Scikit-Learn | Code | 0
Hyperbolic Benchmarking Unveils Network Topology-Feature Relationship in GNN Performance | Code | 0
Bias Reduction via Cooperative Bargaining in Synthetic Graph Dataset Generation | Code | 0
Low Complexity Hybrid Beamforming for mmWave Full-Duplex Integrated Access and Backhaul | Code | 0
Bias Analysis and Mitigation in the Evaluation of Authorship Verification | Code | 0
Beyond Supervised vs. Unsupervised: Representative Benchmarking and Analysis of Image Representation Learning | Code | 0
Balancing policy constraint and ensemble size in uncertainty-based offline reinforcement learning | Code | 0
AnaloBench: Benchmarking the Identification of Abstract and Long-context Analogies | Code | 0
Hybrid Random Features | Code | 0
Beyond Slow Signs in High-fidelity Model Extraction | Code | 0
Hybrid Machine Learning Models of Classifying Residential Requests for Smart Dispatching | Code | 0
BaDLAD: A Large Multi-Domain Bengali Document Layout Analysis Dataset | Code | 0
HuSc3D: Human Sculpture dataset for 3D object reconstruction | Code | 0
LVLM-Compress-Bench: Benchmarking the Broader Impact of Large Vision-Language Model Compression | Code | 0
HSSBench: Benchmarking Humanities and Social Sciences Ability for Multimodal Large Language Models | Code | 0
Beyond Optimism: Exploration With Partially Observable Rewards | Code | 0
M3Dsynth: A dataset of medical 3D images with AI-generated local manipulations | Code | 0
M4Fog: A Global Multi-Regional, Multi-Modal, and Multi-Stage Dataset for Marine Fog Detection and Forecasting to Bridge Ocean and Atmosphere | Code | 0
The Elusive Pursuit of Reproducing PATE-GAN: Benchmarking, Auditing, Debugging | Code | 0
Back to Basics: Benchmarking Canonical Evolution Strategies for Playing Atari | Code | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 | - | Unverified