SOTAVerified

Benchmarking

Papers

Showing 576–600 of 5548 papers

Title | Status | Hype
Benchmarking Mutual Information-based Loss Functions in Federated Learning | | 0
Benchmarking Audio Deepfake Detection Robustness in Real-world Communication Scenarios | | 0
Power Line Communication vs. Talkative Power Conversion: A Benchmarking Study | | 0
Causality-enhanced Decision-Making for Autonomous Mobile Robots in Dynamic Environments | Code | 0
Continual Learning Strategies for 3D Engineering Regression Problems: A Benchmarking Study | Code | 0
REAL: Benchmarking Autonomous Agents on Deterministic Simulations of Real Websites | Code | 3
Benchmarking Biopharmaceuticals Retrieval-Augmented Generation Evaluation | | 0
GaSLight: Gaussian Splats for Spatially-Varying Lighting in HDR | | 0
HypoBench: Towards Systematic and Principled Benchmarking for Hypothesis Generation | Code | 2
Mamba-Based Ensemble learning for White Blood Cell Classification | Code | 0
Benchmarking Next-Generation Reasoning-Focused Large Language Models in Ophthalmology: A Head-to-Head Evaluation on 5,888 Items | | 0
CLASH: Evaluating Language Models on Judging High-Stakes Dilemmas from Multiple Perspectives | | 0
E2E Parking Dataset: An Open Benchmark for End-to-End Autonomous Parking | | 0
FHBench: Towards Efficient and Personalized Federated Learning for Multimodal Healthcare | Code | 0
Benchmarking Vision Language Models on German Factual Data | | 0
BEACON: A Benchmark for Efficient and Accurate Counting of Subgraphs | | 0
BoTTA: Benchmarking on-device Test Time Adaptation | | 0
Foundation Models for Remote Sensing: An Analysis of MLLMs for Object Localization | | 0
COUNTS: Benchmarking Object Detectors and Multimodal Large Language Models under Distribution Shifts | | 0
LMFormer: Lane based Motion Prediction Transformer | | 0
Benchmarking 3D Human Pose Estimation Models Under Occlusions | | 0
CameraBench: Benchmarking Visual Reasoning in MLLMs via Photography | | 0
TinyverseGP: Towards a Modular Cross-domain Benchmarking Framework for Genetic Programming | Code | 1
Beyond Chains of Thought: Benchmarking Latent-Space Reasoning Abilities in Large Language Models | | 0
Trade-offs in Privacy-Preserving Eye Tracking through Iris Obfuscation: A Benchmarking Study | Code | 0
Page 24 of 222

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 | | Unverified