SOTAVerified

The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

474,278 papers | 248,326 code links | 4,818 tasks

Papers

Showing 6601-6625 of 474,278 papers

Title | Status | Hype
Orchestration Framework for Financial Agents: From Algorithmic Trading to Agentic Trading | Code | 0
UnicEdit-10M: A Dataset and Benchmark Breaking the Scale-Quality Barrier via Unified Verification for Reasoning-Enriched Edits | | 0
PAI-Bench: A Comprehensive Benchmark For Physical AI | | 0
See, Hear, and Understand: Benchmarking Audiovisual Human Speech Understanding in Multimodal Large Language Models | | 0
TradeTrap: Are LLM-based Trading Agents Truly Reliable and Faithful? | Code | 0
MAC-SLU: Multi-Intent Automotive Cabin Spoken Language Understanding Benchmark | Code | 0
The Art of Scaling Test-Time Compute for Large Language Models | | 0
AirSim360: A Panoramic Simulation Platform within Drone View | | 0
Learning Sim-to-Real Humanoid Locomotion in 15 Minutes | | 0
Visual Sync: Multi-Camera Synchronization via Cross-View Object Motion | | 0
SAM3-UNet: Simplified Adaptation of Segment Anything Model 3 | Code | 0
DenoiseGS: Gaussian Reconstruction Model for Burst Denoising | Code | 0
ViT^3: Unlocking Test-Time Training in Vision | Code | 0
Low-Rank Prehab: Preparing Neural Networks for SVD Compression | Code | 0
Test-Time Spectrum-Aware Latent Steering for Zero-Shot Generalization in Vision-Language Models | Code | 0
QGShap: Quantum Acceleration for Faithful GNN Explanations | Code | 0
CLEF: Clinically-Guided Contrastive Learning for Electrocardiogram Foundation Models | Code | 0
Spatiotemporal Pyramid Flow Matching for Climate Emulation | Code | 0
WhAM: Towards A Translative Model of Sperm Whale Vocalization | Code | 0
Capturing Context-Aware Route Choice Semantics for Trajectory Representation Learning | Code | 0
TTSnap: Test-Time Scaling of Diffusion Models via Noise-Aware Pruning | Code | 0
PhyDetEx: Detecting and Explaining the Physical Plausibility of T2V Models | Code | 0
BHRAM-IL: A Benchmark for Hallucination Recognition and Assessment in Multiple Indian Languages | Code | 0
Disentangling Progress in Medical Image Registration: Beyond Trend-Driven Architectures towards Domain-Specific Strategies | Code | 0
TRivia: Self-supervised Fine-tuning of Vision-Language Models for Table Recognition | Code | 0
Page 265 of 18972