SOTAVerified

The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

474,278 papers | 248,326 code links | 4,818 tasks

Papers

Showing 8851–8875 of 474,278 papers

Title | Status | Hype
LongLive: Real-time Interactive Long Video Generation | | 0
LLM Reasoning for Machine Translation: Synthetic Data Generation over Thinking Tokens | | 0
Direct Multi-Token Decoding | | 0
Scaling Long-Horizon LLM Agent via Context-Folding | | 0
PanoTPS-Net: Panoramic Room Layout Estimation via Thin Plate Spline Transformation | Code | 0
Do LLMs "Feel"? Emotion Circuits Discovery and Control | | 0
Diffusion-Link: Diffusion Probabilistic Model for Bridging the Audio-Text Modality Gap | Code | 0
Coupled Degradation Modeling and Fusion: A VLM-Guided Degradation-Coupled Network for Degradation-Aware Infrared and Visible Image Fusion | Code | 0
ReLook: Vision-Grounded RL with a Multimodal LLM Critic for Agentic Web Coding | | 0
Deconstructing Attention: Investigating Design Principles for Effective Language Modeling | | 0
ExpVid: A Benchmark for Experiment Video Understanding & Reasoning | | 0
Demystifying Reinforcement Learning in Agentic Reasoning | Code | 0
IVEBench: Modern Benchmark Suite for Instruction-Guided Video Editing Assessment | | 0
InfiniHuman: Infinite 3D Human Creation with Precise Control | | 0
Point Prompting: Counterfactual Tracking with Video Diffusion Models | | 0
ACADREASON: Exploring the Limits of Reasoning Models with Academic Research Problems | | 0
Diffusion Transformers with Representation Autoencoders | | 0
QeRL: Beyond Efficiency -- Quantization-enhanced Reinforcement Learning for LLMs | | 0
SPADE: Spatial Transcriptomics and Pathology Alignment Using a Mixture of Data Experts for an Expressive Latent Space | Code | 0
APLOT: Robust Reward Modeling via Adaptive Preference Learning with Optimal Transport | Code | 0
Add-One-In: Incremental Sample Selection for Large Language Models via a Choice-Based Greedy Paradigm | Code | 0
Go Beyond Earth: Understanding Human Actions and Scenes in Microgravity Environments | Code | 0
Prompt4Trust: A Reinforcement Learning Prompt Augmentation Framework for Clinically-Aligned Confidence Calibration in Multimodal Large Language Models | Code | 0
STAR: A Benchmark for Astronomical Star Fields Super-Resolution | Code | 0
FSA: An Alternative Efficient Implementation of Native Sparse Attention Kernel | Code | 0