SOTAVerified

The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

474,278 papers · 248,326 code links · 4,818 tasks

Papers

Showing 14,751–14,800 of 474,278 papers

Title | Status | Hype
--- | --- | ---
DeepSight: An All-in-One LM Safety Toolkit | | 1
Learning to Configure Agentic AI Systems | | 1
TADA! Tuning Audio Diffusion Models through Activation Steering | | 1
GameDevBench: Evaluating Agentic Capabilities Through Game Development | | 1
FeatureBench: Benchmarking Agentic Coding for Complex Feature Development | | 1
Corruption-Aware Training of Latent Video Diffusion Models for Robust Text-to-Video Generation | | 1
ContextBench: A Benchmark for Context Retrieval in Coding Agents | | 1
P1-VL: Bridging Visual Perception and Scientific Reasoning in Physics Olympiads | | 1
Parallel-Probe: Towards Efficient Parallel Thinking via 2D Probing | | 1
Free(): Learning to Forget in Malloc-Only Reasoning Models | | 1
EPO: Entropy-regularized Policy Optimization for LLM Agents Reinforcement Learning | | 1
When the Prompt Becomes Visual: Vision-Centric Jailbreak Attacks for Large Image Editing Models | | 1
ReForm: Reflective Autoformalization with Prospective Bounded Sequence Optimization | | 1
Understanding Language Prior of LVLMs by Contrasting Chain-of-Embedding | | 1
MILR: Improving Multimodal Image Generation via Test-Time Latent Reasoning | | 1
Prism: Spectral-Aware Block-Sparse Attention | | 1
Fast KVzip: Efficient and Accurate LLM Inference with Gated KV Eviction | | 1
Contact-Anchored Policies: Contact Conditioning Creates Strong Robot Utility Models | | 1
Data Science and Technology Towards AGI Part I: Tiered Data Management | | 1
CoMAS: Co-Evolving Multi-Agent Systems via Interaction Rewards | | 1
G-LNS: Generative Large Neighborhood Search for LLM-Based Automatic Heuristic Design | | 1
Demo-ICL: In-Context Learning for Procedural Video Knowledge Acquisition | | 1
Luth: Efficient French Specialization for Small Language Models and Cross-Lingual Transfer | | 1
Rethinking Global Text Conditioning in Diffusion Transformers | | 1
Autoregressive Image Generation with Masked Bit Modeling | | 1
When and How Much to Imagine: Adaptive Test-Time Scaling with World Models for Visual Spatial Reasoning | | 1
Learning Self-Correction in Vision-Language Models via Rollout Augmentation | | 1
How2Everything: Mining the Web for How-To Procedures to Evaluate and Improve LLMs | | 1
Simultaneous Tactile-Visual Perception for Learning Multimodal Robot Manipulation | | 1
InftyThink+: Effective and Efficient Infinite-Horizon Reasoning via Reinforcement Learning | | 1
The Condition Number as a Scale-Invariant Proxy for Information Encoding in Neural Units | | 1
Data Darwinism Part I: Unlocking the Value of Scientific Data for Pre-training | | 1
Learning While Staying Curious: Entropy-Preserving Supervised Fine-Tuning via Adaptive Self-Distillation for Large Reasoning Models | | 1
TodoEvolve: Learning to Architect Agent Planning Systems | | 1
Safety Alignment of LMs via Non-cooperative Games | | 1
PlanViz: Evaluating Planning-Oriented Image Generation and Editing for Computer-Use Tasks | | 1
Spider-Sense: Intrinsic Risk Sensing for Efficient Agent Defense with Hierarchical Adaptive Screening | | 1
T-REGS: Minimum Spanning Tree Regularization for Self-Supervised Learning | | 1
POINTS-GUI-G: GUI-Grounding Journey | | 1
Mamba-FCS: Joint Spatio-Frequency Feature Fusion, Change-Guided Attention, and SeK Loss for Enhanced Semantic Change Detection in Remote Sensing | | 1
Learning Query-Aware Budget-Tier Routing for Runtime Agent Memory | | 1
SwimBird: Eliciting Switchable Reasoning Mode in Hybrid Autoregressive MLLMs | | 1
Vector Quantization using Gaussian Variational Autoencoder | | 1
RISE-Video: Can Video Generators Decode Implicit World Rules? | | 1
See Less, See Right: Bi-directional Perceptual Shaping For Multimodal Reasoning | | 1
EgoAVU: Egocentric Audio-Visual Understanding | | 1
Horizon-LM: A RAM-Centric Architecture for LLM Training | | 1
EvasionBench: A Large-Scale Benchmark for Detecting Managerial Evasion in Earnings Call Q&A | | 1
OCRVerse: Towards Holistic OCR in End-to-End Vision-Language Models | | 1
SynthVerse: A Large-Scale Diverse Synthetic Dataset for Point Tracking | | 1
Page 296 of 9,486