SOTAVerified

The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

474,278 papers, 248,326 code links, 4,818 tasks

Papers

Showing 9401-9425 of 474278 papers

| Title | Status | Hype |
| --- | --- | --- |
| VLA-RFT: Vision-Language-Action Reinforcement Fine-tuning with Verified Rewards in World Simulators | | 0 |
| LongCodeZip: Compress Long Context for Code Language Models | | 0 |
| Efficient Multi-modal Large Language Models via Progressive Consistency Distillation | | 0 |
| Can World Models Benefit VLMs for World Dynamics? | | 0 |
| QUASAR: Quantum Assembly Code Generation Using Tool-Augmented LLMs via Agentic RL | | 0 |
| CurES: From Gradient Analysis to Efficient Curriculum Learning for Reasoning LLMs | | 0 |
| Pay-Per-Search Models are Abstention Models | | 0 |
| TOUCAN: Synthesizing 1.5M Tool-Agentic Data from Real-World MCP Environments | | 0 |
| WAInjectBench: Benchmarking Prompt Injection Detections for Web Agents | Code | 0 |
| Gather-Scatter Mamba: Accelerating Propagation with Efficient State Space Model | Code | 0 |
| Collaborative-Distilled Diffusion Models (CDDM) for Accelerated and Lightweight Trajectory Prediction | Code | 0 |
| ReSWD: ReSTIR'd, not shaken. Combining Reservoir Sampling and Sliced Wasserstein Distance for Variance Reduction | | 0 |
| MathSticks: A Benchmark for Visual Symbolic Compositional Reasoning with Matchstick Puzzles | Code | 0 |
| DACoN: DINO for Anime Paint Bucket Colorization with Any Number of Reference Images | Code | 0 |
| GIM: Improved Interpretability for Large Language Models | Code | 0 |
| Steering When Necessary: Flexible Steering Large Language Models with Backtracking | Code | 0 |
| CogVLA: Cognition-Aligned Vision-Language-Action Model via Instruction-Driven Routing & Sparsification | Code | 0 |
| Efficient Probabilistic Tensor Networks | Code | 0 |
| Domain-Specialized Interactive Segmentation Framework for Meningioma Radiotherapy Planning | Code | 0 |
| Enhancing Rating Prediction with Off-the-Shelf LLMs Using In-Context User Reviews | Code | 0 |
| Relative-Absolute Fusion: Rethinking Feature Extraction in Image-Based Iterative Method Selection for Solving Sparse Linear Systems | Code | 0 |
| Beyond Log Likelihood: Probability-Based Objectives for Supervised Fine-Tuning across the Model Capability Continuum | Code | 0 |
| InfVSR: Breaking Length Limits of Generic Video Super-Resolution | Code | 0 |
| JEPA-T: Joint-Embedding Predictive Architecture with Text Fusion for Image Generation | Code | 0 |
| Multi-Actor Multi-Critic Deep Deterministic Reinforcement Learning with a Novel Q-Ensemble Method | Code | 0 |
Page 377 of 18972