SOTAVerified

The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

474,278 papers, 248,326 code links, 4,818 tasks

Papers

Showing 14701–14750 of 474278 papers

Title | Status | Hype
Learning from Trials and Errors: Reflective Test-Time Planning for Embodied LLMs | | 1
Dropping Anchor and Spherical Harmonics for Sparse-view Gaussian Splatting | | 1
Tool-R0: Self-Evolving LLM Agents for Tool-Learning from Zero Data | | 1
GOT-Edit: Geometry-Aware Generic Object Tracking via Online Model Editing | | 1
Untied Ulysses: Memory-Efficient Context Parallelism via Headwise Chunking | | 1
Nacrith: Neural Lossless Compression via Ensemble Context Modeling and High-Precision CDF Coding | | 1
MergeMix: A Unified Augmentation Paradigm for Visual and Multi-Modal Understanding | | 1
MIST: Mutual Information Estimation Via Supervised Training | | 1
MedCLIPSeg: Probabilistic Vision-Language Adaptation for Data-Efficient and Generalizable Medical Image Segmentation | | 1
Anatomy of Agentic Memory: Taxonomy and Empirical Analysis of Evaluation and System Limitations | | 1
SeaCache: Spectral-Evolution-Aware Cache for Accelerating Diffusion Models | | 1
WildOS: Open-Vocabulary Object Search in the Wild | | 1
RewardMap: Tackling Sparse Rewards in Fine-grained Visual Reasoning via Multi-Stage Reinforcement Learning | | 1
Adam Improves Muon: Adaptive Moment Estimation with Orthogonalized Momentum | | 1
Multimodal Prompt Optimization: Why Not Leverage Multiple Modalities for MLLMs | | 1
Learning Personalized Agents from Human Feedback | | 1
Reinforced Fast Weights with Next-Sequence Prediction | | 1
Does Socialization Emerge in AI Agent Society? A Case Study of Moltbook | | 1
Synthesizing High-Quality Visual Question Answering from Medical Documents with Generator-Verifier LMMs | | 1
DeepVision-103K: A Visually Diverse, Broad-Coverage, and Verifiable Mathematical Dataset for Multimodal Reasoning | | 1
m1: Unleash the Potential of Test-Time Scaling for Medical Reasoning with Large Language Models | | 1
ReLoop: Structured Modeling and Behavioral Verification for Reliable LLM-Based Optimization | | 1
Avey-B | | 1
SR-Scientist: Scientific Equation Discovery With Agentic AI | | 1
Deep Ignorance: Filtering Pretraining Data Builds Tamper-Resistant Safeguards into Open-Weight LLMs | | 1
MARS: Modular Agent with Reflective Search for Automated AI Research | | 1
EditCtrl: Disentangled Local and Global Control for Real-Time Generative Video Editing | | 1
Revisiting the Platonic Representation Hypothesis: An Aristotelian View | | 1
Stroke3D: Lifting 2D strokes into rigged 3D model via latent diffusion models | | 1
A Trajectory-Based Safety Audit of Clawdbot (OpenClaw) | | 1
Privileged Information Distillation for Language Models | | 1
Image Generation with a Sphere Encoder | | 1
InnoEval: On Research Idea Evaluation as a Knowledge-Grounded, Multi-Perspective Reasoning Problem | | 1
Efficient Test-Time Scaling for Small Vision-Language Models | | 1
Self-Improving World Modelling with Latent Actions | | 1
Scaling Behavior of Discrete Diffusion Language Models | | 1
BiasFreeBench: a Benchmark for Mitigating Bias in Large Language Model Responses | | 1
AthenaBench: A Dynamic Benchmark for Evaluating LLMs in Cyber Threat Intelligence | | 1
GISA: A Benchmark for General Information-Seeking Assistant | | 1
SciAgentGym: Benchmarking Multi-Step Scientific Tool-use in LLM Agents | | 1
SQuTR: A Robustness Benchmark for Spoken Query to Text Retrieval under Acoustic Noise | | 1
Conversational Image Segmentation: Grounding Abstract Concepts with Scalable Supervision | | 1
Kairos: Toward Adaptive and Parameter-Efficient Time Series Foundation Models | | 1
Benchmarking Vision-Language Models for French PDF-to-Markdown Conversion | | 1
The Pensieve Paradigm: Stateful Language Models Mastering Their Own Context | | 1
P-GenRM: Personalized Generative Reward Model with Test-time User-based Scaling | | 1
Stroke of Surprise: Progressive Semantic Illusions in Vector Sketching | | 1
Can World Simulators Reason? Gen-ViRe: A Generative Visual Reasoning Benchmark | | 1
Which Reasoning Trajectories Teach Students to Reason Better? A Simple Metric of Informative Alignment | | 1
DeepSight: An All-in-One LM Safety Toolkit | | 1
Page 295 of 9486