SOTAVerified

The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

661,570 papers | 248,326 code links | 4,818 tasks

Papers

Showing 5251 to 5300 of 661,570 papers

Title | Status | Hype
Roboflow100-VL: A Multi-Domain Object Detection Benchmark for Vision-Language Models | Code | 2
SANSA: Unleashing the Hidden Semantics in SAM2 for Few-Shot Segmentation | Code | 2
Adversarial Attacks against Closed-Source MLLMs via Feature Optimal Alignment | Code | 2
TimePro: Efficient Multivariate Long-term Time Series Forecasting with Variable- and Time-Aware Hyper-state | Code | 2
R2R: Efficiently Navigating Divergent Reasoning Paths with Small-Large Model Token Routing | Code | 2
Improved Representation Steering for Language Models | Code | 2
SPA-RL: Reinforcing LLM Agents via Stepwise Progress Attribution | Code | 2
Video-Holmes: Can MLLM Think Like Holmes for Complex Video Reasoning? | Code | 2
DoctorAgent-RL: A Multi-Agent Collaborative Reinforcement Learning System for Multi-Turn Clinical Dialogue | Code | 2
WINA: Weight Informed Neuron Activation for Accelerating Large Language Model Inference | Code | 2
One-shot Entropy Minimization | Code | 2
Omni-R1: Reinforcement Learning for Omnimodal Reasoning via Two-System Collaboration | Code | 2
Memory-Efficient Visual Autoregressive Modeling with Scale-Aware KV Cache Compression | Code | 2
WeatherEdit: Controllable Weather Editing with 4D Gaussian Field | Code | 2
EmoSphere-SER: Enhancing Speech Emotion Recognition Through Spherical Representation with Auxiliary Classification | Code | 2
Divide and Conquer: Grounding LLMs as Efficient Decision-Making Agents via Offline Hierarchical Reinforcement Learning | Code | 2
AniCrafter: Customizing Realistic Human-Centric Animation via Avatar-Background Conditioning in Video Diffusion Models | Code | 2
The UD-NewsCrawl Treebank: Reflections and Challenges from a Large-scale Tagalog Syntactic Annotation Project | Code | 2
A Lightweight Hybrid Dual Channel Speech Enhancement System under Low-SNR Conditions | Code | 2
SAEs Are Good for Steering -- If You Select the Right Features | Code | 2
CSTrack: Enhancing RGB-X Tracking via Compact Spatiotemporal Features | Code | 2
Chain-of-Thought for Autonomous Driving: A Comprehensive Survey and Future Prospects | Code | 2
Training-Free Multi-Step Audio Source Separation | Code | 2
FlowSE: Efficient and High-Quality Speech Enhancement via Flow Matching | Code | 2
MASKSEARCH: A Universal Pre-Training Framework to Enhance Agentic Search Capability | Code | 2
DiSA: Diffusion Step Annealing in Autoregressive Image Generation | Code | 2
Large Language Models Meet Knowledge Graphs for Question Answering: Synthesis and Opportunities | Code | 2
MAS-Zero: Designing Multi-Agent Systems with Zero Supervision | Code | 2
SynLogic: Synthesizing Verifiable Reasoning Data at Scale for Learning Logical Reasoning and Beyond | Code | 2
Accelerating Diffusion-based Text-to-Speech Model Training with Dual Modality Alignment | Code | 2
The Missing Point in Vision Transformers for Universal Image Segmentation | Code | 2
MFA-KWS: Effective Keyword Spotting with Multi-head Frame-asynchronous Decoding | Code | 2
Jodi: Unification of Visual Generation and Understanding via Joint Modeling | Code | 2
MetaMind: Modeling Human Social Thoughts with Metacognitive Multi-Agent Systems | Code | 2
I2MoE: Interpretable Multimodal Interaction-aware Mixture-of-Experts | Code | 2
Benchmarking Laparoscopic Surgical Image Restoration and Beyond | Code | 2
VTool-R1: VLMs Learn to Think with Images via Reinforcement Learning on Multimodal Tool Use | Code | 2
VPGS-SLAM: Voxel-based Progressive 3D Gaussian SLAM in Large-Scale Scenes | Code | 2
Shifting AI Efficiency From Model-Centric to Data-Centric Compression | Code | 2
Improved Immiscible Diffusion: Accelerate Diffusion Training by Reducing Its Miscibility | Code | 2
LiteCUA: Computer as MCP Server for Computer-Use Agent on AIOS | Code | 2
Using Large Language Models to Tackle Fundamental Challenges in Graph Learning: A Comprehensive Survey | Code | 2
CRMArena-Pro: Holistic Assessment of LLM Agents Across Diverse Business Scenarios and Interactions | Code | 2
Spiking Transformers Need High Frequency Information | Code | 2
Geometry Aware Operator Transformer as an Efficient and Accurate Neural Surrogate for PDEs on Arbitrary Domains | Code | 2
VeriThinker: Learning to Verify Makes Reasoning Model Efficient | Code | 2
Managing FAIR Knowledge Graphs as Polyglot Data End Points: A Benchmark based on the rdf2pg Framework and Plant Biology Data | Code | 2
MetaBox-v2: A Unified Benchmark Platform for Meta-Black-Box Optimization | Code | 2
ComfyMind: Toward General-Purpose Generation via Tree-Based Planning and Reactive Feedback | Code | 2
DanmakuTPPBench: A Multi-modal Benchmark for Temporal Point Process Modeling and Understanding | Code | 2
Page 106 of 13232