SOTAVerified

The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

661,570 papers · 248,326 code links · 4,818 tasks

Papers

Showing 7601-7650 of 661,570 papers

Title | Status | Hype
VisiFold: Long-Term Traffic Forecasting via Temporal Folding Graph and Node Visibility | Code | 0
Think While Watching: Online Streaming Segment-Level Memory for Multi-Turn Video Reasoning in Multimodal Large Language Models | Code | 0
AS-Bridge: A Bidirectional Generative Framework Bridging Next-Generation Astronomical Surveys | Code | 0
Few-for-Many Personalized Federated Learning | Code | 0
SceneAssistant: A Visual Feedback Agent for Open-Vocabulary 3D Scene Generation | Code | 0
Less Data, Faster Convergence: Goal-Driven Data Optimization for Multimodal Instruction Tuning | Code | 0
Personalized Feature Translation for Expression Recognition: An Efficient Source-Free Domain Adaptation Method | Code | 0
Controllable Exploration in Hybrid-Policy RLVR for Multi-Modal Reasoning | Code | 0
Spatial-TTT: Streaming Visual-based Spatial Intelligence with Test-Time Training | - | 2
Sparking Scientific Creativity via LLM-Driven Interdisciplinary Inspiration | - | 1
WideSeek-R1: Exploring Width Scaling for Broad Information Seeking via Multi-Agent Reinforcement Learning | - | 7
OmniStream: Mastering Perception, Reconstruction and Action in Continuous Streams | - | 2
Trust Your Critic: Robust Reward Modeling and Reinforcement Learning for Faithful Image Editing and Generation | - | 1
GRADE: Benchmarking Discipline-Informed Reasoning in Image Editing | - | 1
Strategic Navigation or Stochastic Search? How Agents and Humans Reason Over Document Collections | - | 1
Toward Complex-Valued Neural Networks for Waveform Generation | - | 1
FlashMotion: Few-Step Controllable Video Generation with Trajectory Guidance | - | 1
VolSplat: Rethinking Feed-Forward 3D Gaussian Splatting with Voxel-Aligned Prediction | - | 3
Mobile-GS: Real-time Gaussian Splatting for Mobile Devices | - | 2
Let It Flow: Agentic Crafting on Rock and Roll, Building the ROME Model within an Open Agentic Learning Ecosystem | - | 7
One Model, Many Budgets: Elastic Latent Interfaces for Diffusion Transformers | - | 1
Multi-Crit: Benchmarking Multimodal Judges on Pluralistic Criteria-Following | - | 1
MedEyes: Learning Dynamic Visual Focus for Medical Progressive Diagnosis | Code | 0
RADAR: Closed-Loop Robotic Data Generation via Semantic Planning and Autonomous Causal Environment Reset | - | 0
HCP-DCNet: A Hierarchical Causal Primitive Dynamic Composition Network for Self-Improving Causal Understanding | - | 0
Budget-Sensitive Discovery Scoring: A Formally Verified Framework for Evaluating AI-Guided Scientific Selection | - | 0
Taming OpenClaw: Security Analysis and Mitigation of Autonomous LLM Agent Threats | - | 0
ProtoDCS: Towards Robust and Efficient Open-Set Test-Time Adaptation for Vision-Language Models | Code | 0
EgoIntent: An Egocentric Step-level Benchmark for Understanding What, Why, and Next | - | 0
SPEGC: Continual Test-Time Adaptation via Semantic-Prompt-Enhanced Graph Clustering for Medical Image Segmentation | Code | 0
Agentic Explainable Artificial Intelligence (Agentic XAI) Approach To Explore Better Explanation | - | 0
AdaFuse: Accelerating Dynamic Adapter Inference via Token-Level Pre-Gating and Fused Kernel Optimization | - | 0
SimulCost: A Cost-Aware Benchmark and Toolkit for Automating Physics Simulations with LLMs | - | 0
TIPS: Turn-Level Information-Potential Reward Shaping for Search-Augmented LLMs | - | 0
The Deep-Match Framework for Event-Related Potential Detection in EEG | - | 0
FinReflectKG -- HalluBench: GraphRAG Hallucination Benchmark for Financial Question Answering Systems | - | 0
AI Detectors Fail Diverse Student Populations: A Mathematical Framing of Structural Detection Limits | - | 0
Abjad-Kids: An Arabic Speech Classification Dataset for Primary Education | - | 0
SciNav: A General Agent Framework for Scientific Coding Tasks | - | 0
PRIME-CVD: A Parametrically Rendered Informatics Medical Environment for Education in Cardiovascular Risk Modelling | - | 0
BrainSCL: Subtype-Guided Contrastive Learning for Brain Disorder Diagnosis | - | 0
TTQ: Activation-Aware Test-Time Quantization to Accelerate LLM Inference On The Fly | - | 0
CLaRE-ty Amid Chaos: Quantifying Representational Entanglement to Predict Ripple Effects in LLM Editing | - | 0
A Dynamic Bayesian and Machine Learning Framework for Quantitative Evaluation and Prediction of Operator Situation Awareness in Nuclear Power Plants | - | 0
Parameter-Efficient Token Embedding Editing for Clinical Class-Level Unlearning | - | 0
Taming Epilepsy: Mean Field Control of Whole-Brain Dynamics | - | 0
SimulU: Training-free Policy for Long-form Simultaneous Speech-to-Speech Translation | - | 0
Comparative Analysis of Deep Learning Architectures for Multi-Disease Classification of Single-Label Chest X-rays | - | 0
QV May Be Enough: Toward the Essence of Attention in LLMs | - | 0
Querying Everything Everywhere All at Once: Supervaluationism for the Agentic Lakehouse | - | 0
Page 153 of 13,232