SOTAVerified

The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

474,278 papers | 248,326 code links | 4,818 tasks

Papers

Showing 15451–15500 of 474278 papers

Title | Status | Hype
Normalizing Flows are Capable Models for RL | Code | 1
Proximal Algorithm Unrolling: Flexible and Efficient Reconstruction Networks for Single-Pixel Imaging | Code | 1
3DGEER: Exact and Efficient Volumetric Rendering with 3D Gaussians | Code | 1
ProDiff: Prototype-Guided Diffusion for Minimal Information Trajectory Imputation | Code | 1
VCapsBench: A Large-scale Fine-grained Benchmark for Video Caption Quality Evaluation | Code | 1
TimePoint: Accelerated Time Series Alignment via Self-Supervised Keypoint and Descriptor Learning | Code | 1
Context Robust Knowledge Editing for Language Models | Code | 1
Toward Memory-Aided World Models: Benchmarking via Spatial Consistency | Code | 1
DeepTheorem: Advancing LLM Reasoning for Theorem Proving Through Natural Language and Reinforcement Learning | Code | 1
SafeScientist: Toward Risk-Aware Scientific Discoveries by LLM Agents | Code | 1
Data-to-Dashboard: Multi-Agent LLM Framework for Insightful Visualization in Enterprise Analytics | Code | 1
Uni-MuMER: Unified Multi-Task Fine-Tuning of Vision-Language Model for Handwritten Mathematical Expression Recognition | Code | 1
ToMAP: Training Opponent-Aware LLM Persuaders with Theory of Mind | Code | 1
Jigsaw-R1: A Study of Rule-based Visual Reinforcement Learning with Jigsaw Puzzles | Code | 1
URWKV: Unified RWKV Model with Multi-state Perspective for Low-light Image Restoration | Code | 1
DenoiseRotator: Enhance Pruning Robustness for LLMs via Importance Concentration | Code | 1
Holistic Large-Scale Scene Reconstruction via Mixed Gaussian Splatting | Code | 1
SAMamba: Adaptive State Space Modeling with Hierarchical Vision for Infrared Small Target Detection | Code | 1
To Trust Or Not To Trust Your Vision-Language Model's Prediction | Code | 1
FlowAlign: Trajectory-Regularized, Inversion-Free Flow-based Image Editing | Code | 1
Model Immunization from a Condition Number Perspective | Code | 1
The Panaceas for Improving Low-Rank Decomposition in Communication-Efficient Federated Learning | Code | 1
PreFM: Online Audio-Visual Event Parsing via Predictive Future Modeling | Code | 1
Zero-to-Hero: Zero-Shot Initialization Empowering Reference-Based Video Appearance Editing | Code | 1
Sentinel: Attention Probing of Proxy Models for LLM Context Compression with an Understanding Perspective | Code | 1
Segment Policy Optimization: Effective Segment-Level Credit Assignment in RL for Large Language Models | Code | 1
Wav2Sem: Plug-and-Play Audio Semantic Decoupling for 3D Speech-Driven Facial Animation | Code | 1
AnchorAttention: Difference-Aware Sparse Attention with Stripe Granularity | Code | 1
Neural Interpretable PDEs: Harmonizing Fourier Insights with Attention for Scalable and Interpretable Physics Discovery | Code | 1
MMGT: Motion Mask Guided Two-Stage Network for Co-Speech Gesture Video Generation | Code | 1
K^2VAE: A Koopman-Kalman Enhanced Variational AutoEncoder for Probabilistic Time Series Forecasting | Code | 1
Satori-SWE: Evolutionary Test-Time Scaling for Sample-Efficient Software Engineering | Code | 1
DA-VPT: Semantic-Guided Visual Prompt Tuning for Vision Transformers | Code | 1
VideoReasonBench: Can MLLMs Perform Vision-Centric Complex Video Reasoning? | Code | 1
Interpreting Chest X-rays Like a Radiologist: A Benchmark with Clinical Reasoning | Code | 1
Improving the Effective Receptive Field of Message-Passing Neural Networks | Code | 1
Table-R1: Inference-Time Scaling for Table Reasoning | Code | 1
Advancing Multimodal Reasoning via Reinforcement Learning with Cold Start | Code | 1
Test-Time Adaptation of Vision-Language Models for Open-Vocabulary Semantic Segmentation | Code | 1
Neuromorphic Sequential Arena: A Benchmark for Neuromorphic Temporal Processing | Code | 1
RedTeamCUA: Realistic Adversarial Testing of Computer-Use Agents in Hybrid Web-OS Environments | Code | 1
Hybrid Batch Normalisation: Resolving the Dilemma of Batch Normalisation in Federated Learning | Code | 1
LoKI: Low-damage Knowledge Implanting of Large Language Models | Code | 1
VidText: Towards Comprehensive Evaluation for Video Text Understanding | Code | 1
Analysis and Evaluation of Synthetic Data Generation in Speech Dysfluency Detection | Code | 1
Fast Isotropic Median Filtering | Code | 1
UniTalk: Towards Universal Active Speaker Detection in Real World Scenarios | Code | 1
Measuring Sycophancy of Language Models in Multi-turn Dialogues | Code | 1
Speculative Decoding Meets Quantization: Compatibility Evaluation and Hierarchical Framework Design | Code | 1
CSI-Bench: A Large-Scale In-the-Wild Dataset for Multi-task WiFi Sensing | Code | 1