SOTAVerified

The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

661,570 papers · 248,326 code links · 4,818 tasks

Papers

Showing 7,051–7,100 of 177,340 papers

| Title | Status | Hype |
|---|---|---|
| Boosting Global-Local Feature Matching via Anomaly Synthesis for Multi-Class Point Cloud Anomaly Detection | Code | 2 |
| A Tutorial on Structural Identifiability of Epidemic Models Using StructuralIdentifiability.jl | Code | 2 |
| DexGarmentLab: Dexterous Garment Manipulation Environment with Generalizable Policy | Code | 2 |
| RBF++: Quantifying and Optimizing Reasoning Boundaries across Measurable and Unmeasurable Capabilities for Chain-of-Thought Reasoning | Code | 2 |
| CPRet: A Dataset, Benchmark, and Model for Retrieval in Competitive Programming | Code | 2 |
| Recollection from Pensieve: Novel View Synthesis via Learning from Uncalibrated Videos | Code | 2 |
| Neurosymbolic Diffusion Models | Code | 2 |
| Temporal Query Network for Efficient Multivariate Time Series Forecasting | Code | 2 |
| Efficient Speech Language Modeling via Energy Distance in Continuous Latent Space | Code | 2 |
| UniCTokens: Boosting Personalized Understanding and Generation via Unified Concept Tokens | Code | 2 |
| KORGym: A Dynamic Game Platform for LLM Reasoning Evaluation | Code | 2 |
| MARS: Mixture of Auto-Regressive Models for Fine-grained Text-to-image Synthesis | Code | 2 |
| QuickVideo: Real-Time Long Video Understanding with System Algorithm Co-Design | Code | 2 |
| Think or Not? Selective Reasoning via Reinforcement Learning for Vision-Language Models | Code | 2 |
| LiteCUA: Computer as MCP Server for Computer-Use Agent on AIOS | Code | 2 |
| Improved Immiscible Diffusion: Accelerate Diffusion Training by Reducing Its Miscibility | Code | 2 |
| Shifting AI Efficiency From Model-Centric to Data-Centric Compression | Code | 2 |
| DoctorAgent-RL: A Multi-Agent Collaborative Reinforcement Learning System for Multi-Turn Clinical Dialogue | Code | 2 |
| Memory-Efficient Visual Autoregressive Modeling with Scale-Aware KV Cache Compression | Code | 2 |
| Chain-of-Thought for Autonomous Driving: A Comprehensive Survey and Future Prospects | Code | 2 |
| Muddit: Liberating Generation Beyond Text-to-Image with a Unified Discrete Diffusion Model | Code | 2 |
| Aligning Modalities in Vision Large Language Models via Preference Fine-tuning | Code | 2 |
| Vision Language Models are Biased | Code | 2 |
| Unifying Appearance Codes and Bilateral Grids for Driving Scene Gaussian Splatting | Code | 2 |
| CyberGym: Evaluating AI Agents' Cybersecurity Capabilities with Real-World Vulnerabilities at Scale | Code | 2 |
| GSCodec Studio: A Modular Framework for Gaussian Splat Compression | Code | 2 |
| MagiCodec: Simple Masked Gaussian-Injected Codec for High-Fidelity Reconstruction and Generation | Code | 2 |
| Attentive Merging of Hidden Embeddings from Pre-trained Speech Model for Anti-spoofing Detection | Code | 2 |
| ShiftwiseConv: Small Convolutional Kernel with Large Kernel Effect | Code | 2 |
| Router-R1: Teaching LLMs Multi-Round Routing and Aggregation via Reinforcement Learning | Code | 2 |
| Institutional Books 1.0: A 242B token dataset from Harvard Library's collections, refined for accuracy and usability | Code | 2 |
| Do MIL Models Transfer? | Code | 2 |
| SDialog: A Python Toolkit for Synthetic Dialogue Generation and Analysis | Code | 2 |
| Vision Transformers Don't Need Trained Registers | Code | 2 |
| AutoMind: Adaptive Knowledgeable Agent for Automated Data Science | Code | 2 |
| UI-Genie: A Self-Improving Approach for Iteratively Boosting MLLM-based Mobile GUI Agents | Code | 2 |
| IntPhys 2: Benchmarking Intuitive Physics Understanding In Complex Synthetic Environments | Code | 2 |
| VerIF: Verification Engineering for Reinforcement Learning in Instruction Following | Code | 2 |
| Solving the Job Shop Scheduling Problem with Graph Neural Networks: A Customizable Reinforcement Learning Environment | Code | 2 |
| AnalogNAS-Bench: A NAS Benchmark for Analog In-Memory Computing | Code | 2 |
| Confucius3-Math: A Lightweight High-Performance Reasoning LLM for Chinese K-12 Mathematics Learning | Code | 2 |
| Towards In-the-wild 3D Plane Reconstruction from a Single Image | Code | 2 |
| Test3R: Learning to Reconstruct 3D at Test Time | Code | 2 |
| Parallels Between VLA Model Post-Training and Human Motor Learning: Progress, Challenges, and Trends | Code | 2 |
| Flow-Anchored Consistency Models | Code | 2 |
| Feed-Forward SceneDINO for Unsupervised Semantic Scene Completion | Code | 2 |
| EAMamba: Efficient All-Around Vision State Space Model for Image Restoration | Code | 2 |
| Open Source Planning & Control System with Language Agents for Autonomous Scientific Discovery | Code | 2 |
| CaRL: Learning Scalable Planning Policies with Simple Rewards | Code | 2 |
| HiMTok: Learning Hierarchical Mask Tokens for Image Segmentation with Large Multimodal Model | Code | 2 |
Page 142 of 3,547