SOTAVerified

The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

474,278 papers | 248,326 code links | 4,818 tasks

Papers

Showing 15201-15250 of 474278 papers

Title | Status | Hype
Draft-based Approximate Inference for LLMs | Code | 1
DiscoVLA: Discrepancy Reduction in Vision, Language, and Alignment for Parameter-Efficient Video-Text Retrieval | Code | 1
Intention-Conditioned Flow Occupancy Models | Code | 1
scSSL-Bench: Benchmarking Self-Supervised Learning for Single-Cell Data | Code | 1
KARMA: A Multilevel Decomposition Hybrid Mamba Framework for Multivariate Long-Term Time Series Forecasting | Code | 1
RS-MTDF: Multi-Teacher Distillation and Fusion for Remote Sensing Semi-Supervised Semantic Segmentation | Code | 1
RuleReasoner: Reinforced Rule-based Reasoning via Domain-aware Dynamic Sampling | Code | 1
SDMPrune: Self-Distillation MLP Pruning for Efficient Large Language Models | Code | 1
SLEEPYLAND: trust begins with fair evaluation of automatic sleep staging models | Code | 1
FaithfulRAG: Fact-Level Conflict Modeling for Context-Faithful Retrieval-Augmented Generation | Code | 1
Monocular 3D Hand Pose Estimation with Implicit Camera Alignment | Code | 1
syren-baryon: Analytic emulators for the impact of baryons on the matter power spectrum | Code | 1
SWE-Flow: Synthesizing Software Engineering Data in a Test-Driven Manner | Code | 1
SwS: Self-aware Weakness-driven Problem Synthesis in Reinforcement Learning for LLM Reasoning | Code | 1
Know-MRI: A Knowledge Mechanisms Revealer&Interpreter for Large Language Models | Code | 1
PatchGuard: Adversarially Robust Anomaly Detection and Localization through Vision Transformers and Pseudo Anomalies | Code | 1
Why Masking Diffusion Works: Condition on the Jump Schedule for Improved Discrete Diffusion | Code | 1
Same Task, Different Circuits: Disentangling Modality-Specific Mechanisms in VLMs | Code | 1
Ambient Diffusion Omni: Training Good Models with Bad Data | Code | 1
ConfPO: Exploiting Policy Model Confidence for Critical Token Selection in Preference Optimization | Code | 1
UTBoost: Rigorous Evaluation of Coding Agents on SWE-Bench | Code | 1
Re4MPC: Reactive Nonlinear MPC for Multi-model Motion Planning via Deep Reinforcement Learning | Code | 1
Adapting Vision-Language Foundation Model for Next Generation Medical Ultrasound Image Analysis | Code | 1
mLaSDI: Multi-stage latent space dynamics identification | Code | 1
DRAGged into Conflicts: Detecting and Addressing Conflicting Sources in Search-Augmented LLMs | Code | 1
InceptionMamba: An Efficient Hybrid Network with Large Band Convolution and Bottleneck Mamba | Code | 1
Token Perturbation Guidance for Diffusion Models | Code | 1
PairEdit: Learning Semantic Variations for Exemplar-based Image Editing | Code | 1
From Debate to Equilibrium: Belief-Driven Multi-Agent LLM Reasoning via Bayesian Nash Equilibrium | Code | 1
Multiple Object Stitching for Unsupervised Representation Learning | Code | 1
Improving large language models with concept-aware fine-tuning | Code | 1
Chasing Moving Targets with Online Self-Play Reinforcement Learning for Safer Language Models | Code | 1
Premise Selection for a Lean Hammer | Code | 1
CyberV: Cybernetics for Test-time Scaling in Video Understanding | Code | 1
SpaCE-10: A Comprehensive Benchmark for Multimodal Large Language Models in Compositional Spatial Intelligence | Code | 1
SlideCoder: Layout-aware RAG-enhanced Hierarchical Slide Generation from Design | Code | 1
Curriculum Learning With Counterfactual Group Relative Policy Advantage For Multi-Agent Reinforcement Learning | Code | 1
RSafe: Incentivizing proactive reasoning to build robust and adaptive LLM safeguards | Code | 1
Rethinking Cross-Modal Interaction in Multimodal Diffusion Transformers | Code | 1
Spatio-Temporal State Space Model For Efficient Event-Based Optical Flow | Code | 1
Graph-Assisted Stitching for Offline Hierarchical Reinforcement Learning | Code | 1
RADAR: Benchmarking Language Models on Imperfect Tabular Data | Code | 1
Egocentric Event-Based Vision for Ping Pong Ball Trajectory Prediction | Code | 1
C3S3: Complementary Competition and Contrastive Selection for Semi-Supervised Medical Image Segmentation | Code | 1
Generative Modeling of Weights: Generalization or Memorization? | Code | 1
Compound AI Systems Optimization: A Survey of Methods, Challenges, and Future Directions | Code | 1
Diffusion Sequence Models for Enhanced Protein Representation and Generation | Code | 1
Flowing Datasets with Wasserstein over Wasserstein Gradient Flows | Code | 1
Well Begun is Half Done: Low-resource Preference Alignment by Weak-to-Strong Decoding | Code | 1
CheMatAgent: Enhancing LLMs for Chemistry and Materials Science through Tree-Search Based Tool Learning | Code | 1