SOTAVerified

The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

474,278 papers · 248,326 code links · 4,818 tasks

Papers

Showing 15401-15450 of 474278 papers

Title | Status | Hype
STORK: Improving the Fidelity of Mid-NFE Sampling for Diffusion and Flow Matching Models | Code | 1
Conformal Prediction for Zero-Shot Models | Code | 1
Geo-Sign: Hyperbolic Contrastive Regularisation for Geometrically Aware Sign Language Translation | Code | 1
Harnessing Negative Signals: Reinforcement Distillation from Teacher Data for LLM Reasoning | Code | 1
Beyond FACS: Data-driven Facial Expression Dictionaries, with Application to Predicting Autism | Code | 1
Unleashing High-Quality Image Generation in Diffusion Sampling Using Second-Order Levenberg-Marquardt-Langevin | Code | 1
Efficient RAW Image Deblurring with Adaptive Frequency Modulation | Code | 1
Learning Safety Constraints for Large Language Models | Code | 1
Causal-aware Large Language Models: Enhancing Decision-Making Through Learning, Adapting and Acting | Code | 1
EgoExOR: An Ego-Exo-Centric Operating Room Dataset for Surgical Activity Understanding | Code | 1
Context is Gold to find the Gold Passage: Evaluating and Training Contextual Document Embeddings | Code | 1
Beyond the LUMIR challenge: The pathway to foundational registration models | Code | 1
Seeing is Not Reasoning: MVPBench for Graph-based Evaluation of Multi-path Visual Physical CoT | Code | 1
Don't Reinvent the Wheel: Efficient Instruction-Following Text Embedding based on Guided Space Transformation | Code | 1
Mastering Massive Multi-Task Reinforcement Learning via Mixture-of-Expert Decision Transformer | Code | 1
IRBridge: Solving Image Restoration Bridge with Pre-trained Generative Diffusion Models | Code | 1
SiLVR: A Simple Language-based Video Reasoning Framework | Code | 1
Exploring Multimodal Challenges in Toxic Chinese Detection: Taxonomy, Benchmark, and Findings | Code | 1
Sorrel: A simple and flexible framework for multi-agent reinforcement learning | Code | 1
Agent-X: Evaluating Deep Multimodal Reasoning in Vision-Centric Agentic Tasks | Code | 1
ByzFL: Research Framework for Robust Federated Learning | Code | 1
Large Language Models are Locally Linear Mappings | Code | 1
HELM: Hyperbolic Large Language Models via Mixture-of-Curvature Experts | Code | 1
The Hallucination Dilemma: Factuality-Aware Reinforcement Learning for Large Reasoning Models | Code | 1
Unifying Language Agent Algorithms with Graph-based Orchestration Engine for Reproducible Agent Research | Code | 1
3D Gaussian Splat Vulnerabilities | Code | 1
un^2CLIP: Improving CLIP's Visual Detail Capturing Ability via Inverting unCLIP | Code | 1
DisTime: Distribution-based Time Representation for Video Large Language Models | Code | 1
Towards Effective Code-Integrated Reasoning | Code | 1
Boosting All-in-One Image Restoration via Self-Improved Privilege Learning | Code | 1
Period-LLM: Extending the Periodic Capability of Multimodal Large Language Model | Code | 1
ProxyThinker: Test-Time Guidance through Small Visual Reasoners | Code | 1
TimeHC-RL: Temporal-aware Hierarchical Cognitive Reinforcement Learning for Enhancing LLMs' Social Intelligence | Code | 1
ScienceMeter: Tracking Scientific Knowledge Updates in Language Models | Code | 1
BPE Stays on SCRIPT: Structured Encoding for Robust Multilingual Pretokenization | Code | 1
Reinforcing Video Reasoning with Focused Thinking | Code | 1
RT-X Net: RGB-Thermal cross attention network for Low-Light Image Enhancement | Code | 1
Draw ALL Your Imagine: A Holistic Benchmark and Agent Framework for Complex Instruction-based Image Generation | Code | 1
FreRA: A Frequency-Refined Augmentation for Contrastive Learning on Time Series Classification | Code | 1
Cora: Correspondence-aware image editing using few step diffusion | Code | 1
Foundation Molecular Grammar: Multi-Modal Foundation Models Induce Interpretable Molecular Graph Languages | Code | 1
Label-Guided In-Context Learning for Named Entity Recognition | Code | 1
LADA: Scalable Label-Specific CLIP Adapter for Continual Learning | Code | 1
Puzzled by Puzzles: When Vision-Language Models Can't Take a Hint | Code | 1
CrossLinear: Plug-and-Play Cross-Correlation Embedding for Time Series Forecasting with Exogenous Variables | Code | 1
The Automated but Risky Game: Modeling Agent-to-Agent Negotiations and Transactions in Consumer Markets | Code | 1
Are Unified Vision-Language Models Necessary: Generalization Across Understanding and Generation | Code | 1
Directed Graph Grammars for Sequence-based Learning | Code | 1
Accelerating AllReduce with a Persistent Straggler | Code | 1
How does Transformer Learn Implicit Reasoning? | Code | 1
Page 309 of 9486