SOTAVerified

The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

474,278 papers, 248,326 code links, 4,818 tasks

Papers

Showing 15751-15800 of 474278 papers

Title | Status | Hype
Backdoor Cleaning without External Guidance in MLLM Fine-tuning | Code | 1
KTAE: A Model-Free Algorithm to Key-Tokens Advantage Estimation in Mathematical Reasoning | Code | 1
Transformer Copilot: Learning from The Mistake Log in LLM Fine-tuning | Code | 1
A Comprehensive Evaluation of Contemporary ML-Based Solvers for Combinatorial Optimization | Code | 1
Let Androids Dream of Electric Sheep: A Human-like Image Implication Understanding and Reasoning Framework | Code | 1
MPO: Multilingual Safety Alignment via Reward Gap Optimization | Code | 1
CASS: Nvidia to AMD Transpilation with Data, Models, and Benchmark | Code | 1
Teaching Large Language Models to Maintain Contextual Faithfulness via Synthetic Tasks and Reinforcement Learning | Code | 1
From EduVisBench to EduVisAgent: A Benchmark and Multi-Agent Framework for Pedagogical Visualization | Code | 1
UFT: Unifying Supervised and Reinforcement Fine-Tuning | Code | 1
RE-TRIP : Reflectivity Instance Augmented Triangle Descriptor for 3D Place Recognition | Code | 1
Learning Adaptive and Temporally Causal Video Tokenization in a 1D Latent Space | Code | 1
Do Large Language Models Excel in Complex Logical Reasoning with Formal Language? | Code | 1
ICYM2I: The illusion of multimodal informativeness under missingness | Code | 1
AdvReal: Adversarial Patch Generation Framework with Application to Adversarial Safety Evaluation of Object Detection Systems | Code | 1
ARB: A Comprehensive Arabic Multimodal Reasoning Benchmark | Code | 1
V2V: Scaling Event-Based Vision through Efficient Video-to-Voxel Simulation | Code | 1
Deliberation on Priors: Trustworthy Reasoning of Large Language Models on Knowledge Graphs | Code | 1
The Atlas of In-Context Learning: How Attention Heads Shape In-Context Retrieval Augmentation | Code | 1
GUI-G1: Understanding R1-Zero-Like Training for Visual Grounding in GUI Agents | Code | 1
CineTechBench: A Benchmark for Cinematographic Technique Understanding and Generation | Code | 1
SAMA-UNet: Enhancing Medical Image Segmentation with Self-Adaptive Mamba-Like Attention and Causal-Resonance Learning | Code | 1
From Problem-Solving to Teaching Problem-Solving: Aligning LLMs with Pedagogy using Reinforcement Learning | Code | 1
Streamline Without Sacrifice -- Squeeze out Computation Redundancy in LMM | Code | 1
The Unreasonable Effectiveness of Entropy Minimization in LLM Reasoning | Code | 1
Steering Generative Models with Experimental Data for Protein Fitness Optimization | Code | 1
Be Careful When Fine-tuning On Open-Source LLMs: Your Fine-tuning Data Could Be Secretly Stolen! | Code | 1
NOVER: Incentive Training for Language Models via Verifier-Free Reinforcement Learning | Code | 1
Angle Domain Guidance: Latent Diffusion Requires Rotation Rather Than Extrapolation | Code | 1
How Memory Management Impacts LLM Agents: An Empirical Study of Experience-Following Behavior | Code | 1
ModelingAgent: Bridging LLMs and Mathematical Modeling for Real-World Challenges | Code | 1
Sonnet: Spectral Operator Neural Network for Multivariable Time Series Forecasting | Code | 1
Continuous Representation Methods, Theories, and Applications: An Overview and Perspectives | Code | 1
Intentional Gesture: Deliver Your Intentions with Gestures for Speech | Code | 1
Learning to Reason via Mixture-of-Thought for Logical Reasoning | Code | 1
The Devil is in Fine-tuning and Long-tailed Problems:A New Benchmark for Scene Text Detection | Code | 1
X-GRM: Large Gaussian Reconstruction Model for Sparse-view X-rays to Computed Tomography | Code | 1
PiFlow: Principle-aware Scientific Discovery with Multi-Agent Collaboration | Code | 1
Towards Holistic Evaluation of Large Audio-Language Models: A Comprehensive Survey | Code | 1
DeepKD: A Deeply Decoupled and Denoised Knowledge Distillation Trainer | Code | 1
Pre-training Large Memory Language Models with Internal and External Knowledge | Code | 1
ThinkRec: Thinking-based recommendation via LLM | Code | 1
Audio Jailbreak: An Open Comprehensive Benchmark for Jailbreaking Large Audio-Language Models | Code | 1
Multimodal Conditional Information Bottleneck for Generalizable AI-Generated Image Detection | Code | 1
Silent Leaks: Implicit Knowledge Extraction Attack on RAG Systems through Benign Queries | Code | 1
HopWeaver: Synthesizing Authentic Multi-Hop Questions Across Text Corpora | Code | 1
RLBenchNet: The Right Network for the Right Reinforcement Learning Task | Code | 1
Training Step-Level Reasoning Verifiers with Formal Verification Tools | Code | 1
UWSAM: Segment Anything Model Guided Underwater Instance Segmentation and A Large-scale Benchmark Dataset | Code | 1
Stronger ViTs With Octic Equivariance | Code | 1
Page 316 of 9486