SOTAVerified

The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

474,278 papers | 248,326 code links | 4,818 tasks

Papers

Showing 14901-14950 of 474,278 papers

Title | Status | Hype
U-RWKV: Lightweight medical image segmentation with direction-adaptive RWKV | Code | 1
Text-Visual Semantic Constrained AI-Generated Image Quality Assessment | Code | 1
REST: Stress Testing Large Reasoning Models by Asking Multiple Problems at Once | Code | 1
IM-LUT: Interpolation Mixing Look-Up Tables for Image Super-Resolution | Code | 1
Graph World Model | Code | 1
4D-Animal: Freely Reconstructing Animatable 3D Animals from Videos | Code | 1
WildFX: A DAW-Powered Pipeline for In-the-Wild Audio FX Graph Modeling | Code | 1
Warehouse Spatial Question Answering with LLM Agent | Code | 1
Reasoning or Memorization? Unreliable Results of Reinforcement Learning Due to Data Contamination | Code | 1
Deep Reinforcement Learning with Gradient Eligibility Traces | Code | 1
Conformation-Aware Structure Prediction of Antigen-Recognizing Immune Proteins | Code | 1
BrainLesion Suite: A Flexible and User-Friendly Framework for Modular Brain Lesion Image Analysis | Code | 1
RadiomicsRetrieval: A Customizable Framework for Medical Image Retrieval Using Radiomics Features | Code | 1
A Practical Two-Stage Recipe for Mathematical LLMs: Maximizing Accuracy with SFT and Efficiency with Reinforcement Learning | Code | 1
Exploring Design of Multi-Agent LLM Dialogues for Research Ideation | Code | 1
Disentangling Instance and Scene Contexts for 3D Semantic Scene Completion | Code | 1
Dual Dimensions Geometric Representation Learning Based Document Dewarping | Code | 1
Compress Any Segment Anything Model (SAM) | Code | 1
Rethinking Query-based Transformer for Continual Image Segmentation | Code | 1
Seg-Wild: Interactive Segmentation based on 3D Gaussian Splatting for Unconstrained Image Collections | Code | 1
HiM2SAM: Enhancing SAM2 with Hierarchical Motion Estimation and Memory Optimization towards Long-term Tracking | Code | 1
PacGDC: Label-Efficient Generalizable Depth Completion with Projection Ambiguity and Consistency | Code | 1
NLGCL: Naturally Existing Neighbor Layers Graph Contrastive Learning for Recommendation | Code | 1
Rethinking Verification for LLM Code Generation: From Generation to Testing | Code | 1
HVI-CIDNet+: Beyond Extreme Darkness for Low-Light Image Enhancement | Code | 1
RSRefSeg 2: Decoupling Referring Remote Sensing Image Segmentation with Foundation Models | Code | 1
Evaluating Morphological Alignment of Tokenizers in 70 Languages | Code | 1
NeoBabel: A Multilingual Open Tower for Visual Generation | Code | 1
eegFloss: A Python package for refining sleep EEG recordings using machine learning models | Code | 1
Prompt-Free Conditional Diffusion for Multi-object Image Augmentation | Code | 1
Robust One-step Speech Enhancement via Consistency Distillation | Code | 1
The Delta Learning Hypothesis: Preference Tuning on Weak Data can Yield Strong Gains | Code | 1
Kamae: Bridging Spark and Keras for Seamless ML Preprocessing | Code | 1
Differential Mamba | Code | 1
MCAM: Multimodal Causal Analysis Model for Ego-Vehicle-Level Driving Video Understanding | Code | 1
ScoreAdv: Score-based Targeted Generation of Natural Adversarial Examples via Diffusion Models | Code | 1
CriticLean: Critic-Guided Reinforcement Learning for Mathematical Formalization | Code | 1
LangMamba: A Language-driven Mamba Framework for Low-dose CT Denoising with Vision-language Models | Code | 1
FindRec: Stein-Guided Entropic Flow for Multi-Modal Sequential Recommendation | Code | 1
LOOM-Scope: a comprehensive and efficient LOng-cOntext Model evaluation framework | Code | 1
The Extended SONICOM HRTF Dataset and Spatial Audio Metrics Toolbox | Code | 1
Beyond One Shot, Beyond One Perspective: Cross-View and Long-Horizon Distillation for Better LiDAR Representations | Code | 1
VOTE: Vision-Language-Action Optimization with Trajectory Ensemble Voting | Code | 1
SV-DRR: High-Fidelity Novel View X-Ray Synthesis Using Diffusion Model | Code | 1
Exploring Remote Physiological Signal Measurement under Dynamic Lighting Conditions at Night: Dataset, Experiment, and Analysis | Code | 1
LLMThinkBench: Towards Basic Math Reasoning and Overthinking in Large Language Models | Code | 1
SAMed-2: Selective Memory Enhanced Medical Segment Anything Model | Code | 1
CoreCodeBench: A Configurable Multi-Scenario Repository-Level Benchmark | Code | 1
Be the Change You Want to See: Revisiting Remote Sensing Change Detection Practices | Code | 1
Page 299 of 9486