SOTAVerified

The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

474,278 papers | 248,326 code links | 4,818 tasks

Papers

Showing 13,201 to 13,250 of 474,278 papers

| Title | Status | Hype |
|---|---|---|
| In Defense of Online Models for Video Instance Segmentation | Code | 2 |
| More Agents Is All You Need | Code | 2 |
| You Only Look at Screens: Multimodal Chain-of-Action Agents | Code | 2 |
| NoLiMa: Long-Context Evaluation Beyond Literal Matching | Code | 2 |
| MasRouter: Learning to Route LLMs for Multi-Agent Systems | Code | 2 |
| LightningDrag: Lightning Fast and Accurate Drag-based Image Editing Emerging from Videos | Code | 2 |
| Are We There Yet? A Brief Survey of Music Emotion Prediction Datasets, Models and Outstanding Challenges | Code | 2 |
| STimage-1K4M: A histopathology image-gene expression dataset for spatial transcriptomics | Code | 2 |
| What the DAAM: Interpreting Stable Diffusion Using Cross Attention | Code | 2 |
| Video Diffusion Models: A Survey | Code | 2 |
| LKM-UNet: Large Kernel Vision Mamba UNet for Medical Image Segmentation | Code | 2 |
| Distill Visual Chart Reasoning Ability from LLMs to MLLMs | Code | 2 |
| UniMSE: Towards Unified Multimodal Sentiment Analysis and Emotion Recognition | Code | 2 |
| Flow Matching for Medical Image Synthesis: Bridging the Gap Between Speed and Quality | Code | 2 |
| Transformer-VQ: Linear-Time Transformers via Vector Quantization | Code | 2 |
| CoLLiE: Collaborative Training of Large Language Models in an Efficient Way | Code | 2 |
| BAMM: Bidirectional Autoregressive Motion Model | Code | 2 |
| Vision Language Action Models in Robotic Manipulation: A Systematic Review | Code | 2 |
| AnomalyDiffusion: Few-Shot Anomaly Image Generation with Diffusion Model | Code | 2 |
| Search, Verify and Feedback: Towards Next Generation Post-training Paradigm of Foundation Models via Verifier Engineering | Code | 2 |
| The AdEMAMix Optimizer: Better, Faster, Older | Code | 2 |
| CDDFuse: Correlation-Driven Dual-Branch Feature Decomposition for Multi-Modality Image Fusion | Code | 2 |
| TIP: Tabular-Image Pre-training for Multimodal Classification with Incomplete Data | Code | 2 |
| Automated Self-Supervised Learning for Recommendation | Code | 2 |
| BAT: Benchmark for Auto-bidding Task | Code | 2 |
| ONCE-3DLanes: Building Monocular 3D Lane Detection | Code | 2 |
| Task Me Anything | Code | 2 |
| Xmodel-VLM: A Simple Baseline for Multimodal Vision Language Model | Code | 2 |
| Generative Enhancement for 3D Medical Images | Code | 2 |
| SAM-Med3D: Towards General-purpose Segmentation Models for Volumetric Medical Images | Code | 2 |
| Less is More: Efficient Black-box Attribution via Minimal Interpretable Subset Selection | Code | 2 |
| Multi-modal Situated Reasoning in 3D Scenes | Code | 2 |
| Semantic-Aligned Matching for Enhanced DETR Convergence and Multi-Scale Feature Fusion | Code | 2 |
| SeerAttention-R: Sparse Attention Adaptation for Long Reasoning | Code | 2 |
| PsyDraw: A Multi-Agent Multimodal System for Mental Health Screening in Left-Behind Children | Code | 2 |
| Towards Practical Second-Order Optimizers in Deep Learning: Insights from Fisher Information Analysis | Code | 2 |
| Towards 3D Molecule-Text Interpretation in Language Models | Code | 2 |
| Sketch Video Synthesis | Code | 2 |
| ML-Agent: Reinforcing LLM Agents for Autonomous Machine Learning Engineering | Code | 2 |
| Faceptor: A Generalist Model for Face Perception | Code | 2 |
| Video Object Segmentation in Panoptic Wild Scenes | Code | 2 |
| Where do Large Vision-Language Models Look at when Answering Questions? | Code | 2 |
| Video-Based Human Pose Regression via Decoupled Space-Time Aggregation | Code | 2 |
| Prototypical Information Bottlenecking and Disentangling for Multimodal Cancer Survival Prediction | Code | 2 |
| Tackling View-Dependent Semantics in 3D Language Gaussian Splatting | Code | 2 |
| HumanOmniV2: From Understanding to Omni-Modal Reasoning with Context | Code | 2 |
| MMToM-QA: Multimodal Theory of Mind Question Answering | Code | 2 |
| Auto-PyTorch Tabular: Multi-Fidelity MetaLearning for Efficient and Robust AutoDL | Code | 2 |
| FlowDB a large scale precipitation, river, and flash flood dataset | Code | 2 |
| Wavelet Diffusion Models are fast and scalable Image Generators | Code | 2 |
Page 265 of 9,486