SOTAVerified

The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

661,570 papers | 248,326 code links | 4,818 tasks

Papers

Showing 6751-6800 of 661,570 papers

Title | Status | Hype
TimeMarker: A Versatile Video-LLM for Long and Short Video Understanding with Superior Temporal Localization Ability | Code | 2
GaussianSpeech: Audio-Driven Gaussian Avatars | Code | 2
Monocular Obstacle Avoidance Based on Inverse PPO for Fixed-wing UAVs | Code | 2
TryOffDiff: Virtual-Try-Off via High-Fidelity Garment Reconstruction using Diffusion Models | Code | 2
vesselFM: A Foundation Model for Universal 3D Blood Vessel Segmentation | Code | 2
MotionLLaMA: A Unified Framework for Motion Synthesis and Comprehension | Code | 2
Path-RAG: Knowledge-Guided Key Region Retrieval for Open-ended Pathology Visual Question Answering | Code | 2
Task Singular Vectors: Reducing Task Interference in Model Merging | Code | 2
Omegance: A Single Parameter for Various Granularities in Diffusion-Based Synthesis | Code | 2
Towards Stabilized and Efficient Diffusion Transformers through Long-Skip-Connections with Spectral Constraints | Code | 2
Pretrained LLM Adapted with LoRA as a Decision Transformer for Offline RL in Quantitative Trading | Code | 2
HyperSeg: Towards Universal Visual Segmentation with Large Language Model | Code | 2
Boost 3D Reconstruction using Diffusion-based Monocular Camera Calibration | Code | 2
Collaborative Decoding Makes Visual Auto-Regressive Modeling Efficient | Code | 2
MWFormer: Multi-Weather Image Restoration Using Degradation-Aware Transformers | Code | 2
PassionSR: Post-Training Quantization with Adaptive Scale in One-Step Diffusion based Image Super-Resolution | Code | 2
Grounding-IQA: Multimodal Language Grounding Model for Image Quality Assessment | Code | 2
OpenAD: Open-World Autonomous Driving Benchmark for 3D Object Detection | Code | 2
DreamMix: Decoupling Object Attributes for Enhanced Editability in Customized Image Inpainting | Code | 2
TinyViM: Frequency Decoupling for Tiny Hybrid Vision Mamba | Code | 2
Monocular Lane Detection Based on Deep Learning: A Survey | Code | 2
Scaling Spike-driven Transformer with Efficient Spike Firing Approximation Training | Code | 2
UltraSam: A Foundation Model for Ultrasound using Large Open-Access Segmentation Datasets | Code | 2
Augmenting Multimodal LLMs with Self-Reflective Tokens for Knowledge-based Visual Question Answering | Code | 2
Interpreting Object-level Foundation Models via Visual Precision Search | Code | 2
Open Vocabulary Monocular 3D Object Detection | Code | 2
Probing the limitations of multimodal language models for chemistry and materials research | Code | 2
Exploring Discrete Flow Matching for 3D De Novo Molecule Generation | Code | 2
Ca2-VDM: Efficient Autoregressive Video Diffusion Model with Causal Generation and Cache Sharing | Code | 2
Efficient Video Face Enhancement with Enhanced Spatial-Temporal Consistency | Code | 2
Edit Away and My Face Will not Stay: Personal Biometric Defense against Malicious Generative Editing | Code | 2
ZoomEye: Enhancing Multimodal LLMs with Human-Like Zooming Capabilities through Tree-Based Image Exploration | Code | 2
UniPose: A Unified Multimodal Framework for Human Pose Comprehension, Generation and Editing | Code | 2
SplatFlow: Multi-View Rectified Flow Model for 3D Gaussian Splatting Synthesis | Code | 2
An End-to-End Robust Point Cloud Semantic Segmentation Network with Single-Step Conditional Diffusion Models | Code | 2
Fancy123: One Image to High-Quality 3D Mesh Generation via Plug-and-Play Deformation | Code | 2
Preference Optimization for Reasoning with Pseudo Feedback | Code | 2
MVGenMaster: Scaling Multi-View Generation from Any Image via 3D Priors Enhanced Diffusion Model | Code | 2
Self-Calibrated CLIP for Training-Free Open-Vocabulary Segmentation | Code | 2
LLaMA-MoE v2: Exploring Sparsity of LLaMA from Perspective of Mixture-of-Experts with Post-Training | Code | 2
ResCLIP: Residual Attention for Training-free Dense Vision-language Inference | Code | 2
Devils in Middle Layers of Large Vision-Language Models: Interpreting, Detecting and Mitigating Object Hallucinations via Attention Lens | Code | 2
Towards Satellite Image Road Graph Extraction: A Global-Scale Dataset and A Novel Method | Code | 2
Large Language Model with Region-guided Referring and Grounding for CT Report Generation | Code | 2
AeroGen: Enhancing Remote Sensing Object Detection with Diffusion-Driven Data Generation | Code | 2
What Makes a Scene ? Scene Graph-based Evaluation and Feedback for Controllable Generation | Code | 2
Gotta Hear Them All: Sound Source Aware Vision to Audio Generation | Code | 2
Steering Away from Harm: An Adaptive Approach to Defending Vision Language Model Against Jailbreaks | Code | 2
Multi-Reranker: Maximizing performance of retrieval-augmented generation in the FinanceRAG challenge | Code | 2
A Survey on LLM-as-a-Judge | Code | 2
Page 136 of 13232