SOTAVerified

The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

474,278 papers · 248,326 code links · 4,818 tasks

Papers

Showing 12901–12950 of 177340 papers

Title | Status | Hype
Hybrid Internal Model: Learning Agile Legged Locomotion with Simulated Robot Response | Code | 2
EnCLAP++: Analyzing the EnCLAP Framework for Optimizing Automated Audio Captioning Performance | Code | 2
GenLoco: Generalized Locomotion Controllers for Quadrupedal Robots | Code | 2
Leopard: A Vision Language Model For Text-Rich Multi-Image Tasks | Code | 2
Less is More: Mitigating Multimodal Hallucination from an EOS Decision Perspective | Code | 2
Multimodal C4: An Open, Billion-scale Corpus of Images Interleaved with Text | Code | 2
ProbPose: A Probabilistic Approach to 2D Human Pose Estimation | Code | 2
CARD: Classification and Regression Diffusion Models | Code | 2
DeTPP: Leveraging Object Detection for Robust Long-Horizon Event Prediction | Code | 2
TorchSpatial: A Location Encoding Framework and Benchmark for Spatial Representation Learning | Code | 2
OrthoPlanes: A Novel Representation for Better 3D-Awareness of GANs | Code | 2
I2MoE: Interpretable Multimodal Interaction-aware Mixture-of-Experts | Code | 2
Box2Mask: Box-supervised Instance Segmentation via Level-set Evolution | Code | 2
GSO: Challenging Software Optimization Tasks for Evaluating SWE-Agents | Code | 2
RT-2: Vision-Language-Action Models Transfer Web Knowledge to Robotic Control | Code | 2
OcclusionFusion: Occlusion-aware Motion Estimation for Real-time Dynamic 3D Reconstruction | Code | 2
Motion-X: A Large-scale 3D Expressive Whole-body Human Motion Dataset | Code | 2
Routoo: Learning to Route to Large Language Models Effectively | Code | 2
Diff2Lip: Audio Conditioned Diffusion Models for Lip-Synchronization | Code | 2
Objects as Points | Code | 2
Do You Remember? Dense Video Captioning with Cross-Modal Memory Retrieval | Code | 2
CVE-Bench: A Benchmark for AI Agents' Ability to Exploit Real-World Web Application Vulnerabilities | Code | 2
Crab: A Unified Audio-Visual Scene Understanding Model with Explicit Cooperation | Code | 2
Diffusion-RWKV: Scaling RWKV-Like Architectures for Diffusion Models | Code | 2
Dataset Regeneration for Sequential Recommendation | Code | 2
CARES: A Comprehensive Benchmark of Trustworthiness in Medical Vision Language Models | Code | 2
M^2SNet: Multi-scale in Multi-scale Subtraction Network for Medical Image Segmentation | Code | 2
VQF: Highly Accurate IMU Orientation Estimation with Bias Estimation and Magnetic Disturbance Rejection | Code | 2
REAL-Colon: A dataset for developing real-world AI applications in colonoscopy | Code | 2
SODA: Million-scale Dialogue Distillation with Social Commonsense Contextualization | Code | 2
Context-Aware Video Instance Segmentation | Code | 2
Benchmarking Graph Neural Networks | Code | 2
PyMAF-X: Towards Well-aligned Full-body Model Regression from Monocular Images | Code | 2
Path-RAG: Knowledge-Guided Key Region Retrieval for Open-ended Pathology Visual Question Answering | Code | 2
MineDreamer: Learning to Follow Instructions via Chain-of-Imagination for Simulated-World Control | Code | 2
DeCo: Decoupling Token Compression from Semantic Abstraction in Multimodal Large Language Models | Code | 2
PerCo (SD): Open Perceptual Compression | Code | 2
Frequency Decoupling for Motion Magnification via Multi-Level Isomorphic Architecture | Code | 2
MINERVA: Evaluating Complex Video Reasoning | Code | 2
RoboPianist: Dexterous Piano Playing with Deep Reinforcement Learning | Code | 2
Universal Guidance for Diffusion Models | Code | 2
Human Preference Score v2: A Solid Benchmark for Evaluating Human Preferences of Text-to-Image Synthesis | Code | 2
GenerSpeech: Towards Style Transfer for Generalizable Out-Of-Domain Text-to-Speech | Code | 2
Investigating the Role of Image Retrieval for Visual Localization -- An exhaustive benchmark | Code | 2
Piloting Structure-Based Drug Design via Modality-Specific Optimal Schedule | Code | 2
Autonomous Catheterization with Open-source Simulator and Expert Trajectory | Code | 2
Data-Centric Foundation Models in Computational Healthcare: A Survey | Code | 2
BAPLe: Backdoor Attacks on Medical Foundational Models using Prompt Learning | Code | 2
LRM-Zero: Training Large Reconstruction Models with Synthesized Data | Code | 2
DiffuSeq-v2: Bridging Discrete and Continuous Text Spaces for Accelerated Seq2Seq Diffusion Models | Code | 2
Page 259 of 3547