SOTAVerified

The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

661,570 papers | 248,326 code links | 4,818 tasks

Papers

Showing 10101-10150 of 661570 papers

Title | Status | Hype
I2MoE: Interpretable Multimodal Interaction-aware Mixture-of-Experts | Code | 2
Box2Mask: Box-supervised Instance Segmentation via Level-set Evolution | Code | 2
GSO: Challenging Software Optimization Tasks for Evaluating SWE-Agents | Code | 2
RT-2: Vision-Language-Action Models Transfer Web Knowledge to Robotic Control | Code | 2
OcclusionFusion: Occlusion-aware Motion Estimation for Real-time Dynamic 3D Reconstruction | Code | 2
Motion-X: A Large-scale 3D Expressive Whole-body Human Motion Dataset | Code | 2
Routoo: Learning to Route to Large Language Models Effectively | Code | 2
Diff2Lip: Audio Conditioned Diffusion Models for Lip-Synchronization | Code | 2
Objects as Points | Code | 2
Do You Remember? Dense Video Captioning with Cross-Modal Memory Retrieval | Code | 2
CVE-Bench: A Benchmark for AI Agents' Ability to Exploit Real-World Web Application Vulnerabilities | Code | 2
Crab: A Unified Audio-Visual Scene Understanding Model with Explicit Cooperation | Code | 2
Diffusion-RWKV: Scaling RWKV-Like Architectures for Diffusion Models | Code | 2
Dataset Regeneration for Sequential Recommendation | Code | 2
CARES: A Comprehensive Benchmark of Trustworthiness in Medical Vision Language Models | Code | 2
M^2SNet: Multi-scale in Multi-scale Subtraction Network for Medical Image Segmentation | Code | 2
VQF: Highly Accurate IMU Orientation Estimation with Bias Estimation and Magnetic Disturbance Rejection | Code | 2
REAL-Colon: A dataset for developing real-world AI applications in colonoscopy | Code | 2
SODA: Million-scale Dialogue Distillation with Social Commonsense Contextualization | Code | 2
Context-Aware Video Instance Segmentation | Code | 2
Benchmarking Graph Neural Networks | Code | 2
PyMAF-X: Towards Well-aligned Full-body Model Regression from Monocular Images | Code | 2
Path-RAG: Knowledge-Guided Key Region Retrieval for Open-ended Pathology Visual Question Answering | Code | 2
MineDreamer: Learning to Follow Instructions via Chain-of-Imagination for Simulated-World Control | Code | 2
DeCo: Decoupling Token Compression from Semantic Abstraction in Multimodal Large Language Models | Code | 2
PerCo (SD): Open Perceptual Compression | Code | 2
Frequency Decoupling for Motion Magnification via Multi-Level Isomorphic Architecture | Code | 2
MINERVA: Evaluating Complex Video Reasoning | Code | 2
RoboPianist: Dexterous Piano Playing with Deep Reinforcement Learning | Code | 2
Universal Guidance for Diffusion Models | Code | 2
Human Preference Score v2: A Solid Benchmark for Evaluating Human Preferences of Text-to-Image Synthesis | Code | 2
GenerSpeech: Towards Style Transfer for Generalizable Out-Of-Domain Text-to-Speech | Code | 2
Investigating the Role of Image Retrieval for Visual Localization -- An exhaustive benchmark | Code | 2
Piloting Structure-Based Drug Design via Modality-Specific Optimal Schedule | Code | 2
Autonomous Catheterization with Open-source Simulator and Expert Trajectory | Code | 2
Data-Centric Foundation Models in Computational Healthcare: A Survey | Code | 2
BAPLe: Backdoor Attacks on Medical Foundational Models using Prompt Learning | Code | 2
LRM-Zero: Training Large Reconstruction Models with Synthesized Data | Code | 2
DiffuSeq-v2: Bridging Discrete and Continuous Text Spaces for Accelerated Seq2Seq Diffusion Models | Code | 2
Regional Tiny Stories: Using Small Models to Compare Language Learning and Tokenizer Performance | Code | 2
UVEB: A Large-scale Benchmark and Baseline Towards Real-World Underwater Video Enhancement | Code | 2
TRACE: Temporal Grounding Video LLM via Causal Event Modeling | Code | 2
DiffIR2VR-Zero: Zero-Shot Video Restoration with Diffusion-based Image Restoration Models | Code | 2
ScatterFormer: Efficient Voxel Transformer with Scattered Linear Attention | Code | 2
A Survey on Large Language Models for Code Generation | Code | 2
Rethinking Optimization and Architecture for Tiny Language Models | Code | 2
FusionMamba: Efficient Remote Sensing Image Fusion with State Space Model | Code | 2
Most Language Models can be Poets too: An AI Writing Assistant and Constrained Text Generation Studio | Code | 2
VoCoT: Unleashing Visually Grounded Multi-Step Reasoning in Large Multi-Modal Models | Code | 2
THEMIS: Towards Practical Intellectual Property Protection for Post-Deployment On-Device Deep Learning Models | Code | 2
Page 203 of 13232