SOTAVerified

The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

661,570 papers · 248,326 code links · 4,818 tasks

Papers

Showing 8051-8100 of 661,570 papers

Title | Status | Hype
CoIR: A Comprehensive Benchmark for Code Information Retrieval Models | Code | 2
A Unified Framework for 3D Scene Understanding | Code | 2
Planetarium: A Rigorous Benchmark for Translating Text to Structured Planning Languages | Code | 2
VEGS: View Extrapolation of Urban Scenes in 3D Gaussian Splatting using Learned Priors | Code | 2
Solving Motion Planning Tasks with a Scalable Generative Model | Code | 2
Context-Aware Video Instance Segmentation | Code | 2
HiDiff: Hybrid Diffusion Framework for Medical Image Segmentation | Code | 2
Free-SurGS: SfM-Free 3D Gaussian Splatting for Surgical Scene Reconstruction | Code | 2
DisCo-Diff: Enhancing Continuous Diffusion Models with Discrete Latents | Code | 2
SegVG: Transferring Object Bounding Box to Segmentation for Visual Grounding | Code | 2
CATT: Character-based Arabic Tashkeel Transformer | Code | 2
Explicitly Guided Information Interaction Network for Cross-modal Point Cloud Completion | Code | 2
Improving Zero-shot Generalization of Learned Prompts via Unsupervised Knowledge Distillation | Code | 2
MHNet: Multi-view High-order Network for Diagnosing Neurodevelopmental Disorders Using Resting-state fMRI | Code | 2
A Bounding Box is Worth One Token: Interleaving Layout and Text in a Large Language Model for Document Understanding | Code | 2
ScaleDreamer: Scalable Text-to-3D Synthesis with Asynchronous Score Distillation | Code | 2
MG-Verilog: Multi-grained Dataset Towards Enhanced LLM-assisted Verilog Generation | Code | 2
WildAvatar: Web-scale In-the-wild Video Dataset for 3D Avatar Creation | Code | 2
Hierarchical Temporal Context Learning for Camera-based Semantic Scene Completion | Code | 2
Rethinking Data Augmentation for Robust LiDAR Semantic Segmentation in Adverse Weather | Code | 2
BeNeRF: Neural Radiance Fields from a Single Blurry Image and Event Stream | Code | 2
Safety-Driven Deep Reinforcement Learning Framework for Cobots: A Sim2Real Approach | Code | 2
Label Anything: Multi-Class Few-Shot Semantic Segmentation with Visual Prompts | Code | 2
MeMemo: On-device Retrieval Augmentation for Private and Personalized Text Generation | Code | 2
VFIMamba: Video Frame Interpolation with State Space Models | Code | 2
AXIAL: Attention-based eXplainability for Interpretable Alzheimer's Localized Diagnosis using 2D CNNs on 3D MRI brain scans | Code | 2
GlyphDraw2: Automatic Generation of Complex Glyph Posters with Diffusion Models and Large Language Models | Code | 2
Boosting Consistency in Story Visualization with Rich-Contextual Conditional Diffusion Models | Code | 2
DiscoveryBench: Towards Data-Driven Discovery with Large Language Models | Code | 2
Learning 3D Gaussians for Extremely Sparse-View Cone-Beam CT Reconstruction | Code | 2
DCoM: Active Learning for All Learners | Code | 2
SOOD++: Leveraging Unlabeled Data to Boost Oriented Object Detection | Code | 2
MMLongBench-Doc: Benchmarking Long-context Document Understanding with Visualizations | Code | 2
Robust and Reliable Early-Stage Website Fingerprinting Attacks via Spatial-Temporal Distribution Analysis | Code | 2
Centerline Boundary Dice Loss for Vascular Segmentation | Code | 2
Summary of a Haystack: A Challenge to Long-Context LLMs and RAG Systems | Code | 2
GalLoP: Learning Global and Local Prompts for Vision-Language Models | Code | 2
IBSEN: Director-Actor Agent Collaboration for Controllable and Interactive Drama Script Generation | Code | 2
Improving Diffusion Inverse Problem Solving with Decoupled Noise Annealing | Code | 2
SeFlow: A Self-Supervised Scene Flow Method in Autonomous Driving | Code | 2
FORA: Fast-Forward Caching in Diffusion Transformer Acceleration | Code | 2
E.T. the Exceptional Trajectories: Text-to-camera-trajectory generation with character awareness | Code | 2
Equivariant Diffusion Policy | Code | 2
FairMedFM: Fairness Benchmarking for Medical Imaging Foundation Models | Code | 2
AutoFlow: Automated Workflow Generation for Large Language Model Agents | Code | 2
DiffIR2VR-Zero: Zero-Shot Video Restoration with Diffusion-based Image Restoration Models | Code | 2
RegMix: Data Mixture as Regression for Language Model Pre-training | Code | 2
We-Math: Does Your Large Multimodal Model Achieve Human-like Mathematical Reasoning? | Code | 2
Benchmarking Predictive Coding Networks -- Made Simple | Code | 2
KV Cache Compression, But What Must We Give in Return? A Comprehensive Benchmark of Long Context Capable Approaches | Code | 2