SOTAVerified

The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

474,278 papers | 248,326 code links | 4,818 tasks

Papers

Showing 15701-15750 of 474278 papers

Title | Status | Hype
VEAttack: Downstream-agnostic Vision Encoder Attack against Large Vision Language Models | Code | 1
Generative Distribution Embeddings | Code | 1
HRSim: An agent-based simulation platform for high-capacity ride-sharing services | Code | 1
Taming Diffusion for Dataset Distillation with High Representativeness | Code | 1
CENet: Context Enhancement Network for Medical Image Segmentation | Code | 1
CausalDynamics: A large-scale benchmark for structural discovery of dynamical causal models | Code | 1
Background Matters: A Cross-view Bidirectional Modeling Framework for Semi-supervised Medical Image Segmentation | Code | 1
Deep Learning-Driven Ultra-High-Definition Image Restoration: A Survey | Code | 1
A Square Peg in a Square Hole: Meta-Expert for Long-Tailed Semi-Supervised Learning | Code | 1
R1-Compress: Long Chain-of-Thought Compression via Chunk Compression and Search | Code | 1
ChemMLLM: Chemical Multimodal Large Language Model | Code | 1
Circle-RoPE: Cone-like Decoupled Rotary Positional Embedding for Large Vision-Language Models | Code | 1
From Tens of Hours to Tens of Thousands: Scaling Back-Translation for Speech Recognition | Code | 1
EduBench: A Comprehensive Benchmarking Dataset for Evaluating Large Language Models in Diverse Educational Scenarios | Code | 1
Benchmarking Retrieval-Augmented Multimomal Generation for Document Question Answering | Code | 1
PICT -- A Differentiable, GPU-Accelerated Multi-Block PISO Solver for Simulation-Coupled Learning Tasks in Fluid Dynamics | Code | 1
CrossLMM: Decoupling Long Video Sequences from LMMs via Dual Cross-Attention Mechanisms | Code | 1
Fact-R1: Towards Explainable Video Misinformation Detection with Deep Reasoning | Code | 1
Efficient Motion Prompt Learning for Robust Visual Tracking | Code | 1
Style Transfer with Diffusion Models for Synthetic-to-Real Domain Adaptation | Code | 1
Think-RM: Enabling Long-Horizon Reasoning in Generative Reward Models | Code | 1
MTSA: Multi-turn Safety Alignment for LLMs through Multi-round Red-teaming | Code | 1
RealEngine: Simulating Autonomous Driving in Realistic Context | Code | 1
Unlearning Isn't Deletion: Investigating Reversibility of Machine Unlearning in LLMs | Code | 1
REOBench: Benchmarking Robustness of Earth Observation Foundation Models | Code | 1
Forward-only Diffusion Probabilistic Models | Code | 1
Dysfluent WFST: A Framework for Zero-Shot Speech Dysfluency Transcription and Detection | Code | 1
LLM as Effective Streaming Processor: Bridging Streaming-Batch Mismatches with Group Position Encoding | Code | 1
ExeSQL: Self-Taught Text-to-SQL Models with Execution-Driven Bootstrapping for SQL Dialects | Code | 1
Flow Matching based Sequential Recommender Model | Code | 1
REPA Works Until It Doesn't: Early-Stopped, Holistic Alignment Supercharges Diffusion Training | Code | 1
O^2-Searcher: A Searching-based Agent Model for Open-Domain Open-Ended Question Answering | Code | 1
FreshRetailNet-50K: A Stockout-Annotated Censored Demand Dataset for Latent Demand Recovery and Forecasting in Fresh Retail | Code | 1
AGENTIF: Benchmarking Instruction Following of Large Language Models in Agentic Scenarios | Code | 1
DeCafNet: Delegate and Conquer for Efficient Temporal Grounding in Long Videos | Code | 1
R^2ec: Towards Large Recommender Models with Reasoning | Code | 1
Guided Diffusion Sampling on Function Spaces with Applications to PDEs | Code | 1
Sketchy Bounding-box Supervision for 3D Instance Segmentation | Code | 1
JanusDNA: A Powerful Bi-directional Hybrid DNA Foundation Model | Code | 1
Incorporating Visual Correspondence into Diffusion Model for Virtual Try-On | Code | 1
Chirp Delay-Doppler Domain Modulation: A New Paradigm of Integrated Sensing and Communication for Autonomous Vehicles | Code | 1
ALTo: Adaptive-Length Tokenizer for Autoregressive Mask Generation | Code | 1
LINEA: Fast and Accurate Line Detection Using Scalable Transformers | Code | 1
Materials Generation in the Era of Artificial Intelligence: A Comprehensive Survey | Code | 1
Transformer brain encoders explain human high-level visual responses | Code | 1
OpenSeg-R: Improving Open-Vocabulary Segmentation via Step-by-Step Visual Reasoning | Code | 1
CoNav: Collaborative Cross-Modal Reasoning for Embodied Navigation | Code | 1
OSCAR: One-Step Diffusion Codec for Image Compression Across Multiple Bit-rates | Code | 1
FoMoH: A clinically meaningful foundation model evaluation for structured electronic health records | Code | 1
Mitigating Hallucinations in Vision-Language Models through Image-Guided Head Suppression | Code | 1