SOTAVerified

The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

474,278 papers | 248,326 code links | 4,818 tasks

Papers

Showing 7401-7425 of 474278 papers

Title | Status | Hype
Prompt Triage: Structured Optimization Enhances Vision-Language Model Performance on Medical Imaging Benchmarks | Code | 0
Latent Motion Profiling for Annotation-free Cardiac Phase Detection in Adult and Fetal Echocardiography Videos | Code | 0
ICL-Router: In-Context Learned Model Representations for LLM Routing | Code | 0
Hierarchical Mixing Architecture for Low-light RAW Image Enhancement | Code | 0
A Closer Look at Knowledge Distillation in Spiking Neural Network Training | Code | 0
FreDFT: Frequency Domain Fusion Transformer for Visible-Infrared Object Detection | Code | 0
From Proof to Program: Characterizing Tool-Induced Reasoning Hallucinations in Large Language Models | Code | 0
Exposing Weak Links in Multi-Agent Systems under Adversarial Prompting | Code | 0
Multi-agent Undercover Gaming: Hallucination Removal via Counterfactual Test for Multimodal Reasoning | Code | 0
Q-Doc: Benchmarking Document Image Quality Assessment Capabilities in Multi-modal Large Language Models | Code | 0
VoxTell: Free-Text Promptable Universal 3D Medical Image Segmentation | Code | 0
Towards Mitigating Systematics in Large-Scale Surveys via Few-Shot Optimal Transport-Based Feature Alignment | Code | 0
TopoPerception: A Shortcut-Free Evaluation of Global Visual Perception in Large Vision-Language Models | Code | 0
PI-NAIM: Path-Integrated Neural Adaptive Imputation Model | Code | 0
Multi-agent In-context Coordination via Decentralized Memory Retrieval | Code | 0
Beyond Perplexity: Let the Reader Select Retrieval Summaries via Spectrum Projection Score | - | 0
Test-Time Reinforcement Learning for GUI Grounding via Region Consistency | - | 0
MVU-Eval: Towards Multi-Video Understanding Evaluation for Multimodal LLMs | - | 0
PROPA: Toward Process-level Optimization in Visual Reasoning via Reinforcement Learning | Code | 0
SIMS-V: Simulated Instruction-Tuning for Spatial Video Understanding | - | 0
URaG: Unified Retrieval and Generation in Multimodal LLMs for Efficient Long Document Understanding | Code | 0
Depth Anything 3: Recovering the Visual Space from Any Views | - | 0
SPOT: Sparsification with Attention Dynamics via Token Relevance in Vision Transformers | Code | 0
BanglaTalk: Towards Real-Time Speech Assistance for Bengali Regional Dialects | Code | 0
Retrieval-Augmented Generation for Reliable Interpretation of Radio Regulations | Code | 0
Page 297 of 18972