SOTAVerified

The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

474,278 papers · 248,326 code links · 4,818 tasks

Papers

Showing 7476–7500 of 474,278 papers

Title | Status | Hype
Too Good to be Bad: On the Failure of LLMs to Role-Play Villains |  | 0
Routing Manifold Alignment Improves Generalization of Mixture-of-Experts LLMs |  | 0
Capturing Polysemanticity with PRISM: A Multi-Concept Feature Description Framework |  | 0
AgentFlux: Decoupled Fine-Tuning & Inference for On-Device Agentic Systems |  | 0
SPEED-Q: Staged Processing with Enhanced Distillation towards Efficient Low-bit On-device VLM Quantization | Code | 0
Soiling detection for Advanced Driver Assistance Systems | Code | 0
Neural B-frame Video Compression with Bi-directional Reference Harmonization | Code | 0
Causally-Grounded Dual-Path Attention Intervention for Object Hallucination Mitigation in LVLMs | Code | 0
MM-CRITIC: A Holistic Evaluation of Large Multimodal Models as Multimodal Critique | Code | 0
Iterated Population Based Training with Task-Agnostic Restarts | Code | 0
Augment to Augment: Diverse Augmentations Enable Competitive Ultra-Low-Field MRI Enhancement | Code | 0
Learning by Neighbor-Aware Semantics, Deciding by Open-form Flows: Towards Robust Zero-Shot Skeleton Action Recognition | Code | 0
FactReasoner: A Probabilistic Approach to Long-Form Factuality Assessment for Large Language Models | Code | 0
Abn-BLIP: Abnormality-aligned Bootstrapping Language-Image Pre-training for Pulmonary Embolism Diagnosis and Report Generation from CTPA | Code | 0
EcomMMMU: Strategic Utilization of Visuals for Robust Multimodal E-commerce Models | Code | 0
PsychCounsel-Bench: Evaluating the Psychology Intelligence of Large Language Models | Code | 0
PALMS+: Modular Image-Based Floor Plan Localization Leveraging Depth Foundation Model | Code | 0
DI3CL: Contrastive Learning With Dynamic Instances and Contour Consistency for SAR Land-Cover Classification Foundation Model | Code | 0
DG-DETR: Toward Domain Generalized Detection Transformer | Code | 0
evMLP: An Efficient Event-Driven MLP Architecture for Vision | Code | 0
Trustworthy Pedestrian Trajectory Prediction via Pattern-Aware Interaction Modeling | Code | 0
Rethinking Pan-sharpening: A New Training Process for Full-Resolution Generalization | Code | 0
Mitigating Hallucinations in Large Language Models via Causal Reasoning | Code | 0
RadHARSimulator V2: Video to Doppler Generator | Code | 0
Diversifying Counterattacks: Orthogonal Exploration for Robust CLIP Inference | Code | 0