SOTAVerified

The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

661,570 papers | 248,326 code links | 4,818 tasks

Papers

Showing 5801–5850 of 661,570 papers

Title | Status | Hype
POLCA: Stochastic Generative Optimization with LLM | Code | 0
SpiralDiff: Spiral Diffusion with LoRA for RGB-to-RAW Conversion Across Cameras | Code | 0
Empowering Chemical Structures with Biological Insights for Scalable Phenotypic Virtual Screening | Code | 0
PiGRAND: Physics-informed Graph Neural Diffusion for Intelligent Additive Manufacturing | Code | 0
Invisible failures in human-AI interactions | Code | 0
ViFeEdit: A Video-Free Tuner of Your Video Diffusion Transformer | Code | 0
SlovKE: A Large-Scale Dataset and LLM Evaluation for Slovak Keyphrase Extraction | Code | 0
InterveneBench: Benchmarking LLMs for Intervention Reasoning and Causal Study Design in Real Social Systems | Code | 0
CardioComposer: Leveraging Differentiable Geometry for Compositional Control of Anatomical Diffusion Models | Code | 0
Beyond the Embedding Bottleneck: Adaptive Retrieval-Augmented 3D CT Report Generation | Code | 0
W2T: LoRA Weights Already Know What They Can Do | Code | 0
Vietnamese Automatic Speech Recognition: A Revisit | Code | 0
Impatient Users Confuse AI Agents: High-fidelity Simulations of Human Traits for Testing Agents | Code | 0
SciPostLayoutTree: A Dataset for Structural Analysis of Scientific Posters | Code | 0
Imagine-then-Plan: Agent Learning from Adaptive Lookahead with World Models | Code | 0
Surprised by Attention: Predictable Query Dynamics for Time Series Anomaly Detection | Code | 0
M2IR: Proactive All-in-One Image Restoration via Mamba-style Modulation and Mixture-of-Experts | Code | 0
TopoVST: Toward Topology-fidelitous Vessel Skeleton Tracking | Code | 0
MER-Bench: A Comprehensive Benchmark for Multimodal Meme Reappraisal | Code | 0
Mastering the Minority: An Uncertainty-guided Multi-Expert Framework for Challenging-tailed Sequence Learning | Code | 0
VideoChat-A1: Thinking with Long Videos by Chain-of-Shot Reasoning | Code | 0
Rationale-Enhanced Decoding for Multi-modal Chain-of-Thought | Code | 0
FingerTip 20K: A Benchmark for Proactive and Personalized Mobile LLM Agents | Code | 0
AutoEP: LLMs-Driven Automation of Hyperparameter Evolution for Metaheuristic Algorithms | Code | 0
Overthinking Reduction with Decoupled Rewards and Curriculum Data Scheduling | Code | 0
Semantic Context Matters: Improving Conditioning for Autoregressive Models | Code | 0
Echo-CoPilot: A Multiple-Perspective Agentic Framework for Reliable Echocardiography Interpretation | Code | 0
SoliReward: Mitigating Susceptibility to Reward Hacking and Annotation Noise in Video Generation Reward Models | Code | 0
Cross-RAG: Zero-Shot Retrieval-Augmented Time Series Forecasting via Cross-Attention | Code | 0
HiMemVLN: Enhancing Reliability of Open-Source Zero-Shot Vision-and-Language Navigation with Hierarchical Memory System | Code | 0
WiT: Waypoint Diffusion Transformers via Trajectory Conflict Navigation | Code | 0
TextOVSR: Text-Guided Real-World Opera Video Super-Resolution | Code | 0
Dataset Diversity Metrics and Impact on Classification Models | Code | 0
Flash-Unified: A Training-Free and Task-Aware Acceleration Framework for Native Unified Models | Code | 0
IRIS: Intersection-aware Ray-based Implicit Editable Scenes | Code | 0
GradCFA: A Hybrid Gradient-Based Counterfactual and Feature Attribution Explanation Algorithm for Local Interpretation of Neural Networks | Code | 0
When Does Sparsity Mitigate the Curse of Depth in LLMs | Code | 0
Unlocking the Value of Text: Event-Driven Reasoning and Multi-Level Alignment for Time Series Forecasting | Code | 0
Seeing Beyond: Extrapolative Domain Adaptive Panoramic Segmentation | Code | 0
Mixture-of-Depths Attention | Code | 0
Hilbert: Recursively Building Formal Proofs with Informal Reasoning | Code | 0
Flow Matching for Tabular Data Synthesis | Code | 0
Learning complete and explainable visual representations from itemized text supervision | Code | 0
CRIMSON: A Clinically-Grounded LLM-Based Metric for Generative Radiology Report Evaluation | Code | 0
The Agentic Researcher: A Practical Guide to AI-Assisted Research in Mathematics and Machine Learning | Code | 0
RealVLG-R1: A Large-Scale Real-World Visual-Language Grounding Benchmark for Robotic Perception and Manipulation | Code | 0
Real-Time Oriented Object Detection Transformer in Remote Sensing Images | Code | 0
CoD: A Diffusion Foundation Model for Image Compression | Code | 0
IgPose: A Generative Data-Augmented Pipeline for Robust Immunoglobulin-Antigen Binding Prediction | Code | 0
Curriculum Reinforcement Learning from Easy to Hard Tasks Improves LLM Reasoning | Code | 0
Page 117 of 13232