SOTAVerified

The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

474,278 papers · 248,326 code links · 4,818 tasks

Papers

Showing 15551–15600 of 474,278 papers

Title | Status | Hype
Large Language Models as Autonomous Spacecraft Operators in Kerbal Space Program | Code | 1
ReasonPlan: Unified Scene Prediction and Decision Reasoning for Closed-loop Autonomous Driving | Code | 1
Can Compressed LLMs Truly Act? An Empirical Evaluation of Agentic Capabilities in LLM Compression | Code | 1
LangDAug: Langevin Data Augmentation for Multi-Source Domain Generalization in Medical Image Segmentation | Code | 1
Concise Reasoning, Big Gains: Pruning Long Reasoning Trace with Difficulty-Aware Prompting | Code | 1
Visual Abstract Thinking Empowers Multimodal Reasoning | Code | 1
Visualized Text-to-Image Retrieval | Code | 1
TailorKV: A Hybrid Framework for Long-Context Inference via Tailored KV Cache Optimization | Code | 1
MLR-Bench: Evaluating AI Agents on Open-Ended Machine Learning Research | Code | 1
Benchmarking Multimodal Knowledge Conflict for Large Multimodal Models | Code | 1
SWE-rebench: An Automated Pipeline for Task Collection and Decontaminated Evaluation of Software Engineering Agents | Code | 1
Compliance-to-Code: Enhancing Financial Compliance Checking via Code Generation | Code | 1
DriveCamSim: Generalizable Camera Simulation via Explicit Camera Modeling for Autonomous Driving | Code | 1
Multimodal LLM-Guided Semantic Correction in Text-to-Image Diffusion | Code | 1
NeuSym-RAG: Hybrid Neural Symbolic Retrieval with Multiview Structuring for PDF Question Answering | Code | 1
A Regularization-Guided Equivariant Approach for Image Restoration | Code | 1
Towards Video to Piano Music Generation with Chain-of-Perform Support Benchmarks | Code | 1
MiniLongBench: The Low-cost Long Context Understanding Benchmark for Large Language Models | Code | 1
SMART-PC: Skeletal Model Adaptation for Robust Test-Time Training in Point Clouds | Code | 1
Unlocking the Power of Diffusion Models in Sequential Recommendation: A Simple and Effective Approach | Code | 1
Data-Free Class-Incremental Gesture Recognition with Prototype-Guided Pseudo Feature Replay | Code | 1
Navigating PESQ: Up-to-Date Versions and Open Implementations | Code | 1
Ctrl-DNA: Controllable Cell-Type-Specific Regulatory DNA Design via Constrained RL | Code | 1
A Semantic Change Detection Network Based on Boundary Detection and Task Interaction for High-Resolution Remote Sensing Images | Code | 1
Locality-Aware Zero-Shot Human-Object Interaction Detection | Code | 1
FastCache: Fast Caching for Diffusion Transformer Through Learnable Linear Approximation | Code | 1
MOLE: Metadata Extraction and Validation in Scientific Papers Using LLMs | Code | 1
The Avengers: A Simple Recipe for Uniting Smaller Language Models to Challenge Proprietary Giants | Code | 1
From What to How: Attributing CLIP's Latent Components Reveals Unexpected Semantic Reliance | Code | 1
Task Memory Engine: Spatial Memory for Robust Multi-Step LLM Agents | Code | 1
Rotation-Equivariant Self-Supervised Method in Image Denoising | Code | 1
Rethinking Text-based Protein Understanding: Retrieval or LLM? | Code | 1
ReaMOT: A Benchmark and Framework for Reasoning-based Multi-Object Tracking | Code | 1
Temporal Sampling for Forgotten Reasoning in LLMs | Code | 1
Exploring Consciousness in LLMs: A Systematic Survey of Theories, Implementations, and Frontier Risks | Code | 1
R3-RAG: Learning Step-by-Step Reasoning and Retrieval for LLMs via Reinforcement Learning | Code | 1
Towards Multi-Granularity Memory Association and Selection for Long-Term Conversational Agents | Code | 1
Hierarchical Masked Autoregressive Models with Low-Resolution Token Pivots | Code | 1
Mel-McNet: A Mel-Scale Framework for Online Multichannel Speech Enhancement | Code | 1
MangaVQA and MangaLMM: A Benchmark and Specialized Model for Multimodal Manga Understanding | Code | 1
OpenNIRScap: An Open-Source, Low-Cost Wearable Near-Infrared Spectroscopy-based Brain Interfacing Cap | Code | 1
HAODiff: Human-Aware One-Step Diffusion via Dual-Prompt Guidance | Code | 1
ReChisel: Effective Automatic Chisel Code Generation by LLM with Reflection | Code | 1
Modality Curation: Building Universal Embeddings for Advanced Multimodal Information Retrieval | Code | 1
Vad-R1: Towards Video Anomaly Reasoning via Perception-to-Cognition Chain-of-Thought | Code | 1
FLAME-MoE: A Transparent End-to-End Research Platform for Mixture-of-Experts Language Models | Code | 1
VADER: A Human-Evaluated Benchmark for Vulnerability Assessment, Detection, Explanation, and Remediation | Code | 1
Decoupling Spatio-Temporal Prediction: When Lightweight Large Models Meet Adaptive Hypergraphs | Code | 1
MineAnyBuild: Benchmarking Spatial Planning for Open-world AI Agents | Code | 1
ParticleGS: Particle-Based Dynamics Modeling of 3D Gaussians for Prior-free Motion Extrapolation | Code | 1
Page 312 of 9,486