SOTAVerified

The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

474,278 papers · 248,326 code links · 4,818 tasks

Papers

Showing 18,101-18,150 of 474,278 papers

| Title | Status | Hype |
| --- | --- | --- |
| Geoff: The Generic Optimization Framework & Frontend for Particle Accelerator Controls | | 0 |
| RSVP: Reasoning Segmentation via Visual Prompting and Multi-modal Chain-of-Thought | | 0 |
| Evaluating MLLMs with Multimodal Multi-image Reasoning Benchmark | | 0 |
| Is Perturbation-Based Image Protection Disruptive to Image Editing? | | 0 |
| HMAR: Efficient Hierarchical Masked Auto-Regressive Image Generation | | 0 |
| AuthGuard: Generalizable Deepfake Detection via Language Guidance | | 0 |
| Generating Pedagogically Meaningful Visuals for Math Word Problems: A New Benchmark and Analysis of Text-to-Image Models | Code | 1 |
| STAR: Learning Diverse Robot Skill Abstractions through Rotation-Augmented Vector Quantization | Code | 0 |
| A Retrieval-Augmented Multi-Agent Framework for Psychiatry Diagnosis | Code | 0 |
| Matter-of-Fact: A Benchmark for Verifying the Feasibility of Literature-Supported Claims in Materials Science | Code | 0 |
| HTSC-2025: A Benchmark Dataset of Ambient-Pressure High-Temperature Superconductors for AI-Driven Critical Temperature Prediction | Code | 0 |
| Curse of Slicing: Why Sliced Mutual Information is a Deceptive Measure of Statistical Dependence | | 0 |
| Fine-Tuning Video Transformers for Word-Level Bangla Sign Language: A Comparative Analysis for Classification Tasks | | 0 |
| AmbiK: Dataset of Ambiguous Tasks in Kitchen Environment | Code | 0 |
| How Far Are We from Predicting Missing Modalities with Foundation Models? | Code | 0 |
| A Kernel-Based Approach for Accurate Steady-State Detection in Performance Time Series | Code | 0 |
| Savage-Dickey density ratio estimation with normalizing flows for Bayesian model comparison | Code | 2 |
| Seed-Coder: Let the Code Model Curate Data for Itself | Code | 4 |
| Probabilistic measures afford fair comparisons of AIWP and NWP model output | Code | 0 |
| CHIME: Conditional Hallucination and Integrated Multi-scale Enhancement for Time Series Diffusion Model | | 0 |
| Learning-at-Criticality in Large Language Models for Quantum Field Theory and Beyond | | 0 |
| Fast Sampling for System Identification: Overcoming Noise, Offsets, and Closed-Loop Challenges with State Variable Filter | | 0 |
| Confidence-Guided Human-AI Collaboration: Reinforcement Learning with Distributional Proxy Value Propagation for Autonomous Driving | Code | 0 |
| BEAR: BGP Event Analysis and Reporting | Code | 0 |
| Leveraging Reward Models for Guiding Code Review Comment Generation | Code | 0 |
| Dreaming up scale invariance via inverse renormalization group | Code | 0 |
| Visualizing and Controlling Cortical Responses Using Voxel-Weighted Activation Maximization | Code | 0 |
| Towards Large-Scale Pose-Invariant Face Recognition Using Face Defrontalization | | 0 |
| WorldPrediction: A Benchmark for High-level World Modeling and Long-horizon Procedural Planning | | 0 |
| WANDER: An Explainable Decision-Support Framework for HPC | | 0 |
| cuVSLAM: CUDA accelerated visual odometry and mapping | | 0 |
| RoboRefer: Towards Spatial Referring with Reasoning in Vision-Language Models for Robotics | | 0 |
| Temporal horizons in forecasting: a performance-learnability trade-off | | 0 |
| Can we reconstruct a dysarthric voice with the large speech model Parler TTS? | | 0 |
| LaF-GRPO: In-Situ Navigation Instruction Generation for the Visually Impaired via GRPO with LLM-as-Follower Reward | Code | 0 |
| OWMM-Agent: Open World Mobile Manipulation With Multi-modal Agentic Data Synthesis | Code | 1 |
| FALO: Fast and Accurate LiDAR 3D Object Detection on Resource-Constrained Devices | | 0 |
| Domain Adaptation Method and Modality Gap Impact in Audio-Text Models for Prototypical Sound Classification | Code | 0 |
| AssetOpsBench: Benchmarking AI Agents for Task Automation in Industrial Asset Operations and Maintenance | Code | 5 |
| SGN-CIRL: Scene Graph-based Navigation with Curriculum, Imitation, and Reinforcement Learning | Code | 0 |
| Zero-Shot Temporal Interaction Localization for Egocentric Videos | Code | 1 |
| Graph Counselor: Adaptive Graph Exploration via Multi-Agent Synergy to Enhance LLM Reasoning | Code | 1 |
| Adapt before Continual Learning | Code | 0 |
| Do Large Language Models Know Folktales? A Case Study of Yokai in Japanese Folktales | | 0 |
| SSIMBaD: Sigma Scaling with SSIM-Guided Balanced Diffusion for AnimeFace Colorization | Code | 0 |
| FPGA-Enabled Machine Learning Applications in Earth Observation: A Systematic Review | Code | 0 |
| GARG-AML against Smurfing: A Scalable and Interpretable Graph-Based Framework for Anti-Money Laundering | Code | 0 |
| N^2: A Unified Python Package and Test Bench for Nearest Neighbor-Based Matrix Completion | | 0 |
| Through the Stealth Lens: Rethinking Attacks and Defenses in RAG | Code | 0 |
| Magic Mushroom: A Customizable Benchmark for Fine-grained Analysis of Retrieval Noise Erosion in RAG Systems | | 0 |