SOTAVerified

The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

474,278 papers | 248,326 code links | 4,818 tasks

Papers

Showing 14651-14700 of 474278 papers

Title | Status | Hype
A Large-Scale Real-World Evaluation of LLM-Based Virtual Teaching Assistant | Code | 1
On Training-Test (Mis)alignment in Unsupervised Combinatorial Optimization: Observation, Empirical Exploration, and Analysis | Code | 0
Generalizable Agent Modeling for Agent Collaboration-Competition Adaptation with Multi-Retrieval and Dynamic Generation | Code | 0
RGBTrack: Fast, Robust Depth-Free 6D Pose Estimation and Tracking | Code | 2
From Generality to Mastery: Composer-Style Symbolic Music Generation via Large-Scale Pre-training | Code | 0
Breaking the Transcription Bottleneck: Fine-tuning ASR Models for Extremely Low-Resource Fieldwork Languages | - | 0
Off-Policy Actor-Critic for Adversarial Observation Robustness: Virtual Alternative Training via Symmetric Policy Evaluation | Code | 0
SAFEx: Analyzing Vulnerabilities of MoE-Based LLMs via Stable Safety-critical Expert Identification | - | 0
Compliant Residual DAgger: Improving Real-World Contact-Rich Manipulation with Human Corrections | - | 0
Dissecting the SWE-Bench Leaderboards: Profiling Submitters and Architectures of LLM- and Agent-Based Repair Systems | - | 0
The Hitchhiker's Guide to Efficient, End-to-End, and Tight DP Auditing | - | 0
Dex1B: Learning with 1B Demonstrations for Dexterous Manipulation | - | 0
CUBA: Controlled Untargeted Backdoor Attack against Deep Neural Networks | - | 0
Monocular One-Shot Metric-Depth Alignment for RGB-Based Robot Grasping | - | 0
AnyTraverse: An off-road traversability framework with VLM and human operator in the loop | - | 0
Automatic Large Language Models Creation of Interactive Learning Lessons | - | 0
DreamCube: 3D Panorama Generation via Multi-plane Synchronization | - | 0
Towards Advanced Mathematical Reasoning for LLMs via First-Order Logic Theorem Proving | - | 0
The Hidden Cost of an Image: Quantifying the Energy Consumption of AI Image Generation | - | 0
Part^2GS: Part-aware Modeling of Articulated Objects using 3D Gaussian Splatting | - | 0
Cross-Modal Epileptic Signal Harmonization: Frequency Domain Mapping Quantization for Pre-training a Unified Neurophysiological Transformer | Code | 0
Episode-specific Fine-tuning for Metric-based Few-shot Learners with Optimization-based Training | Code | 0
AI's Blind Spots: Geographic Knowledge and Diversity Deficit in Generated Urban Scenario | - | 0
Re-Evaluating Code LLM Benchmarks Under Semantic Mutation | - | 0
VeriLocc: End-to-End Cross-Architecture Register Allocation via LLM | - | 0
LLM-Generated Feedback Supports Learning If Learners Choose to Use It | Code | 0
TextBraTS: Text-Guided Volumetric Brain Tumor Segmentation with Innovative Dataset Development and Fusion Module Exploration | Code | 1
A Comparative Analysis of Principal Component Analysis (PCA) and Singular Value Decomposition (SVD) as Dimensionality Reduction Techniques | - | 0
Prmpt2Adpt: Prompt-Based Zero-Shot Domain Adaptation for Resource-Constrained Environments | - | 0
UniFork: Exploring Modality Alignment for Unified Multimodal Understanding and Generation | Code | 1
Hunyuan-GameCraft: High-dynamic Interactive Game Video Generation with Hybrid History Condition | - | 0
A Simple Contrastive Framework Of Item Tokenization For Generative Recommendation | - | 0
Cross-Modal Obfuscation for Jailbreak Attacks on Large Vision-Language Models | - | 0
Machine Mental Imagery: Empower Multimodal Reasoning with Latent Visual Tokens | Code | 3
Sparse-Reg: Improving Sample Complexity in Offline Reinforcement Learning using Sparsity | Code | 0
Cache Me If You Can: How Many KVs Do You Need for Effective Long-Context LMs? | Code | 1
Universal Music Representations? Evaluating Foundation Models on World Music Corpora | Code | 0
Adaptive Control Attention Network for Underwater Acoustic Localization and Domain Adaptation | - | 0
Visual-Instructed Degradation Diffusion for All-in-One Image Restoration | Code | 1
TeXpert: A Multi-Level Benchmark for Evaluating LaTeX Code Generation by LLMs | Code | 1
VLN-R1: Vision-Language Navigation via Reinforcement Fine-Tuning | Code | 4
A Neural Operator based Hybrid Microscale Model for Multiscale Simulation of Rate-Dependent Materials | Code | 0
No Free Lunch: Rethinking Internal Feedback for LLM Reasoning | - | 0
TabArena: A Living Benchmark for Machine Learning on Tabular Data | Code | 3
SMART HEALTHCARE PREDICTION MANAGEMENT SYSTEM PROJECT. | - | 0
Explainable Rule Application via Structured Prompting: A Neural-Symbolic Approach | - | 0
One Period to Rule Them All: Identifying Critical Learning Periods in Deep Networks | Code | 0
GRPO-CARE: Consistency-Aware Reinforcement Learning for Multimodal Reasoning | - | 0
One Sample is Enough to Make Conformal Prediction Robust | - | 0
R3eVision: A Survey on Robust Rendering, Restoration, and Enhancement for 3D Low-Level Vision | Code | 1
Page 294 of 9486