SOTAVerified

The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

474,278 papers · 248,326 code links · 4,818 tasks

Papers

Showing 20701–20750 of 474278 papers

| Title | Status | Hype |
|---|---|---|
| Graph Wave Networks | Code | 0 |
| Divide and Conquer: Grounding LLMs as Efficient Decision-Making Agents via Offline Hierarchical Reinforcement Learning | Code | 2 |
| MineAnyBuild: Benchmarking Spatial Planning for Open-world AI Agents | Code | 1 |
| Learning for Dynamic Combinatorial Optimization without Training Data | | 0 |
| Balancing Interference and Correlation in Spatial Experimental Designs: A Causal Graph Cut Approach | Code | 0 |
| LPCM: Learning-based Predictive Coding for LiDAR Point Cloud Compression | | 0 |
| AgentRecBench: Benchmarking LLM Agent-based Personalized Recommender Systems | | 0 |
| MoESD: Unveil Speculative Decoding's Potential for Accelerating Sparse MoE | | 0 |
| DeepInverse: A Python package for solving imaging inverse problems with deep learning | Code | 4 |
| Alita: Generalist Agent Enabling Scalable Agentic Reasoning with Minimal Predefinition and Maximal Self-Evolution | Code | 4 |
| TabPFN: One Model to Rule Them All? | Code | 0 |
| Unifying Multimodal Large Language Model Capabilities and Modalities via Model Merging | Code | 1 |
| LAPA-based Dynamic Privacy Optimization for Wireless Federated Learning in Heterogeneous Environments | | 0 |
| Learning to Select In-Context Demonstration Preferred by Large Language Model | | 0 |
| Regret Analysis of Average-Reward Unichain MDPs via an Actor-Critic Approach | | 0 |
| Agentic AI Process Observability: Discovering Behavioral Variability | | 0 |
| MMPerspective: Do MLLMs Understand Perspective? A Comprehensive Benchmark for Perspective Perception, Reasoning, and Robustness | Code | 1 |
| EgoZero: Robot Learning from Smart Glasses | | 0 |
| Minimax Adaptive Online Nonparametric Regression over Besov Spaces | | 0 |
| Information-theoretic Generalization Analysis for VQ-VAEs: A Role of Latent Variables | | 0 |
| Ten Principles of AI Agent Economics | | 0 |
| Inverse Q-Learning Done Right: Offline Imitation Learning in Q^π-Realizable MDPs | Code | 0 |
| The Study of Human Preference Based on Integrated Analysis of N1 and LPP Components | | 0 |
| Done Is Better than Perfect: Unlocking Efficient Reasoning by Structured Multi-Turn Decomposition | | 0 |
| Unveiling the Compositional Ability Gap in Vision-Language Reasoning Model | Code | 0 |
| Style2Code: A Style-Controllable Code Generation Framework with Dual-Modal Contrastive Representation Learning | Code | 0 |
| Parameter-Efficient Fine-Tuning with Column Space Projection | | 0 |
| Cellwise and Casewise Robust Covariance in High Dimensions | | 0 |
| Future Link Prediction Without Memory or Aggregation | Code | 0 |
| Lego Sketch: A Scalable Memory-augmented Neural Network for Sketching Data Streams | Code | 0 |
| Beyond Safe Answers: A Benchmark for Evaluating True Risk Awareness in Large Reasoning Models | Code | 0 |
| Lorentz Local Canonicalization: How to Make Any Network Lorentz-Equivariant | | 0 |
| Measure Domain's Gap: A Similar Domain Selection Principle for Multi-Domain Recommendation | Code | 0 |
| Genome-Bench: A Scientific Reasoning Benchmark from Real-World Expert Discussions | | 0 |
| Machine Learning Algorithm for Noise Reduction and Disease-Causing Gene Feature Extraction in Gene Sequencing Data | | 0 |
| Density Ratio-Free Doubly Robust Proxy Causal Learning | | 0 |
| Inceptive Transformers: Enhancing Contextual Representations through Multi-Scale Feature Learning Across Domains and Languages | | 0 |
| AMQA: An Adversarial Dataset for Benchmarking Bias of LLMs in Medicine and Healthcare | Code | 0 |
| syftr: Pareto-Optimal Generative AI | Code | 3 |
| Benchmarking Multimodal Knowledge Conflict for Large Multimodal Models | Code | 1 |
| Correlating instruction-tuning (in multimodal models) with vision-language processing (in the brain) | Code | 0 |
| Beyond Freezing: Sparse Tuning Enhances Plasticity in Continual Learning with Pre-Trained Models | Code | 0 |
| HIT Model: A Hierarchical Interaction-Enhanced Two-Tower Model for Pre-Ranking Systems | | 0 |
| Unlocking the Power of Diffusion Models in Sequential Recommendation: A Simple and Effective Approach | Code | 1 |
| PCDCNet: A Surrogate Model for Air Quality Forecasting with Physical-Chemical Dynamics and Constraints | Code | 3 |
| A Semantic Change Detection Network Based on Boundary Detection and Task Interaction for High-Resolution Remote Sensing Images | Code | 1 |
| KnowTrace: Bootstrapping Iterative Retrieval-Augmented Generation with Structured Knowledge Tracing | Code | 1 |
| LangDAug: Langevin Data Augmentation for Multi-Source Domain Generalization in Medical Image Segmentation | Code | 1 |
| Large Language Models Meet Knowledge Graphs for Question Answering: Synthesis and Opportunities | Code | 2 |
| HAODiff: Human-Aware One-Step Diffusion via Dual-Prompt Guidance | Code | 1 |
Page 415 of 9486