SOTAVerified

The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

661,570 papers | 248,326 code links | 4,818 tasks

Papers

Showing 5501-5550 of 661,570 papers

Title | Status | Hype
Exposing Hidden Biases in Text-to-Image Models via Automated Prompt Search | - | 0
Sharing State Between Prompts and Programs | - | 1
Token-Level LLM Collaboration via FusionRoute | - | 0
Time-Annealed Perturbation Sampling: Diverse Generation for Diffusion Language Models | - | 0
On Theoretically-Driven LLM Agents for Multi-Dimensional Discourse Analysis | - | 0
Descent-Guided Policy Gradient for Scalable Cooperative Multi-Agent Learning | - | 0
MAWARITH: A Dataset and Benchmark for Legal Inheritance Reasoning with LLMs | - | 0
Beyond Polarity: Multi-Dimensional LLM Sentiment Signals for WTI Crude Oil Futures Return Prediction | - | 0
Overcoming the Modality Gap in Context-Aided Forecasting | - | 0
BrainBench: Exposing the Commonsense Reasoning Gap in Large Language Models | - | 0
Transition Flow Matching | - | 0
Loosely-Structured Software: Engineering Context, Structure, and Evolution Entropy in Runtime-Rewired Multi-Agent Systems | - | 0
Tackling Over-smoothing on Hypergraphs: A Ricci Flow-guided Neural Diffusion Approach | - | 0
LLM-Driven Discovery of High-Entropy Catalysts via Retrieval-Augmented Generation | - | 0
Embedding-Aware Feature Discovery: Bridging Latent Representations and Interpretable Features in Event Sequences | - | 0
Meta-TTRL: A Metacognitive Framework for Self-Improving Test-Time Reinforcement Learning in Unified Multimodal Models | - | 0
S2Act: Simple Spiking Actor | - | 0
ClawWorm: Self-Propagating Attacks Across LLM Agent Ecosystems | - | 0
You've Got a Golden Ticket: Improving Generative Robot Policies With A Single Noise Vector | - | 0
Simulation Distillation: Pretraining World Models in Simulation for Rapid Real-World Adaptation | - | 0
CorrectionPlanner: Self-Correction Planner with Reinforcement Learning in Autonomous Driving | - | 0
Domain Adaptation Without the Compute Burden for Efficient Whole Slide Image Analysis | - | 0
Parallelised Differentiable Straightest Geodesics for 3D Meshes | - | 0
Evolving Contextual Safety in Multi-Modal Large Language Models via Inference-Time Self-Reflective Memory | - | 0
Mask Is What DLLM Needs: A Masked Data Training Paradigm for Diffusion LLMs | - | 0
Feed-forward Gaussian Registration for Head Avatar Creation and Editing | - | 0
ModTrack: Sensor-Agnostic Multi-View Tracking via Identity-Informed PHD Filtering with Covariance Propagation | - | 0
Spectral Hierarchy of the Cosmic Web | - | 0
When Stability Fails: Hidden Failure Modes Of LLMS in Data-Constrained Scientific Decision-Making | - | 0
FlashSampling: Fast and Memory-Efficient Exact Sampling | Code | 0
Interpretative Interfaces: Designing for AI-Mediated Reading Practices and the Knowledge Commons | - | 0
Electrodermal Activity as a Unimodal Signal for Aerobic Exercise Detection in Wearable Sensors | - | 0
Temporal Fact Conflicts in LLMs: Reproducibility Insights from Unifying DYNAMICQA and MULAN | - | 0
COGNAC at SemEval-2026 Task 5: LLM Ensembles for Human-Level Word Sense Plausibility Rating in Challenging Narratives | - | 0
Federated Learning for Privacy-Preserving Medical AI | - | 0
Agent-based imitation dynamics can yield efficiently compressed population-level vocabularies | - | 0
Game-Theory-Assisted Reinforcement Learning for Border Defense: Early Termination based on Analytical Solutions | - | 0
Prompt Engineering for Scale Development in Generative Psychometrics | - | 0
Auto Researching, not hyperparameter tuning: Convergence Analysis of 10,000 Experiments | - | 0
Bayesian-guided inverse design of hyperelastic microstructures: Application to stochastic metamaterials | - | 0
Sparse but not Simpler: A Multi-Level Interpretability Analysis of Vision Transformers | - | 0
Evaluating Agentic Optimization on Large Codebases | - | 0
Generative Inverse Design with Abstention via Diagonal Flow Matching | - | 0
Discovery of interaction and diffusion kernels in particle-to-mean-field multi-agent systems | - | 0
Nodule-Aligned Latent Space Learning with LLM-Driven Multimodal Diffusion for Lung Nodule Progression Prediction | - | 0
Do Not Leave a Gap: Hallucination-Free Object Concealment in Vision-Language Models | - | 0
Towards Fair and Robust Volumetric CT Classification via KL-Regularised Group Distributionally Robust Optimisation | - | 0
Argumentative Human-AI Decision-Making: Toward AI Agents That Reason With Us, Not For Us | - | 0
BANGLASOCIALBENCH: A Benchmark for Evaluating Sociopragmatic and Cultural Alignment of LLMs in Bangladeshi Social Interaction | - | 0
Protein Design with Agent Rosetta: A Case Study for Specialized Scientific Agents | - | 0
Page 111 of 13,232