The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

661,570 papers · 248,326 code links · 4,818 tasks

Papers

Showing 5,501–5,550 of 661,570 papers

Title	Date	Status	Hype
Exposing Hidden Biases in Text-to-Image Models via Automated Prompt Search	Mar 16, 2026	—Unverified	0
Sharing State Between Prompts and Programs	Mar 16, 2026	—Unverified	1
Token-Level LLM Collaboration via FusionRoute	Mar 16, 2026	—Unverified	0
Time-Annealed Perturbation Sampling: Diverse Generation for Diffusion Language Models	Mar 16, 2026	—Unverified	0
On Theoretically-Driven LLM Agents for Multi-Dimensional Discourse Analysis	Mar 16, 2026	—Unverified	0
Descent-Guided Policy Gradient for Scalable Cooperative Multi-Agent Learning	Mar 16, 2026	—Unverified	0
MAWARITH: A Dataset and Benchmark for Legal Inheritance Reasoning with LLMs	Mar 16, 2026	—Unverified	0
Beyond Polarity: Multi-Dimensional LLM Sentiment Signals for WTI Crude Oil Futures Return Prediction	Mar 16, 2026	—Unverified	0
Overcoming the Modality Gap in Context-Aided Forecasting	Mar 16, 2026	—Unverified	0
BrainBench: Exposing the Commonsense Reasoning Gap in Large Language Models	Mar 16, 2026	—Unverified	0
Transition Flow Matching	Mar 16, 2026	—Unverified	0
Loosely-Structured Software: Engineering Context, Structure, and Evolution Entropy in Runtime-Rewired Multi-Agent Systems	Mar 16, 2026	—Unverified	0
Tackling Over-smoothing on Hypergraphs: A Ricci Flow-guided Neural Diffusion Approach	Mar 16, 2026	—Unverified	0
LLM-Driven Discovery of High-Entropy Catalysts via Retrieval-Augmented Generation	Mar 16, 2026	—Unverified	0
Embedding-Aware Feature Discovery: Bridging Latent Representations and Interpretable Features in Event Sequences	Mar 16, 2026	—Unverified	0
Meta-TTRL: A Metacognitive Framework for Self-Improving Test-Time Reinforcement Learning in Unified Multimodal Models	Mar 16, 2026	—Unverified	0
S2Act: Simple Spiking Actor	Mar 16, 2026	—Unverified	0
ClawWorm: Self-Propagating Attacks Across LLM Agent Ecosystems	Mar 16, 2026	—Unverified	0
You've Got a Golden Ticket: Improving Generative Robot Policies With A Single Noise Vector	Mar 16, 2026	—Unverified	0
Simulation Distillation: Pretraining World Models in Simulation for Rapid Real-World Adaptation	Mar 16, 2026	—Unverified	0
CorrectionPlanner: Self-Correction Planner with Reinforcement Learning in Autonomous Driving	Mar 16, 2026	—Unverified	0
Domain Adaptation Without the Compute Burden for Efficient Whole Slide Image Analysis	Mar 16, 2026	—Unverified	0
Parallelised Differentiable Straightest Geodesics for 3D Meshes	Mar 16, 2026	—Unverified	0
Evolving Contextual Safety in Multi-Modal Large Language Models via Inference-Time Self-Reflective Memory	Mar 16, 2026	—Unverified	0
Mask Is What DLLM Needs: A Masked Data Training Paradigm for Diffusion LLMs	Mar 16, 2026	—Unverified	0
Feed-forward Gaussian Registration for Head Avatar Creation and Editing	Mar 16, 2026	—Unverified	0
ModTrack: Sensor-Agnostic Multi-View Tracking via Identity-Informed PHD Filtering with Covariance Propagation	Mar 16, 2026	—Unverified	0
Spectral Hierarchy of the Cosmic Web	Mar 16, 2026	—Unverified	0
When Stability Fails: Hidden Failure Modes Of LLMS in Data-Constrained Scientific Decision-Making	Mar 16, 2026	—Unverified	0
FlashSampling: Fast and Memory-Efficient Exact Sampling	Mar 16, 2026	Code Available	0
Interpretative Interfaces: Designing for AI-Mediated Reading Practices and the Knowledge Commons	Mar 16, 2026	—Unverified	0
Electrodermal Activity as a Unimodal Signal for Aerobic Exercise Detection in Wearable Sensors	Mar 16, 2026	—Unverified	0
Temporal Fact Conflicts in LLMs: Reproducibility Insights from Unifying DYNAMICQA and MULAN	Mar 16, 2026	—Unverified	0
COGNAC at SemEval-2026 Task 5: LLM Ensembles for Human-Level Word Sense Plausibility Rating in Challenging Narratives	Mar 16, 2026	—Unverified	0
Federated Learning for Privacy-Preserving Medical AI	Mar 16, 2026	—Unverified	0
Agent-based imitation dynamics can yield efficiently compressed population-level vocabularies	Mar 16, 2026	—Unverified	0
Game-Theory-Assisted Reinforcement Learning for Border Defense: Early Termination based on Analytical Solutions	Mar 16, 2026	—Unverified	0
Prompt Engineering for Scale Development in Generative Psychometrics	Mar 16, 2026	—Unverified	0
Auto Researching, not hyperparameter tuning: Convergence Analysis of 10,000 Experiments	Mar 16, 2026	—Unverified	0
Bayesian-guided inverse design of hyperelastic microstructures: Application to stochastic metamaterials	Mar 16, 2026	—Unverified	0
Sparse but not Simpler: A Multi-Level Interpretability Analysis of Vision Transformers	Mar 16, 2026	—Unverified	0
Evaluating Agentic Optimization on Large Codebases	Mar 16, 2026	—Unverified	0
Generative Inverse Design with Abstention via Diagonal Flow Matching	Mar 16, 2026	—Unverified	0
Discovery of interaction and diffusion kernels in particle-to-mean-field multi-agent systems	Mar 16, 2026	—Unverified	0
Nodule-Aligned Latent Space Learning with LLM-Driven Multimodal Diffusion for Lung Nodule Progression Prediction	Mar 16, 2026	—Unverified	0
Do Not Leave a Gap: Hallucination-Free Object Concealment in Vision-Language Models	Mar 16, 2026	—Unverified	0
Towards Fair and Robust Volumetric CT Classification via KL-Regularised Group Distributionally Robust Optimisation	Mar 16, 2026	—Unverified	0
Argumentative Human-AI Decision-Making: Toward AI Agents That Reason With Us, Not For Us	Mar 16, 2026	—Unverified	0
BANGLASOCIALBENCH: A Benchmark for Evaluating Sociopragmatic and Cultural Alignment of LLMs in Bangladeshi Social Interaction	Mar 16, 2026	—Unverified	0
Protein Design with Agent Rosetta: A Case Study for Specialized Scientific Agents	Mar 16, 2026	—Unverified	0