The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

474,278 papers, 248,326 code links, 4,818 tasks

Papers

Showing 16451–16500 of 474278 papers

Title	Date	Tasks	Status	Hype
GenBreak: Red Teaming Text-to-Image Generators Using Large Language Models	Jun 11, 2025	Large Language Model, Red Teaming	Unverified	0
Disclosure Audits for LLM Agents	Jun 11, 2025	Diagnostic, Language Modeling	Unverified	0
GRAIL: A Benchmark for GRaph ActIve Learning in Dynamic Sensing Environments	Jun 11, 2025	Active Learning, Benchmarking	Unverified	0
DynaSubVAE: Adaptive Subgrouping for Scalable and Robust OOD Detection	Jun 11, 2025	Clustering, Representation Learning	Unverified	0
AtmosMJ: Revisiting Gating Mechanism for AI Weather Forecasting Beyond the Year Scale	Jun 11, 2025	GPU, Weather Forecasting	Code Available	0
TaskCraft: Automated Generation of Agentic Tasks	Jun 11, 2025		Code Available	2
Learning to Collaborate Over Graphs: A Selective Federated Multi-Task Learning Approach	Jun 11, 2025	Community Detection, Fairness	Code Available	0
The 2025 PNPL Competition: Speech Detection and Phoneme Classification in the LibriBrain Dataset	Jun 11, 2025	Brain Computer Interface	Unverified	0
Data-Driven Modeling of IRCU Patient Flow in the COVID-19 Pandemic	Jun 11, 2025	Respiratory Failure	Code Available	0
TransXSSM: A Hybrid Transformer State Space Model with Unified Rotary Position Embedding	Jun 11, 2025	4k, Language Modeling	Unverified	0
NnD: Diffusion-based Generation of Physically-Nonnegative Objects	Jun 11, 2025	Scene Generation	Unverified	0
Textual Bayes: Quantifying Uncertainty in LLM-Based Systems	Jun 11, 2025	Bayesian Inference, Prompt Engineering	Unverified	0
What is the Cost of Differential Privacy for Deep Learning-Based Trajectory Generation?	Jun 11, 2025		Code Available	0
Chat-of-Thought: Collaborative Multi-Agent System for Generating Domain Specific Information	Jun 11, 2025	Language Modeling, Language Modelling	Unverified	0
3D-Aware Vision-Language Models Fine-Tuning with Geometric Distillation	Jun 11, 2025	Spatial Reasoning	Code Available	1
Revisit What You See: Disclose Language Prior in Vision Tokens for Efficient Guided Decoding of LVLMs	Jun 11, 2025	Hallucination, Object Hallucination	Code Available	1
Q2E: Query-to-Event Decomposition for Zero-Shot Multilingual Text-to-Video Retrieval	Jun 11, 2025	Retrieval, Text to Video Retrieval	Unverified	0
Efficient kernelized bandit algorithms via exploration distributions	Jun 11, 2025	Thompson Sampling	Unverified	0
Interpreting learned search: finding a transition model and value function in an RNN that plays Sokoban	Jun 11, 2025	Sokoban	Code Available	1
A Call for Collaborative Intelligence: Why Human-Agent Systems Should Precede AI Autonomy	Jun 11, 2025		Code Available	2
A quantum semantic framework for natural language processing	Jun 11, 2025		Code Available	5
Omni-DPO: A Dual-Perspective Paradigm for Dynamic Preference Learning of LLMs	Jun 11, 2025	Mathematical Reasoning	Code Available	0
Exposure-slot: Exposure-centric representations learning with Slot-in-Slot Attention for Region-aware Exposure Correction	Jun 11, 2025	Exposure Correction, Image Enhancement	Code Available	1
VersaVid-R1: A Versatile Video Understanding and Reasoning Model from Question Answering to Captioning Tasks	Jun 10, 2025	Multiple-choice, Open-Ended Question Answering	Unverified	0
Colors See Colors Ignore: Clothes Changing ReID with Color Disentanglement (ICCV-25 🥳)	Jun 10, 2025	Disentanglement, Person Re-Identification	Unverified	0
Improving LLM Agent Planning with In-Context Learning via Atomic Fact Augmentation and Lookahead Search	Jun 10, 2025	In-Context Learning	Unverified	0
GUIRoboTron-Speech: Towards Automated GUI Agents Based on Speech Instructions	Jun 10, 2025	text-to-speech, Text to Speech	Code Available	1
ContextLoss: Context Information for Topology-Preserving Segmentation	Jun 10, 2025	Image Segmentation, Semantic Segmentation	Unverified	0
Sparse Autoencoders Bridge The Deep Learning Model and The Brain	Jun 10, 2025	Deep Learning	Unverified	0
Grids Often Outperform Implicit Neural Representations	Jun 10, 2025	Denoising, Super-Resolution	Code Available	0
GPU-accelerated Modeling of Biological Regulatory Networks	Jun 10, 2025	CPU, global-optimization	Unverified	0
JAFAR: Jack up Any Feature at Any Resolution	Jun 10, 2025	Feature Upsampling	Code Available	3
Technical Report for Argoverse2 Scenario Mining Challenges on Iterative Error Correction and Spatially-Aware Prompting	Jun 10, 2025	Autonomous Driving, Code Generation	Unverified	0
Optimal Operating Strategy for PV-BESS Households: Balancing Self-Consumption and Self-Sufficiency	Jun 10, 2025	Model Predictive Control, Reinforcement Learning (RL)	Unverified	0
Navigating High-Dimensional Backstage: A Guide for Exploring Literature for the Reliable Use of Dimensionality Reduction	Jun 10, 2025	Dimensionality Reduction, Diversity	Unverified	0
Cross-Frame Representation Alignment for Fine-Tuning Video Diffusion Models	Jun 10, 2025		Unverified	0
A Multi-Modal Spatial Risk Framework for EV Charging Infrastructure Using Remote Sensing	Jun 10, 2025	Spatial Reasoning	Unverified	0
An Open-Source Software Toolkit & Benchmark Suite for the Evaluation and Adaptation of Multimodal Action Models	Jun 10, 2025	Action Generation, Image Captioning	Unverified	0
Comment on The Illusion of Thinking: Understanding the Strengths and Limitations of Reasoning Models via the Lens of Problem Complexity	Jun 10, 2025	Experimental Design	Unverified	0
FlagEvalMM: A Flexible Framework for Comprehensive Multimodal Model Evaluation	Jun 10, 2025	Image-text Retrieval, Question Answering	Code Available	2
Exploring the Capabilities of the Frontier Large Language Models for Nuclear Energy Research	Jun 10, 2025	Code Generation, Prompt Engineering	Unverified	0
DualEquiNet: A Dual-Space Hierarchical Equivariant Network for Large Biomolecules	Jun 10, 2025	Property Prediction	Unverified	0
Scalable and Cost-Efficient de Novo Template-Based Molecular Generation	Jun 10, 2025	Diversity, Drug Design	Code Available	1
SDMPrune: Self-Distillation MLP Pruning for Efficient Large Language Models	Jun 10, 2025		Code Available	1
Segment This Thing: Foveated Tokenization for Efficient Point-Prompted Segmentation	Jun 10, 2025	Foveation, Image Segmentation	Code Available	2
Solving the Job Shop Scheduling Problem with Graph Neural Networks: A Customizable Reinforcement Learning Environment	Jun 10, 2025	Combinatorial Optimization, Imitation Learning	Code Available	2
Monocular 3D Hand Pose Estimation with Implicit Camera Alignment	Jun 10, 2025	3D Hand Pose Estimation, Hand Pose Estimation	Code Available	1
XGraphRAG: Interactive Visual Analysis for Graph-based Retrieval-Augmented Generation	Jun 10, 2025	graph construction, Language Modeling	Code Available	0
SUTA-LM: Bridging Test-Time Adaptation and Language Model Rescoring for Robust ASR	Jun 10, 2025	Language Modeling, Language Modelling	Unverified	0
A Self-Refining Framework for Enhancing ASR Using TTS-Synthesized Data	Jun 10, 2025	text-to-speech, Text to Speech	Unverified	0