The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

474,278 papers · 248,326 code links · 4,818 tasks

Papers

Showing 17101–17150 of 474278 papers

Title	Date	Tasks	Status	Hype
Worst-Case Symbolic Constraints Analysis and Generalisation with Large Language Models	Jun 9, 2025	Code Generation	Unverified	0
FairDICE: Fairness-Driven Offline Multi-Objective Reinforcement Learning	Jun 9, 2025	Fairness, Multi-Objective Reinforcement Learning	Unverified	0
Domain Switching on the Pareto Front: Multi-Objective Deep Kernel Learning in Automated Piezoresponse Force Microscopy	Jun 9, 2025	Active Learning, Combinatorial Optimization	Unverified	0
Unable to Forget: Proactive Interference Reveals Working Memory Limits in LLMs Beyond Context Length	Jun 9, 2025	Information Retrieval, Prompt Engineering	Unverified	0
A Comprehensive Study of Decoder-Only LLMs for Text-to-Image Generation	Jun 9, 2025	Decoder, Image Generation	Unverified	0
Scaling Laws of Motion Forecasting and Planning -- A Technical Report	Jun 9, 2025	Autonomous Driving, Language Modeling	Unverified	0
Seeing Voices: Generating A-Roll Video from Audio with Mirage	Jun 9, 2025	Speech Synthesis, text-to-speech	Unverified	0
FedGA-Tree: Federated Decision Tree using Genetic Algorithm	Jun 9, 2025	Federated Learning	Unverified	0
Guideline Forest: Experience-Induced Multi-Guideline Reasoning with Stepwise Aggregation	Jun 9, 2025	GSM8K, HumanEval	Unverified	0
ArchiLense: A Framework for Quantitative Analysis of Architectural Styles Based on Vision Large Language Models	Jun 9, 2025	Descriptive	Unverified	0
Conservative Bias in Large Language Models: Measuring Relation Predictions	Jun 9, 2025	Hallucination, Relation	Unverified	0
QA-LIGN: Aligning LLMs through Constitutionally Decomposed QA	Jun 9, 2025	Large Language Model	Unverified	0
EconWebArena: Benchmarking Autonomous Agents on Economic Tasks in Realistic Web Environments	Jun 9, 2025	Benchmarking, Navigate	Unverified	0
LLM-BT-Terms: Back-Translation as a Framework for Terminology Standardization and Dynamic Semantic Embedding	Jun 9, 2025	Translation	Unverified	0
Bingo: Boosting Efficient Reasoning of LLMs via Dynamic and Significance-based Reinforcement Learning	Jun 9, 2025	Reinforcement Learning (RL)	Unverified	0
Accelerating Spectral Clustering under Fairness Constraints	Jun 9, 2025	Clustering, Computational Efficiency	Unverified	0
A Machine Learning Approach to Generate Residual Stress Distributions using Sparse Characterization Data in Friction-Stir Processed Parts	Jun 9, 2025	Friction	Unverified	0
The Impact of Feature Scaling In Machine Learning: Effects on Regression and Classification Tasks	Jun 9, 2025	regression	Unverified	0
Mondrian: Transformer Operators via Domain Decomposition	Jun 9, 2025	Operator learning	Unverified	0
Interpreting Agent Behaviors in Reinforcement-Learning-Based Cyber-Battle Simulation Platforms	Jun 9, 2025	Deep Reinforcement Learning	Unverified	0
Learning-Based Multiuser Scheduling in MIMO-OFDM Systems with Hybrid Beamforming	Jun 9, 2025	Fairness, Scheduling	Unverified	0
Generative Learning of Differentiable Object Models for Compositional Interpretation of Complex Scenes	Jun 9, 2025	Image Reconstruction	Unverified	0
Using Satellite Images And Self-supervised Machine Learning Networks To Detect Water Hidden Under Vegetation	Jun 9, 2025	Deep Clustering	Unverified	0
Open World Scene Graph Generation using Vision Language Models	Jun 9, 2025	Graph Generation, Scene Graph Generation	Code Available	2
Compound AI Systems Optimization: A Survey of Methods, Challenges, and Future Directions	Jun 9, 2025	Reinforcement Learning (RL)	Code Available	1
CheMatAgent: Enhancing LLMs for Chemistry and Materials Science through Tree-Search Based Tool Learning	Jun 9, 2025	Information Retrieval	Code Available	1
Highly Compressed Tokenizer Can Generate Without Training	Jun 9, 2025	Image Generation, Quantization	Code Available	3
MoE-MLoRA for Multi-Domain CTR Prediction: Efficient Adaptation with Expert Specialization	Jun 9, 2025	Click-Through Rate Prediction, Diversity	Code Available	0
ETT-CKGE: Efficient Task-driven Tokens for Continual Knowledge Graph Embedding	Jun 9, 2025	Graph Embedding, Knowledge Graph Embedding	Code Available	0
Multilingual Hate Speech Detection in Social Media Using Translation-Based Approaches with Large Language Models	Jun 9, 2025	Hate Speech Detection	Unverified	0
Surgeon Style Fingerprinting and Privacy Risk Quantification via Discrete Diffusion Models in a Vision-Language-Action Framework	Jun 9, 2025	Denoising, Vision-Language-Action	Code Available	0
Info-Coevolution: An Efficient Framework for Data Model Coevolution	Jun 9, 2025	Active Learning	Code Available	0
Automatic Generation of Inference Making Questions for Reading Comprehension Assessments	Jun 9, 2025	Diagnostic, Reading Comprehension	Code Available	0
Thinking vs. Doing: Agents that Reason by Scaling Test-Time Interaction	Jun 9, 2025	Reinforcement Learning (RL)	Code Available	2
Dynamic Diffusion Schrödinger Bridge in Astrophysical Observational Inversions	Jun 9, 2025	Physical Simulations, Prediction	Code Available	0
CuRe: Cultural Gaps in the Long Tail of Text-to-Image Systems	Jun 9, 2025	Attribute, Benchmarking	Code Available	0
From Passive to Active Reasoning: Can Large Language Models Ask the Right Questions under Incomplete Information?	Jun 9, 2025		Code Available	1
Instruction-Tuned Video-Audio Models Elucidate Functional Specialization in the Brain	Jun 9, 2025	Disentanglement	Code Available	0
Surgeons Awareness, Expectations, and Involvement with Artificial Intelligence: a Survey Pre and Post the GPT Era	Jun 9, 2025	Ethics	Unverified	0
MoE-GPS: Guidlines for Prediction Strategy for Dynamic Expert Duplication in MoE Load Balancing	Jun 9, 2025	GPU, Mixture-of-Experts	Unverified	0
Sparse Interpretable Deep Learning with LIES Networks for Symbolic Regression	Jun 9, 2025	Deep Learning, regression	Code Available	0
Deep reinforcement learning for near-deterministic preparation of cubic- and quartic-phase gates in photonic quantum computing	Jun 9, 2025	Deep Reinforcement Learning	Unverified	0
Cognitive Weave: Synthesizing Abstracted Knowledge with a Spatio-Temporal Resonance Graph	Jun 9, 2025	Large Language Model, Question Answering	Code Available	0
SOP-Bench: Complex Industrial SOPs for Evaluating LLM Agents	Jun 9, 2025	Benchmarking, Synthetic Data Generation	Unverified	0
GTR-CoT: Graph Traversal as Visual Chain of Thought for Molecular Structure Recognition	Jun 9, 2025	Image Captioning	Code Available	0
Repeton: Structured Bug Repair with ReAct-Guided Patch-and-Test Cycles	Jun 9, 2025	Code Generation, RAG	Unverified	0
Benchmarking Pre-Trained Time Series Models for Electricity Price Forecasting	Jun 9, 2025	Benchmarking, Decision Making	Unverified	0
IGraSS: Learning to Identify Infrastructure Networks from Satellite Imagery by Iterative Graph-constrained Semantic Segmentation	Jun 9, 2025	Segmentation, Semantic Segmentation	Unverified	0
Ego-centric Learning of Communicative World Models for Autonomous Driving	Jun 9, 2025	Autonomous Driving, Multi-agent Reinforcement Learning	Unverified	0
SHIELD: Secure Hypernetworks for Incremental Expansion Learning Defense	Jun 9, 2025	Continual Learning	Unverified	0