The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

474,278 papers · 248,326 code links · 4,818 tasks

Papers

Showing 17651–17700 of 474278 papers

Title	Date	Tasks	Status	Hype
Is Safety Standard Same for Everyone? User-Specific Safety Evaluation of Large Language Models	Feb 20, 2025	Language Modeling, Language Modelling	Code Available	1
Multi-Objective Causal Bayesian Optimization	Feb 20, 2025	Bayesian Optimization, Decision Making	Code Available	1
SegAnyPET: Universal Promptable Segmentation from Positron Emission Tomography Images	Feb 20, 2025	Image Segmentation, Segmentation	Code Available	1
Exploiting Deblurring Networks for Radiance Fields	Feb 20, 2025	Computational Efficiency, Deblurring	Code Available	1
Aligning LLMs to Ask Good Questions A Case Study in Clinical Reasoning	Feb 20, 2025	Attribute, Diagnostic	Code Available	1
CORBA: Contagious Recursive Blocking Attacks on Multi-Agent Systems Based on Large Language Models	Feb 20, 2025	Blocking, Language Modeling	Code Available	1
ChatVLA: Unified Multimodal Understanding and Robot Control with Vision-Language-Action Model	Feb 20, 2025	Mixture-of-Experts, Question Answering	Code Available	1
CLIPPER: Compression enables long-context synthetic data generation	Feb 20, 2025	Claim Verification, Synthetic Data Generation	Code Available	1
Unstructured Evidence Attribution for Long Context Query Focused Summarization	Feb 20, 2025	Query-focused Summarization	Code Available	1
SEA-HELM: Southeast Asian Holistic Evaluation of Language Models	Feb 20, 2025		Code Available	1
Pre-training Graph Neural Networks on Molecules by Using Subgraph-Conditioned Graph Information Bottleneck	Feb 20, 2025	Graph Classification, Graph Neural Network	Code Available	1
Measuring Faithfulness of Chains of Thought by Unlearning Reasoning Steps	Feb 20, 2025	Question Answering	Code Available	1
Bridging Text and Vision: A Multi-View Text-Vision Registration Approach for Cross-Modal Place Recognition	Feb 20, 2025	Cross-modal place recognition, Natural Language Understanding	Code Available	1
CDGS: Confidence-Aware Depth Regularization for 3D Gaussian Splatting	Feb 20, 2025	3DGS, 3D Reconstruction	Code Available	1
StructFlowBench: A Structured Flow Benchmark for Multi-turn Instruction Following	Feb 20, 2025	Instruction Following	Code Available	1
MedFuncta: Modality-Agnostic Representations Based on Efficient Neural Fields	Feb 20, 2025	Medical Image Analysis, Meta-Learning	Code Available	1
I-MCTS: Enhancing Agentic AutoML via Introspective Monte Carlo Tree Search	Feb 20, 2025	AutoML, Code Generation	Code Available	1
NAVIG: Natural Language-guided Analysis with Vision Language Models for Image Geo-localization	Feb 20, 2025	geo-localization	Code Available	1
Middle-Layer Representation Alignment for Cross-Lingual Transfer in Fine-Tuned LLMs	Feb 20, 2025	Cross-Lingual Transfer, Machine Translation	Code Available	1
Noisy Test-Time Adaptation in Vision-Language Models	Feb 20, 2025	Test-time Adaptation	Code Available	1
LongWriter-V: Enabling Ultra-Long and High-Fidelity Generation in Vision-Language Models	Feb 20, 2025		Code Available	1
How to Get Your LLM to Generate Challenging Problems for Evaluation	Feb 20, 2025	Code Completion, Math	Code Available	1
Pursuing Top Growth with Novel Loss Function	Feb 20, 2025		Code Available	1
H3DE-Net: Efficient and Accurate 3D Landmark Detection in Medical Imaging	Feb 20, 2025	Computational Efficiency, Medical Image Analysis	Code Available	1
Tree-of-Debate: Multi-Persona Debate Trees Elicit Critical Thinking for Scientific Comparative Analysis	Feb 20, 2025	Articles	Code Available	1
FlowAgent: Achieving Compliance and Flexibility for Workflow Agents	Feb 20, 2025		Code Available	1
Improving LLM-powered Recommendations with Personalized Information	Feb 19, 2025	Recommendation Systems	Code Available	1
Enhancing Cognition and Explainability of Multimodal Foundation Models with Self-Synthesized Data	Feb 19, 2025	Fine-Grained Visual Recognition, Pneumonia Detection	Code Available	1
PeerQA: A Scientific Question Answering Dataset from Peer Reviews	Feb 19, 2025	answerability prediction, Answer Generation	Code Available	1
Proving Olympiad Inequalities by Synergizing LLMs and Symbolic Reasoning	Feb 19, 2025	Mathematical Reasoning	Code Available	1
RobustX: Robust Counterfactual Explanations Made Easy	Feb 19, 2025	counterfactual, Decision Making	Code Available	1
MM-Verify: Enhancing Multimodal Reasoning with Chain-of-Thought Verification	Feb 19, 2025	Multimodal Reasoning	Code Available	1
Judging the Judges: A Collection of LLM-Generated Relevance Judgements	Feb 19, 2025	Information Retrieval	Code Available	1
Reasoning with Reinforced Functional Token Tuning	Feb 19, 2025	Math	Code Available	1
Deep Learning for VWAP Execution in Crypto Markets: Beyond the Volume Curve	Feb 19, 2025		Code Available	1
Triad: Vision Foundation Model for 3D Magnetic Resonance Imaging	Feb 19, 2025	Cancer Classification, Computed Tomography (CT)	Code Available	1
Spiking Point Transformer for Point Cloud Classification	Feb 19, 2025	Classification, Point Cloud Classification	Code Available	1
LongPO: Long Context Self-Evolution of Large Language Models through Short-to-Long Preference Optimization	Feb 19, 2025		Code Available	1
SPEX: Scaling Feature Interaction Explanations for LLMs	Feb 19, 2025		Code Available	1
2.5D U-Net with Depth Reduction for 3D CryoET Object Identification	Feb 19, 2025	Electron Tomography, Keypoint Detection	Code Available	1
Which Attention Heads Matter for In-Context Learning?	Feb 19, 2025	In-Context Learning	Code Available	1
Latent Distribution Decoupling: A Probabilistic Framework for Uncertainty-Aware Multimodal Emotion Recognition	Feb 19, 2025	Emotion Recognition, Multimodal Emotion Recognition	Code Available	1
Lost in Sequence: Do Large Language Models Understand Sequential Recommendation?	Feb 19, 2025	Sequential Recommendation	Code Available	1
Benchmarking LLMs for Political Science: A United Nations Perspective	Feb 19, 2025	Benchmarking, Decision Making	Code Available	1
Refining embeddings with fill-tuning: data-efficient generalised performance improvements for materials foundation models	Feb 19, 2025		Code Available	1
AdaptiveStep: Automatically Dividing Reasoning Step through Model Confidence	Feb 19, 2025	Code Generation, Decision Making	Code Available	1
From Tools to Teammates: Evaluating LLMs in Multi-Session Coding Interactions	Feb 19, 2025		Code Available	1
Collaborative Retrieval for Large Language Model-based Conversational Recommender Systems	Feb 19, 2025	Collaborative Filtering, Conversational Recommendation	Code Available	1
Learning-Guided Rolling Horizon Optimization for Long-Horizon Flexible Job-Shop Scheduling	Feb 18, 2025	Combinatorial Optimization, Job Shop Scheduling	Code Available	1
A Cognitive Writing Perspective for Constrained Long-Form Text Generation	Feb 18, 2025	Form, Text Generation	Code Available	1