The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

474,278 papers · 248,326 code links · 4,818 tasks

Papers


Showing 20551–20600 of 474278 papers

Title	Date	Tasks	Status	Hype
Catastrophic Cyber Capabilities Benchmark (3CB): Robustly Evaluating LLM Agent Cyber Offense Capabilities	Oct 10, 2024		Code Available	1
QCircuitNet: A Large-Scale Hierarchical Dataset for Quantum Algorithm Design	Oct 10, 2024	Few-Shot Learning	Code Available	1
StepTool: A Step-grained Reinforcement Learning Framework for Tool Learning in LLMs	Oct 10, 2024	Information Retrieval, Policy Gradient Methods	Code Available	1
CoPESD: A Multi-Level Surgical Motion Dataset for Training Large Vision-Language Models to Co-Pilot Endoscopic Submucosal Dissection	Oct 10, 2024	Instruction Following	Code Available	1
Minority-Focused Text-to-Image Generation via Prompt Optimization	Oct 10, 2024	Data Augmentation, Image Generation	Code Available	1
OneNet: A Fine-Tuning Free Framework for Few-Shot Entity Linking via Large Language Model Prompting	Oct 10, 2024	Entity Linking, Few-Shot Learning	Code Available	1
Noether's razor: Learning Conserved Quantities	Oct 10, 2024	Model Selection	Code Available	1
GenARM: Reward Guided Generation with Autoregressive Reward Model for Test-time Alignment	Oct 10, 2024	Text Generation	Code Available	1
Efficient Dictionary Learning with Switch Sparse Autoencoders	Oct 10, 2024	Dictionary Learning, Mixture-of-Experts	Code Available	1
Optimal-state Dynamics Estimation for Physics-based Human Motion Capture from Videos	Oct 10, 2024	Pose Estimation	Code Available	1
ZeroComp: Zero-shot Object Compositing from Image Intrinsics via Diffusion	Oct 10, 2024		Code Available	1
Neural Reasoning Networks: Efficient Interpretable Neural Networks With Automatic Textual Explanations	Oct 10, 2024	Fairness, Feature Importance	Code Available	1
Enhancing Zeroth-order Fine-tuning for Language Models with Low-rank Structures	Oct 10, 2024	parameter-efficient fine-tuning	Code Available	1
StablePrompt: Automatic Prompt Tuning using Reinforcement Learning for Large Language Models	Oct 10, 2024	Question Answering, Reinforcement Learning (RL)	Code Available	1
Causal Image Modeling for Efficient Visual Understanding	Oct 10, 2024	Causal Inference	Code Available	1
Metalic: Meta-Learning In-Context with Protein Language Models	Oct 10, 2024	In-Context Learning, Meta-Learning	Code Available	1
RayEmb: Arbitrary Landmark Detection in X-Ray Images Using Ray Embedding Subspace	Oct 10, 2024	Medical Image Registration	Code Available	1
CrackSegDiff: Diffusion Probability Model-based Multi-modal Crack Segmentation	Oct 10, 2024	Crack Segmentation, Denoising	Code Available	1
SPA: 3D Spatial-Awareness Enables Effective Embodied Representation	Oct 10, 2024	GPU, Neural Rendering	Code Available	1
Divide and Translate: Compositional First-Order Logic Translation and Verification for Complex Logical Reasoning	Oct 10, 2024	Language Modelling, Large Language Model	Code Available	1
Executing Arithmetic: Fine-Tuning Large Language Models as Turing Machines	Oct 10, 2024		Code Available	1
Understanding the Interplay between Parametric and Contextual Knowledge for Large Language Models	Oct 10, 2024		Code Available	1
Multi-Agent Collaborative Data Selection for Efficient LLM Pretraining	Oct 10, 2024	Language Modeling, Language Modelling	Code Available	1
Automatic Curriculum Expert Iteration for Reliable LLM Reasoning	Oct 10, 2024	Hallucination, Logical Reasoning	Code Available	1
Physics and Deep Learning in Computational Wave Imaging	Oct 10, 2024	Deep Learning	Code Available	1
Bilinear MLPs enable weight-based mechanistic interpretability	Oct 10, 2024	image-classification, Image Classification	Code Available	1
Reward-Augmented Data Enhances Direct Preference Alignment of LLMs	Oct 10, 2024	Instruction Following	Code Available	1
Pretraining Graph Transformers with Atom-in-a-Molecule Quantum Properties for Improved ADMET Modeling	Oct 10, 2024		Code Available	1
TANet: Triplet Attention Network for All-In-One Adverse Weather Image Restoration	Oct 10, 2024	All, Image Restoration	Code Available	1
Pap2Pat: Towards Automated Paper-to-Patent Drafting using Chunk-based Outline-guided Generation	Oct 9, 2024		Code Available	1
DiffGAD: A Diffusion-based Unsupervised Graph Anomaly Detector	Oct 9, 2024	Anomaly Detection, Graph Anomaly Detection	Code Available	1
Does Spatial Cognition Emerge in Frontier Models?	Oct 9, 2024		Code Available	1
To Preserve or To Compress: An In-Depth Study of Connector Selection in Multimodal Large Language Models	Oct 9, 2024	MME	Code Available	1
Dynamic Neural Potential Field: Online Trajectory Optimization in Presence of Moving Obstacles	Oct 9, 2024	Collision Avoidance, Model Predictive Control	Code Available	1
Iterative Optimization Annotation Pipeline and ALSS-YOLO-Seg for Efficient Banana Plantation Segmentation in UAV Imagery	Oct 9, 2024	Segmentation	Code Available	1
InstructG2I: Synthesizing Images from Multimodal Attributed Graphs	Oct 9, 2024	Denoising, Re-Ranking	Code Available	1
TextLap: Customizing Language Models for Text-to-Layout Planning	Oct 9, 2024	Image Generation, Natural Language Understanding	Code Available	1
Cluster-wise Graph Transformer with Dual-granularity Kernelized Attention	Oct 9, 2024	Graph Learning, Node Clustering	Code Available	1
Learning Evolving Tools for Large Language Models	Oct 9, 2024		Code Available	1
Continual Learning in the Frequency Domain	Oct 9, 2024	Continual Learning	Code Available	1
Personalized Visual Instruction Tuning	Oct 9, 2024	Image Generation	Code Available	1
LLM Embeddings Improve Test-time Adaptation to Tabular Y|X-Shifts	Oct 9, 2024	Test-time Adaptation, World Knowledge	Code Available	1
Towards Generalisable Time Series Understanding Across Domains	Oct 9, 2024	Benchmarking, Time Series	Code Available	1
SEAL: Safety-enhanced Aligned LLM Fine-tuning via Bilevel Data Selection	Oct 9, 2024	Bilevel Optimization	Code Available	1
BiC-MPPI: Goal-Pursuing, Sampling-Based Bidirectional Rollout Clustering Path Integral for Trajectory Optimization	Oct 9, 2024	Autonomous Navigation, Trajectory Planning	Code Available	1
Deep Correlated Prompting for Visual Recognition with Missing Modalities	Oct 9, 2024	Prompt Learning	Code Available	1
Retrieval-Augmented Decision Transformer: External Memory for In-context RL	Oct 9, 2024	In-Context Learning, Reinforcement Learning (RL)	Code Available	1
ING-VP: MLLMs cannot Play Easy Vision-based Games Yet	Oct 9, 2024	Spatial Reasoning	Code Available	1
Tree of Problems: Improving structured problem solving with compositionality	Oct 9, 2024	In-Context Learning	Code Available	1
Mitigating Time Discretization Challenges with WeatherODE: A Sandwich Physics-Driven Neural ODE for Weather Forecasting	Oct 9, 2024	Weather Forecasting	Code Available	1