The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

661,570 papers · 248,326 code links · 4,818 tasks

Papers


Showing 8,151–8,200 of 661,570 papers

Title	Date	Tasks	Status	Hype
Derail Yourself: Multi-turn LLM Jailbreak Attack through Self-discovered Clues	Oct 14, 2024	LLM Jailbreak, Safety Alignment	Code Available	2
MMFuser: Multimodal Multi-Layer Feature Fuser for Fine-Grained Vision-Language Understanding	Oct 15, 2024	Visual Question Answering	Code Available	2
Evaluating Morphological Compositional Generalization in Large Language Models	Oct 16, 2024	Text Generation	Code Available	2
IntersectionZoo: Eco-driving for Benchmarking Multi-Agent Contextual Reinforcement Learning	Oct 19, 2024	Benchmarking, Multi-agent Reinforcement Learning	Code Available	2
DM-Codec: Distilling Multimodal Representations for Speech Tokenization	Oct 19, 2024	Self-Supervised Learning, Speech Tokenization	Code Available	2
GPT or BERT: why not both?	Oct 31, 2024	Causal Language Modeling, Language Modeling	Code Available	2
Model merging with SVD to tie the Knots	Oct 25, 2024	model	Code Available	2
SciPIP: An LLM-based Scientific Paper Idea Proposer	Oct 30, 2024	Retrieval	Code Available	2
Ada-MSHyper: Adaptive Multi-Scale Hypergraph Transformer for Time Series Forecasting	Oct 31, 2024	Time Series, Time Series Forecasting	Code Available	2
DPU: Dynamic Prototype Updating for Multimodal Out-of-Distribution Detection	Nov 12, 2024	Optical Flow Estimation, Out-of-Distribution Detection	Code Available	2
MetaOpenFOAM: an LLM-based multi-agent framework for CFD	Jul 31, 2024	RAG, Retrieval-augmented Generation	Code Available	2
PyGen: A Collaborative Human-AI Approach to Python Package Creation	Nov 13, 2024	Code Generation	Code Available	2
Disentangling Memory and Reasoning Ability in Large Language Models	Nov 20, 2024	Decision Making, Retrieval	Code Available	2
MMGenBench: Evaluating the Limits of LMMs from the Text-to-Image Generation Perspective	Nov 21, 2024	Image Comprehension, Image Generation	Code Available	2
vesselFM: A Foundation Model for Universal 3D Blood Vessel Segmentation	Nov 26, 2024	Image Segmentation, Medical Image Analysis	Code Available	2
TryOffDiff: Virtual-Try-Off via High-Fidelity Garment Reconstruction using Diffusion Models	Nov 27, 2024	Garment Reconstruction, Image Generation	Code Available	2
TexGaussian: Generating High-quality PBR Material via Octree-based 3D Gaussian Splatting	Nov 29, 2024	Denoising, Image Generation	Code Available	2
Lost & Found: Tracking Changes from Egocentric Observations in 3D Dynamic Scene Graphs	Nov 28, 2024	Object	Code Available	2
X-Prompt: Towards Universal In-Context Image Generation in Auto-Regressive Vision Language Foundation Models	Dec 2, 2024	Image Generation, In-Context Learning	Code Available	2
CoRNStack: High-Quality Contrastive Data for Better Code Retrieval and Reranking	Dec 1, 2024	Bug fixing, Code Generation	Code Available	2
FLAIR: VLM with Fine-grained Language-informed Image Representations	Dec 4, 2024	Language Modeling, Language Modelling	Code Available	2
ComplexFuncBench: Exploring Multi-Step and Constrained Function Calling under Long-Context Scenario	Jan 17, 2025		Code Available	2
SoRA: Singular Value Decomposed Low-Rank Adaptation for Domain Generalizable Representation Learning	Dec 5, 2024	Domain Adaptation, Domain Generalization	Code Available	2
Divot: Diffusion Powers Video Tokenizer for Comprehension and Generation	Dec 5, 2024	Image Comprehension, Representation Learning	Code Available	2
JPC: Flexible Inference for Predictive Coding Networks in JAX	Dec 4, 2024		Code Available	2
MESA: Effective Matching Redundancy Reduction by Semantic Area Segmentation	Aug 1, 2024	Patch Matching	Code Available	2
DriveMM: All-in-One Large Multimodal Model for Autonomous Driving	Dec 10, 2024	All, Autonomous Driving	Code Available	2
MAC-Ego3D: Multi-Agent Gaussian Consensus for Real-Time Collaborative Ego-Motion and Photorealistic 3D Reconstruction	Dec 12, 2024	3D Reconstruction, Motion Estimation	Code Available	2
MMLU-CF: A Contamination-free Multi-task Language Understanding Benchmark	Dec 19, 2024	MMLU, Multiple-choice	Code Available	2
MR-GDINO: Efficient Open-World Continual Object Detection	Dec 20, 2024	Continual Learning, object-detection	Code Available	2
Scenario-Wise Rec: A Multi-Scenario Recommendation Benchmark	Dec 23, 2024		Code Available	2
EvalMuse-40K: A Reliable and Fine-Grained Benchmark with Comprehensive Human Annotations for Text-to-Image Generation Model Evaluation	Dec 24, 2024	Image Captioning, Image Generation	Code Available	2
Test-time Computing: from System-1 Thinking to System-2 Thinking	Jan 5, 2025		Code Available	2
TakuNet: an Energy-Efficient CNN for Real-Time Inference on Embedded UAV systems in Emergency Response Scenarios	Jan 10, 2025	Aerial Scene Classification, CPU	Code Available	2
Russian Financial Statements Database: A firm-level collection of the universe of financial statements	Jan 10, 2025	Imputation	Code Available	2
ChartCoder: Advancing Multimodal Large Language Model for Chart-to-Code Generation	Jan 11, 2025	Chart Understanding, Code Generation	Code Available	2
Prompt-Free Diffusion: Taking "Text" out of Text-to-Image Diffusion Models	May 25, 2023	Conditional Text-to-Image Synthesis, Image Generation	Code Available	2
ScoreFlow: Mastering LLM Agent Workflows via Score-based Preference Optimization	Feb 6, 2025	Language Modeling, Language Modelling	Code Available	2
SalM2: An Extremely Lightweight Saliency Mamba Model for Real-Time Cognitive Awareness of Driver Attention	Feb 22, 2025	Mamba	Code Available	2
TritonBench: Benchmarking Large Language Model Capabilities for Generating Triton Operators	Feb 20, 2025	Benchmarking, Code Generation	Code Available	2
A Survey of Safety on Large Vision-Language Models: Attacks, Defenses and Evaluations	Feb 14, 2025	Survey	Code Available	2
Sanity Checking Causal Representation Learning on a Simple Real-World System	Feb 27, 2025	Representation Learning	Code Available	2
Enhanced Contrastive Learning with Multi-view Longitudinal Data for Chest X-ray Report Generation	Feb 27, 2025	Contrastive Learning, Diagnostic	Code Available	2
A Training-free LLM-based Approach to General Chinese Character Error Correction	Feb 21, 2025	Language Modeling, Language Modelling	Code Available	2
SemiSAM+: Rethinking Semi-Supervised Medical Image Segmentation in the Era of Foundation Models	Feb 28, 2025	Image Segmentation, Medical Image Segmentation	Code Available	2
Neural Posterior Estimation for Cataloging Astronomical Images with Spatially Varying Backgrounds and Point Spread Functions	Feb 28, 2025	Variational Inference	Code Available	2
AnalogGenie: A Generative Engine for Automatic Discovery of Analog Circuit Topologies	Feb 28, 2025		Code Available	2
Patch-wise Structural Loss for Time Series Forecasting	Mar 2, 2025	Time Series, Time Series Forecasting	Code Available	2
Find First, Track Next: Decoupling Identification and Propagation in Referring Video Object Segmentation	Mar 5, 2025	Object, Referring Video Object Segmentation	Code Available	2
MM-OR: A Large Multimodal Operating Room Dataset for Semantic Understanding of High-Intensity Surgical Environments	Mar 4, 2025	2D Panoptic Segmentation, Graph Generation	Code Available	2