The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

474,278 papers | 248,326 code links | 4,818 tasks

Papers

Showing 6801–6825 of 474278 papers

Title	Date	Tasks	Status	Hype
VideoEspresso: A Large-Scale Chain-of-Thought Dataset for Fine-Grained Video Reasoning via Core Frame Selection	Nov 22, 2024	Question Answering, Video Question Answering	Code Available	2
EfficientViM: Efficient Vision Mamba with Hidden State Mixer based State Space Duality	Nov 22, 2024	Efficient Neural Network, Image Classification	Code Available	2
Open-Vocabulary Online Semantic Mapping for SLAM	Nov 22, 2024	Segmentation, Semantic SLAM	Code Available	2
AnyText2: Visual Text Generation and Editing With Customizable Attributes	Nov 22, 2024	Image Generation, Text Generation	Code Available	2
Derivative-Free Diffusion Manifold-Constrained Gradient for Unified XAI	Nov 22, 2024	counterfactual, Counterfactual Explanation	Code Available	2
Zero-Shot Coreset Selection: Efficient Pruning for Unlabeled Data	Nov 22, 2024		Code Available	2
RE-Bench: Evaluating frontier AI R&D capabilities of language model agents against human experts	Nov 22, 2024	AI Agent, Language Modeling	Code Available	2
MovieBench: A Hierarchical Movie Level Dataset for Long Video Generation	Nov 22, 2024	Video Generation	Code Available	2
DyCoke: Dynamic Compression of Tokens for Fast Video Large Language Models	Nov 22, 2024		Code Available	2
ScribeAgent: Towards Specialized Web Agents Using Production-Scale Workflow Data	Nov 22, 2024	Language Modeling, Language Modelling	Code Available	2
GMAI-VL & GMAI-VL-5.5M: A Large Vision-Language Model and A Comprehensive Multimodal Dataset Towards General Medical AI	Nov 21, 2024	Decision Making, Language Modeling	Code Available	2
Natural Language Reinforcement Learning	Nov 21, 2024	Decision Making, reinforcement-learning	Code Available	2
MMGenBench: Evaluating the Limits of LMMs from the Text-to-Image Generation Perspective	Nov 21, 2024	Image Comprehension, Image Generation	Code Available	2
EasyHOI: Unleashing the Power of Large Models for Reconstructing Hand-Object Interactions in the Wild	Nov 21, 2024	3D Reconstruction, Object	Code Available	2
CodeSAM: Source Code Representation Learning by Infusing Self-Attention with Multi-Code-View Graphs	Nov 21, 2024	Clone Detection, Code Search	Code Available	2
BiomedCoOp: Learning to Prompt for Biomedical Vision-Language Models	Nov 21, 2024	image-classification, Image Classification	Code Available	2
FunctionChat-Bench: Comprehensive Evaluation of Language Models' Generative Capabilities in Korean Tool-use Dialogs	Nov 21, 2024	Relevance Detection	Code Available	2
Empower Structure-Based Molecule Optimization with Gradient Guided Bayesian Flow Networks	Nov 20, 2024	Bayesian Inference, Drug Design	Code Available	2
Quantized symbolic time series approximation	Nov 20, 2024	Anomaly Detection, Astronomy	Code Available	2
Disentangling Memory and Reasoning Ability in Large Language Models	Nov 20, 2024	Decision Making, Retrieval	Code Available	2
DriveMLLM: A Benchmark for Spatial Understanding with Multimodal Large Language Models in Autonomous Driving	Nov 20, 2024	Autonomous Driving, motion prediction	Code Available	2
RAW-Diffusion: RGB-Guided Diffusion Models for High-Fidelity RAW Image Generation	Nov 20, 2024	Image Generation, object-detection	Code Available	2
Find Any Part in 3D	Nov 20, 2024	3D Part Segmentation, Diversity	Code Available	2
SimPhony: A Device-Circuit-Architecture Cross-Layer Modeling and Simulation Framework for Heterogeneous Electronic-Photonic AI System	Nov 20, 2024		Code Available	2
Practical Compact Deep Compressed Sensing	Nov 20, 2024	compressed sensing	Code Available	2