The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

474,278 papers, 248,326 code links, 4,818 tasks

Papers


Showing 6151–6175 of 474278 papers

Title	Date	Tasks	Status	Hype
SARChat-Bench-2M: A Multi-Task Vision-Language Benchmark for SAR Image Interpretation	Feb 12, 2025	Earth Observation, object-detection	Code Available	2
Human-Centric Foundation Models: Perception, Generation and Agentic Modeling	Feb 12, 2025	Survey	Code Available	2
A Systematic Review on the Evaluation of Large Language Models in Theory of Mind Tasks	Feb 12, 2025		Code Available	2
ViLa-MIL: Dual-scale Vision-Language Multiple Instance Learning for Whole Slide Image Classification	Feb 12, 2025	Decoder, Descriptive	Code Available	2
The Danger of Overthinking: Examining the Reasoning-Action Dilemma in Agentic Tasks	Feb 12, 2025		Code Available	2
WorldGUI: An Interactive Benchmark for Desktop GUI Automation from Any Starting Point	Feb 12, 2025		Code Available	2
Brain Latent Progression: Individual-based Spatiotemporal Disease Progression on 3D Brain MRIs via Latent Diffusion	Feb 12, 2025		Code Available	2
Fino1: On the Transferability of Reasoning Enhanced LLMs to Finance	Feb 12, 2025	Benchmarking, Long-Context Understanding	Code Available	2
LIR-LIVO: A Lightweight, Robust LiDAR/Vision/Inertial Odometry with Illumination-Resilient Deep Features	Feb 12, 2025	Pose Estimation, Visual Odometry	Code Available	2
Cluster and Predict Latents Patches for Improved Masked Image Modeling	Feb 12, 2025	Representation Learning	Code Available	2
mmE5: Improving Multimodal Multilingual Embeddings via High-quality Synthetic Data	Feb 12, 2025	cross-modal alignment, Large Language Model	Code Available	2
TLOB: A Novel Transformer Model with Dual Attention for Price Trend Prediction with Limit Order Book Data	Feb 12, 2025		Code Available	2
MeshSplats: Mesh-Based Rendering with Gaussian Splatting Initialization	Feb 11, 2025		Code Available	2
DPO-Shift: Shifting the Distribution of Direct Preference Optimization	Feb 11, 2025		Code Available	2
LASP-2: Rethinking Sequence Parallelism for Linear Attention and Its Hybrid	Feb 11, 2025		Code Available	2
Automated Capability Discovery via Model Self-Exploration	Feb 11, 2025	model	Code Available	2
TextAtlas5M: A Large-scale Dataset for Dense Text Image Generation	Feb 11, 2025	Image Generation	Code Available	2
Less is More: Masking Elements in Image Condition Features Avoids Content Leakages in Style Transfer Diffusion Models	Feb 11, 2025	Style Transfer	Code Available	2
Semi-Supervised Vision-Centric 3D Occupancy World Model for Autonomous Driving	Feb 11, 2025	Attribute, Autonomous Driving	Code Available	2
Training Deep Learning Models with Norm-Constrained LMOs	Feb 11, 2025	Deep Learning	Code Available	2
RoboBERT: An End-to-end Multimodal Robotic Manipulation Model	Feb 11, 2025	Data Augmentation	Code Available	2
On the Emergence of Thinking in LLMs I: Searching for the Right Intuition	Feb 10, 2025	Math	Code Available	2
MaterialFusion: High-Quality, Zero-Shot, and Controllable Material Transfer with Diffusion Models	Feb 10, 2025		Code Available	2
SAMRefiner: Taming Segment Anything Model for Universal Mask Refinement	Feb 10, 2025	Semantic Segmentation	Code Available	2
KARMA: Leveraging Multi-Agent LLMs for Automated Knowledge Graph Enrichment	Feb 10, 2025	Articles, Knowledge Graphs	Code Available	2