The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

659,983 papers | 248,104 code links | 4,818 tasks

Papers


Showing 2751–2800 of 659983 papers

Title	Date	Tasks	Status	Hype
The Hidden Dimensions of LLM Alignment: A Multi-Dimensional Safety Analysis	Feb 13, 2025	Safety Alignment	Code Available	3
MetaDE: Evolving Differential Evolution by Differential Evolution	Feb 13, 2025	Computational Efficiency, GPU	Code Available	3
MDCrow: Automating Molecular Dynamics Workflows with Large Language Models	Feb 13, 2025		Code Available	3
Cognify: Supercharging Gen-AI Workflows With Hierarchical Autotuning	Feb 12, 2025	RAG, Text to SQL	Code Available	3
Ask in Any Modality: A Comprehensive Survey on Multimodal Retrieval-Augmented Generation	Feb 12, 2025	cross-modal alignment, multimodal generation	Code Available	3
FinRL-DeepSeek: LLM-Infused Risk-Sensitive Reinforcement Learning for Trading Agents	Feb 11, 2025		Code Available	3
GENERator: A Long-Context Generative Genomic Foundation Model	Feb 11, 2025	model	Code Available	3
Goedel-Prover: A Frontier Model for Open-Source Automated Theorem Proving	Feb 11, 2025	Automated Theorem Proving, Large Language Model	Code Available	3
EVEv2: Improved Baselines for Encoder-Free Vision-Language Models	Feb 10, 2025	Decoder	Code Available	3
Can 1B LLM Surpass 405B LLM? Rethinking Compute-Optimal Test-Time Scaling	Feb 10, 2025	Math	Code Available	3
History-Guided Video Diffusion	Feb 10, 2025	Video Generation	Code Available	3
PINGS: Gaussian Splatting Meets Distance Fields within a Point-Based Implicit Neural Map	Feb 9, 2025		Code Available	3
Temporal Working Memory: Query-Guided Segment Refinement for Enhanced Multimodal Understanding	Feb 9, 2025	Image Captioning, Image-text Retrieval	Code Available	3
ConRFT: A Reinforced Fine-tuning Method for VLA Models via Consistency Policy	Feb 8, 2025	Q-Learning, Safe Exploration	Code Available	3
FlashVideo: Flowing Fidelity to Detail for Efficient High-Resolution Video Generation	Feb 7, 2025	Computational Efficiency, Text-to-Video Generation	Code Available	3
Long-VITA: Scaling Large Multi-modal Models to 1 Million Tokens with Leading Short-Context Accuracy	Feb 7, 2025	4k, General Knowledge	Code Available	3
ITBench: Evaluating AI Agents across Diverse Real-World IT Automation Tasks	Feb 7, 2025	Benchmarking	Code Available	3
VideoRoPE: What Makes for Good Video Rotary Position Embedding?	Feb 7, 2025	Hallucination, Position	Code Available	3
MedRAG: Enhancing Retrieval-augmented Generation with Knowledge Graph-Elicited Reasoning for Healthcare Copilot	Feb 6, 2025	Diagnostic, Large Language Model	Code Available	3
ConceptAttention: Diffusion Transformers Learn Highly Interpretable Features	Feb 6, 2025	Image Segmentation, Segmentation	Code Available	3
Ola: Pushing the Frontiers of Omni-Modal Language Model	Feb 6, 2025	cross-modal alignment, Language Modeling	Code Available	3
Multi-agent Architecture Search via Agentic Supernet	Feb 6, 2025	Language Modeling, Language Modelling	Code Available	3
Demystifying Long Chain-of-Thought Reasoning in LLMs	Feb 5, 2025	Reinforcement Learning (RL)	Code Available	3
One Diffusion Step to Real-World Super-Resolution via Flow Trajectory Distillation	Feb 4, 2025	Image Super-Resolution, Super-Resolution	Code Available	3
ParetoQ: Scaling Laws in Extremely Low-bit LLM Quantization	Feb 4, 2025	Quantization	Code Available	3
Transolver++: An Accurate Neural Solver for PDEs on Million-Scale Geometries	Feb 4, 2025	GPU	Code Available	3
Flow Q-Learning	Feb 4, 2025	Action Generation, D4RL	Code Available	3
mWhisper-Flamingo for Multilingual Audio-Visual Noise-Robust Speech Recognition	Feb 3, 2025	Audio-Visual Speech Recognition, Decoder	Code Available	3
GFM-RAG: Graph Foundation Model for Retrieval Augmented Generation	Feb 3, 2025	Graph Neural Network, Knowledge Graphs	Code Available	3
Safety at Scale: A Comprehensive Survey of Large Model Safety	Feb 2, 2025	Autonomous Driving, Data Poisoning	Code Available	3
Gradient Alignment in Physics-informed Neural Networks: A Second-Order Optimization Perspective	Feb 2, 2025	Multi-Task Learning	Code Available	3
OneForecast: A Universal Framework for Global and Regional Weather Forecasting	Feb 1, 2025	Weather Forecasting	Code Available	3
MambaGlue: Fast and Robust Local Feature Matching With Mamba	Feb 1, 2025	Mamba	Code Available	3
M+: Extending MemoryLLM with Scalable Long-Term Memory	Feb 1, 2025	16k, GPU	Code Available	3
Rethinking Early Stopping: Refine, Then Calibrate	Jan 31, 2025	Decision Making	Code Available	3
Test-Time Training Scaling Laws for Chemical Exploration in Drug Design	Jan 31, 2025	Drug Design, Drug Discovery	Code Available	3
Partially Rewriting a Transformer in Natural Language	Jan 31, 2025	Language Modeling, Language Modelling	Code Available	3
Decoding-based Regression	Jan 31, 2025	Density Estimation, regression	Code Available	3
Advances in Multimodal Adaptation and Generalization: From Traditional Approaches to Foundation Models	Jan 30, 2025	Action Recognition, Domain Adaptation	Code Available	3
LLMs can see and hear without any training	Jan 30, 2025	Audio captioning, Image Generation	Code Available	3
Sparser, Better, Faster, Stronger: Sparsity Detection for Efficient Automatic Differentiation	Jan 29, 2025		Code Available	3
Molecular Fingerprints Are Strong Models for Peptide Function Prediction	Jan 29, 2025	Graph Classification, Graph Regression	Code Available	3
Amplifier: Bringing Attention to Neglected Low-Energy Components in Time Series Forecasting	Jan 28, 2025	Specificity, Time Series	Code Available	3
DiffSplat: Repurposing Image Diffusion Models for Scalable Gaussian Splat Generation	Jan 28, 2025	3D Generation	Code Available	3
Deformable Beta Splatting	Jan 27, 2025	3DGS, Novel View Synthesis	Code Available	3
Parametric Retrieval Augmented Generation	Jan 27, 2025	Domain Adaptation, RAG	Code Available	3
MedAgentBench: A Realistic Virtual EHR Environment to Benchmark Medical LLM Agents	Jan 24, 2025	Benchmarking	Code Available	3
HERMES: A Unified Self-Driving World Model for Simultaneous 3D Scene Understanding and Generation	Jan 24, 2025	Autonomous Driving, Language Modeling	Code Available	3
OSUM: Advancing Open Speech Understanding Models with Limited Resources in Academia	Jan 23, 2025	Emotion Recognition, Event Detection	Code Available	3
The Breeze 2 Herd of Models: Traditional Chinese LLMs Based on Llama with Vision-Aware and Function-Calling Capabilities	Jan 23, 2025	General Knowledge, Instruction Following	Code Available	3