The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

661,570 papers | 248,326 code links | 4,818 tasks

Papers


Showing 7,251–7,300 of 661,570 papers

Title	Date	Tasks	Status	Hype
Towards World Simulator: Crafting Physical Commonsense-Based Benchmark for Video Generation	Oct 7, 2024	Prompt Engineering, Video Generation	Code Available	2
ScienceAgentBench: Toward Rigorous Assessment of Language Agents for Data-Driven Scientific Discovery	Oct 7, 2024	scientific discovery	Code Available	2
Ensured: Explanations for Decreasing the Epistemic Uncertainty in Predictions	Oct 7, 2024		Code Available	2
SecAlign: Defending Against Prompt Injection with Preference Optimization	Oct 7, 2024		Code Available	2
A Simple Image Segmentation Framework via In-Context Examples	Oct 7, 2024	Decoder, Image Segmentation	Code Available	2
TidalDecode: Fast and Accurate LLM Decoding with Position Persistent Sparse Attention	Oct 7, 2024	Position	Code Available	2
Learning Efficient and Effective Trajectories for Differential Equation-based Image Restoration	Oct 7, 2024	Image Restoration, Navigate	Code Available	2
Causal Context Adjustment Loss for Learned Image Compression	Oct 7, 2024	Image Compression	Code Available	2
TurtleBench: Evaluating Top Language Models via Real-World Yes/No Puzzles	Oct 7, 2024	Logical Reasoning	Code Available	2
Next Best Sense: Guiding Vision and Touch with FisherRF for 3D Gaussian Splatting	Oct 7, 2024	3DGS	Code Available	2
Towards Ultra-Low-Power Neuromorphic Speech Enhancement with Spiking-FullSubNet	Oct 7, 2024	Denoising, Speech Denoising	Code Available	2
TextHawk2: A Large Vision-Language Model Excels in Bilingual OCR and Grounding with 16x Fewer Tokens	Oct 7, 2024	Language Modeling, Language Modelling	Code Available	2
Mitigating Modality Prior-Induced Hallucinations in Multimodal Large Language Models via Deciphering Attention Causality	Oct 7, 2024	Causal Inference, counterfactual	Code Available	2
Differential Transformer	Oct 7, 2024	Hallucination, In-Context Learning	Code Available	2
Knowledge-Guided Dynamic Modality Attention Fusion Framework for Multimodal Sentiment Analysis	Oct 6, 2024	Multimodal Sentiment Analysis, Sentiment Analysis	Code Available	2
Generative Flows on Synthetic Pathway for Drug Design	Oct 6, 2024	Drug Design, Drug Discovery	Code Available	2
dattri: A Library for Efficient Data Attribution	Oct 6, 2024	Benchmarking	Code Available	2
Reasoning-Enhanced Healthcare Predictions with Knowledge Graph Community Retrieval	Oct 6, 2024	Community Detection, Information Retrieval	Code Available	2
GenSim: A General Social Simulation Platform with Large Language Model based Agents	Oct 6, 2024	Language Modeling, Language Modelling	Code Available	2
TIS-DPO: Token-level Importance Sampling for Direct Preference Optimization With Estimated Weights	Oct 6, 2024		Code Available	2
LiteVLoc: Map-Lite Visual Localization for Image Goal Navigation	Oct 6, 2024	Pose Estimation, Visual Localization	Code Available	2
DiffusionFake: Enhancing Generalization in Deepfake Detection via Guided Stable Diffusion	Oct 6, 2024	DeepFake Detection, Domain Generalization	Code Available	2
TimeBridge: Non-Stationarity Matters for Long-term Time Series Forecasting	Oct 6, 2024	Multivariate Time Series Forecasting, Time Series	Code Available	2
UniMuMo: Unified Text, Music and Motion Generation	Oct 6, 2024	Decoder, Motion Generation	Code Available	2
Hammer: Robust Function-Calling for On-Device Language Models via Function Masking	Oct 6, 2024		Code Available	2
Gödel Agent: A Self-Referential Agent Framework for Recursive Self-Improvement	Oct 6, 2024	Mathematical Reasoning, Meta-Learning	Code Available	2
Distillation-Free One-Step Diffusion for Real-World Image Super-Resolution	Oct 5, 2024	Image Super-Resolution, Knowledge Distillation	Code Available	2
A Simple yet Effective Training-free Prompt-free Approach to Chinese Spelling Correction Based on Large Language Models	Oct 5, 2024	Language Modeling, Language Modelling	Code Available	2
An Electrocardiogram Foundation Model Built on over 10 Million Recordings with External Evaluation across Multiple Domains	Oct 5, 2024	Diagnostic, Event Detection	Code Available	2
DeFoG: Discrete Flow Matching for Graph Generation	Oct 5, 2024	Denoising, Graph Generation	Code Available	2
SyllableLM: Learning Coarse Semantic Units for Speech Language Models	Oct 5, 2024	Clustering, Language Modeling	Code Available	2
Learning Truncated Causal History Model for Video Restoration	Oct 4, 2024	Deblurring, Denoising	Code Available	2
Look Twice Before You Answer: Memory-Space Visual Retracing for Hallucination Mitigation in Multimodal Large Language Models	Oct 4, 2024	Decoder, Hallucination	Code Available	2
Oscillatory State-Space Models	Oct 4, 2024	Mamba, State Space Models	Code Available	2
Refinement of Monocular Depth Maps via Multi-View Differentiable Rendering	Oct 4, 2024	Depth Estimation, Monocular Depth Estimation	Code Available	2
Mamba in Vision: A Comprehensive Survey of Techniques and Applications	Oct 4, 2024	Mamba, State Space Models	Code Available	2
Multi-Robot Motion Planning with Diffusion Models	Oct 4, 2024	Motion Planning	Code Available	2
Dynamic Diffusion Transformer	Oct 4, 2024	Image Generation	Code Available	2
Exploring the Benefit of Activation Sparsity in Pre-training	Oct 4, 2024		Code Available	2
ToolGen: Unified Tool Retrieval and Calling via Generation	Oct 4, 2024	Retrieval, Text Generation	Code Available	2
Learning from Committee: Reasoning Distillation from a Mixture of Teachers with Peer-Review	Oct 4, 2024	Knowledge Distillation, Logical Reasoning	Code Available	2
Grounded-VideoLLM: Sharpening Fine-grained Temporal Grounding in Video Large Language Models	Oct 4, 2024	Dense Video Captioning, Sentence	Code Available	2
Scaling Large Motion Models with Million-Level Human Motions	Oct 4, 2024	Motion Generation	Code Available	2
Unraveling Cross-Modality Knowledge Conflicts in Large Vision-Language Models	Oct 4, 2024		Code Available	2
Steering Large Language Models between Code Execution and Textual Reasoning	Oct 4, 2024	Code Generation, Math	Code Available	2
Autoregressive Action Sequence Learning for Robotic Manipulation	Oct 4, 2024	Chunking, Language Modeling	Code Available	2
MetricX-24: The Google Submission to the WMT 2024 Metrics Shared Task	Oct 4, 2024	Translation	Code Available	2
Generative Artificial Intelligence for Navigating Synthesizable Chemical Space	Oct 4, 2024	Drug Discovery, Navigate	Code Available	2
GraphRouter: A Graph-based Router for LLM Selections	Oct 4, 2024	Transductive Learning	Code Available	2
AutoPenBench: Benchmarking Generative Agents for Penetration Testing	Oct 4, 2024	Benchmarking	Code Available	2