The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

661,570 papers | 248,326 code links | 4,818 tasks

Papers


Showing 8,901–8,950 of 177,340 papers

Title	Date	Tasks	Status	Hype	Score
RuleKit 2: Faster and simpler rule learning	Apr 29, 2025	Descriptive	Code Available	2	5
Segment Anything for Histopathology	Feb 1, 2025	Image Segmentation, Instance Segmentation	Code Available	2	5
High-Resolution Visual Reasoning via Multi-Turn Grounding-Based Reinforcement Learning	Jul 8, 2025	MME, Reinforcement Learning (RL)	Code Available	2	5
Seeing World Dynamics in a Nutshell	Feb 5, 2025	Video Reconstruction	Code Available	2	5
Step Back to Leap Forward: Self-Backtracking for Boosting Reasoning of Language Models	Feb 6, 2025		Code Available	2	5
KET-RAG: A Cost-Efficient Multi-Granular Indexing Framework for Graph-RAG	Feb 13, 2025	Knowledge Graphs, Large Language Model	Code Available	2	5
Training Turn-by-Turn Verifiers for Dialogue Tutoring Agents: The Curious Case of LLMs as Your Coding Tutors	Feb 18, 2025	Code Generation, Knowledge Tracing	Code Available	2	5
Re-Align: Aligning Vision Language Models via Retrieval-Augmented Direct Preference Optimization	Feb 18, 2025	Image Retrieval, Question Answering	Code Available	2	5
E2EC: An End-to-End Contour-based Method for High-Quality High-Speed Instance Segmentation	Mar 8, 2022	GPU, Instance Segmentation	Code Available	2	5
A Survey on Data Contamination for Large Language Models	Feb 20, 2025	Survey, Text Generation	Code Available	2	5
MaterialFusion: High-Quality, Zero-Shot, and Controllable Material Transfer with Diffusion Models	Feb 10, 2025		Code Available	2	5
PIP-KAG: Mitigating Knowledge Conflicts in Knowledge-Augmented Generation via Parametric Pruning	Feb 21, 2025	Hallucination	Code Available	2	5
voc2vec: A Foundation Model for Non-Verbal Vocalization	Feb 22, 2025	model	Code Available	2	5
WebGames: Challenging General-Purpose Web-Browsing AI Agents	Feb 25, 2025		Code Available	2	5
FlexVAR: Flexible Visual Autoregressive Modeling without Residual Prediction	Feb 27, 2025	Image Generation, Prediction	Code Available	2	5
AgentSociety Challenge: Designing LLM Agents for User Modeling and Recommendation on Web Platforms	Feb 26, 2025	Language Modeling, Language Modelling	Code Available	2	5
Automatic database description generation for Text-to-SQL	Feb 28, 2025	Text to SQL, Text-To-SQL	Code Available	2	5
UL-UNAS: Ultra-Lightweight U-Nets for Real-Time Speech Enhancement via Network Architecture Search	Mar 1, 2025	Neural Architecture Search, Speech Enhancement	Code Available	2	5
LongProLIP: A Probabilistic Vision-Language Model with Long Context Text	Mar 11, 2025	Language Modeling, Language Modelling	Code Available	2	5
An Approach for Air Drawing Using Background Subtraction and Contour Extraction	Mar 3, 2025	Hand Detection, Optical Character Recognition (OCR)	Code Available	2	5
Interactive Debugging and Steering of Multi-Agent AI Systems	Mar 3, 2025	AI Agent	Code Available	2	5
MPO: Boosting LLM Agents with Meta Plan Optimization	Mar 4, 2025		Code Available	2	5
Text2LIVE: Text-Driven Layered Image and Video Editing	Apr 5, 2022	Video Editing	Code Available	2	5
Similarity-Guided Layer-Adaptive Vision Transformer for UAV Tracking	Mar 9, 2025	Visual Tracking	Code Available	2	5
GigaSLAM: Large-Scale Monocular SLAM with Hierarchical Gaussian Splats	Mar 11, 2025	3DGS, NeRF	Code Available	2	5
Is CLIP ideal? No. Can we fix it? Yes!	Mar 10, 2025	Attribute, Negation	Code Available	2	5
Word2World: Generating Stories and Worlds through Large Language Models	May 6, 2024	Game Design	Code Available	2	5
LLM-FP4: 4-Bit Floating-Point Quantized Transformers	Oct 25, 2023	Common Sense Reasoning, Quantization	Code Available	2	5
OVTR: End-to-End Open-Vocabulary Multiple Object Tracking with Transformer	Mar 13, 2025	Decoder, multimodal interaction	Code Available	2	5
RouterEval: A Comprehensive Benchmark for Routing LLMs to Explore Model-level Scaling Up in LLMs	Mar 8, 2025	Instruction Following, Mathematical Reasoning	Code Available	2	5
A Comprehensive Survey on Knowledge Distillation	Mar 15, 2025	Knowledge Distillation, Survey	Code Available	2	5
TimberTrek: Exploring and Curating Sparse Decision Trees with Interactive Visualization	Sep 19, 2022		Code Available	2	5
LeanVAE: An Ultra-Efficient Reconstruction VAE for Video Diffusion Models	Mar 18, 2025	compressed sensing, Video Generation	Code Available	2	5
MambaIC: State Space Models for High-Performance Learned Image Compression	Mar 16, 2025	Image Compression, State Space Models	Code Available	2	5
Single Image Iterative Subject-driven Generation and Editing	Mar 20, 2025	Image Generation	Code Available	2	5
NuiScene: Exploring Efficient Generation of Unbounded Outdoor Scenes	Mar 20, 2025	Scene Generation	Code Available	2	5
SaMam: Style-aware State Space Model for Arbitrary Image Style Transfer	Mar 20, 2025	Decoder, Mamba	Code Available	2	5
Splat-LOAM: Gaussian Splatting LiDAR Odometry and Mapping	Mar 21, 2025	GPU, Motion Estimation	Code Available	2	5
Correcting Deviations from Normality: A Reformulated Diffusion Model for Multi-Class Unsupervised Anomaly Detection	Mar 25, 2025	Anomaly Detection, Unsupervised Anomaly Detection	Code Available	2	5
Datasets for Depression Modeling in Social Media: An Overview	Mar 27, 2025		Code Available	2	5
AutoEval: Autonomous Evaluation of Generalist Robot Manipulation Policies in the Real World	Mar 31, 2025	Robot Manipulation, Scheduling	Code Available	2	5
On-device Sora: Enabling Training-Free Diffusion-based Text-to-Video Generation for Mobile Devices	Mar 31, 2025	Denoising, Model Optimization	Code Available	2	5
Efficient Federated Learning Tiny Language Models for Mobile Network Feature Prediction	Apr 2, 2025	Federated Learning	Code Available	2	5
An Illusion of Progress? Assessing the Current State of Web Agents	Apr 2, 2025		Code Available	2	5
Re-thinking Temporal Search for Long-Form Video Understanding	Apr 3, 2025	Computational Efficiency, Form	Code Available	2	5
A Decade of Deep Learning for Remote Sensing Spatiotemporal Fusion: Advances, Challenges, and Opportunities	Apr 1, 2025		Code Available	2	5
Caption Anything in Video: Fine-grained Object-centric Captioning via Spatiotemporal Multimodal Prompting	Apr 7, 2025	Boundary Detection, Object	Code Available	2	5
VocalNet: Speech LLM with Multi-Token Prediction for Faster and High-Quality Generation	Apr 5, 2025		Code Available	2	5
Chain-of-Tools: Utilizing Massive Unseen Tools in the CoT Reasoning of Frozen Language Models	Mar 21, 2025	GSM8K, Question Answering	Code Available	2	5
LLM-SRBench: A New Benchmark for Scientific Equation Discovery with Large Language Models	Apr 14, 2025	Equation Discovery, Memorization	Code Available	2	5