The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

661,570 papers | 248,326 code links | 4,818 tasks

Papers


Showing 6,151–6,200 of 661,570 papers

Title	Date	Tasks	Status	Hype
Fino1: On the Transferability of Reasoning Enhanced LLMs to Finance	Feb 12, 2025	Benchmarking, Long-Context Understanding	Code Available	2
ViLa-MIL: Dual-scale Vision-Language Multiple Instance Learning for Whole Slide Image Classification	Feb 12, 2025	Decoder, Descriptive	Code Available	2
SARChat-Bench-2M: A Multi-Task Vision-Language Benchmark for SAR Image Interpretation	Feb 12, 2025	Earth Observation, object-detection	Code Available	2
Human-Centric Foundation Models: Perception, Generation and Agentic Modeling	Feb 12, 2025	Survey	Code Available	2
Cluster and Predict Latents Patches for Improved Masked Image Modeling	Feb 12, 2025	Representation Learning	Code Available	2
Brain Latent Progression: Individual-based Spatiotemporal Disease Progression on 3D Brain MRIs via Latent Diffusion	Feb 12, 2025		Code Available	2
The Danger of Overthinking: Examining the Reasoning-Action Dilemma in Agentic Tasks	Feb 12, 2025		Code Available	2
LIR-LIVO: A Lightweight,Robust LiDAR/Vision/Inertial Odometry with Illumination-Resilient Deep Features	Feb 12, 2025	Pose Estimation, Visual Odometry	Code Available	2
TLOB: A Novel Transformer Model with Dual Attention for Price Trend Prediction with Limit Order Book Data	Feb 12, 2025		Code Available	2
WorldGUI: An Interactive Benchmark for Desktop GUI Automation from Any Starting Point	Feb 12, 2025		Code Available	2
A Systematic Review on the Evaluation of Large Language Models in Theory of Mind Tasks	Feb 12, 2025		Code Available	2
mmE5: Improving Multimodal Multilingual Embeddings via High-quality Synthetic Data	Feb 12, 2025	cross-modal alignment, Large Language Model	Code Available	2
TextAtlas5M: A Large-scale Dataset for Dense Text Image Generation	Feb 11, 2025	Image Generation	Code Available	2
LASP-2: Rethinking Sequence Parallelism for Linear Attention and Its Hybrid	Feb 11, 2025		Code Available	2
Training Deep Learning Models with Norm-Constrained LMOs	Feb 11, 2025	Deep Learning	Code Available	2
MeshSplats: Mesh-Based Rendering with Gaussian Splatting Initialization	Feb 11, 2025		Code Available	2
Semi-Supervised Vision-Centric 3D Occupancy World Model for Autonomous Driving	Feb 11, 2025	Attribute, Autonomous Driving	Code Available	2
DPO-Shift: Shifting the Distribution of Direct Preference Optimization	Feb 11, 2025		Code Available	2
Less is More: Masking Elements in Image Condition Features Avoids Content Leakages in Style Transfer Diffusion Models	Feb 11, 2025	Style Transfer	Code Available	2
Automated Capability Discovery via Model Self-Exploration	Feb 11, 2025	model	Code Available	2
RoboBERT: An End-to-end Multimodal Robotic Manipulation Model	Feb 11, 2025	Data Augmentation	Code Available	2
SAMRefiner: Taming Segment Anything Model for Universal Mask Refinement	Feb 10, 2025	Semantic Segmentation	Code Available	2
KARMA: Leveraging Multi-Agent LLMs for Automated Knowledge Graph Enrichment	Feb 10, 2025	Articles, Knowledge Graphs	Code Available	2
Exploring the Limit of Outcome Reward for Learning Mathematical Reasoning	Feb 10, 2025	Math, Mathematical Reasoning	Code Available	2
On the Emergence of Thinking in LLMs I: Searching for the Right Intuition	Feb 10, 2025	Math	Code Available	2
TimeKAN: KAN-based Frequency Decomposition Learning Architecture for Long-term Time Series Forecasting	Feb 10, 2025	Representation Learning, Time Series	Code Available	2
MaterialFusion: High-Quality, Zero-Shot, and Controllable Material Transfer with Diffusion Models	Feb 10, 2025		Code Available	2
Saving 77% of the Parameters in Large Language Models Technical Report	Feb 9, 2025	GPU, Text Generation	Code Available	2
Skill Expansion and Composition in Parameter Space	Feb 9, 2025	D4RL	Code Available	2
3CAD: A Large-Scale Real-World 3C Product Dataset for Unsupervised Anomaly	Feb 9, 2025	Anomaly Detection, Unsupervised Anomaly Detection	Code Available	2
Differentially Private Synthetic Data via APIs 3: Using Simulators Instead of Foundation Model	Feb 8, 2025	Image Generation	Code Available	2
Event Stream-based Visual Object Tracking: HDETrack V2 and A High-Definition Benchmark	Feb 8, 2025	Knowledge Distillation, Object Tracking	Code Available	2
CODESIM: Multi-Agent Code Generation and Problem Solving through Simulation-Driven Planning and Debugging	Feb 8, 2025	Code Generation, HumanEval	Code Available	2
Knowledge Graph-Guided Retrieval Augmented Generation	Feb 8, 2025	Diversity, Hallucination	Code Available	2
Towards Trustworthy Retrieval Augmented Generation for Large Language Models: A Survey	Feb 8, 2025	Fairness, RAG	Code Available	2
Adaptive Graph of Thoughts: Test-Time Adaptive Reasoning Unifying Chain, Tree, and Graph Structures	Feb 7, 2025	Mathematical Problem-Solving, reinforcement-learning	Code Available	2
NoLiMa: Long-Context Evaluation Beyond Literal Matching	Feb 7, 2025		Code Available	2
GaussRender: Learning 3D Occupancy with Gaussian Rendering	Feb 7, 2025	3D geometry, Autonomous Vehicles	Code Available	2
QuEST: Stable Training of LLMs with 1-Bit Weights and Activations	Feb 7, 2025	GPU, Quantization	Code Available	2
MHAF-YOLO: Multi-Branch Heterogeneous Auxiliary Fusion YOLO for accurate object detection	Feb 7, 2025	object-detection, Object Detection	Code Available	2
GSM-Infinite: How Do Your LLMs Behave over Infinitely Increasing Context Length and Reasoning Complexity?	Feb 7, 2025	8k, Information Retrieval	Code Available	2
SiriuS: Self-improving Multi-agent Systems via Bootstrapped Reasoning	Feb 7, 2025		Code Available	2
Cross the Gap: Exposing the Intra-modal Misalignment in CLIP via Modality Inversion	Feb 6, 2025	image-classification, Image Classification	Code Available	2
Step Back to Leap Forward: Self-Backtracking for Boosting Reasoning of Language Models	Feb 6, 2025		Code Available	2
Training Language Models to Reason Efficiently	Feb 6, 2025	Reinforcement Learning (RL)	Code Available	2
SoK: Benchmarking Poisoning Attacks and Defenses in Federated Learning	Feb 6, 2025	Benchmarking, Data Poisoning	Code Available	2
WaferLLM: Large Language Model Inference at Wafer Scale	Feb 6, 2025	GPU, Language Modeling	Code Available	2
ScoreFlow: Mastering LLM Agent Workflows via Score-based Preference Optimization	Feb 6, 2025	Language Modeling, Language Modelling	Code Available	2
Sparse Autoencoders for Hypothesis Generation	Feb 5, 2025		Code Available	2
On-device Sora: Enabling Training-Free Diffusion-based Text-to-Video Generation for Mobile Devices	Feb 5, 2025	Denoising, Model Optimization	Code Available	2