The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

474,278 papers | 248,326 code links | 4,818 tasks

Papers

Showing 17501–17550 of 474278 papers

Title	Date	Tasks	Status	Hype
Algebraic Machine Learning: Learning as computing an algebraic decomposition of a task	Feb 27, 2025	Abstract Algebra	Code Available	1
CLIP Under the Microscope: A Fine-Grained Analysis of Multi-Object Representation	Feb 27, 2025	Image-text matching, Object	Code Available	1
Shifting the Paradigm: A Diffeomorphism Between Time Series Data Manifolds for Achieving Shift-Invariancy in Deep Learning	Feb 27, 2025	Time Series	Code Available	1
Your contrastive learning problem is secretly a distribution alignment problem	Feb 27, 2025	Contrastive Learning, Self-Supervised Learning	Code Available	1
Complex LLM Planning via Automated Heuristics Discovery	Feb 26, 2025		Code Available	1
HDEE: Heterogeneous Domain Expert Ensemble	Feb 26, 2025		Code Available	1
Know You First and Be You Better: Modeling Human-Like User Simulators via Implicit Profiles	Feb 26, 2025		Code Available	1
FSPO: Few-Shot Preference Optimization of Synthetic Preference Data in LLMs Elicits Effective Personalization to Real Users	Feb 26, 2025	In-Context Learning, Meta-Learning	Code Available	1
Exploring Graph Tasks with Pure LLMs: A Comprehensive Benchmark and Investigation	Feb 26, 2025	Benchmarking, Graph Learning	Code Available	1
REALM-Bench: A Real-World Planning Benchmark for LLMs and Multi-Agent Systems	Feb 26, 2025		Code Available	1
Sparklen: A Statistical Learning Toolkit for High-Dimensional Hawkes Processes in Python	Feb 26, 2025		Code Available	1
Distilling Reinforcement Learning Algorithms for In-Context Model-Based Planning	Feb 26, 2025	In-Context Reinforcement Learning, Reinforcement Learning (RL)	Code Available	1
Can Language Models Falsify? Evaluating Algorithmic Reasoning with Counterexample Creation	Feb 26, 2025	Ingenuity, scientific discovery	Code Available	1
SE(3)-Equivariant Ternary Complex Prediction Towards Target Protein Degradation	Feb 26, 2025	Blind Docking, Decoder	Code Available	1
Generalizable deep learning for photoplethysmography-based blood pressure estimation -- A Benchmarking Study	Feb 26, 2025	Benchmarking, Blood pressure estimation	Code Available	1
A Sliding Layer Merging Method for Efficient Depth-Wise Pruning in LLMs	Feb 26, 2025		Code Available	1
OS-Kairos: Adaptive Interaction for MLLM-Powered GUI Agents	Feb 26, 2025	Human Agent Collaboration	Code Available	1
Ev-3DOD: Pushing the Temporal Boundaries of 3D Object Detection with Event Cameras	Feb 26, 2025	3D Object Detection, Autonomous Driving	Code Available	1
AKDT: Adaptive Kernel Dilation Transformer for Effective Image Denoising	Feb 26, 2025	Color Image Denoising, Denoising	Code Available	1
TrajLLM: A Modular LLM-Enhanced Agent-Based Framework for Realistic Human Trajectory Simulation	Feb 26, 2025	Management	Code Available	1
Does 3D Gaussian Splatting Need Accurate Volumetric Rendering?	Feb 26, 2025	3DGS, NeRF	Code Available	1
CAMEx: Curvature-aware Merging of Experts	Feb 26, 2025		Code Available	1
Can Large Language Models Detect Errors in Long Chain-of-Thought Reasoning?	Feb 26, 2025	Math	Code Available	1
Poster: Long PHP webshell files detection based on sliding window attention	Feb 26, 2025		Code Available	1
Evaluating Intelligence via Trial and Error	Feb 26, 2025		Code Available	1
EndoMamba: An Efficient Foundation Model for Endoscopic Videos via Hierarchical Pre-training	Feb 26, 2025	Mamba, Representation Learning	Code Available	1
Reward Shaping to Mitigate Reward Hacking in RLHF	Feb 26, 2025		Code Available	1
CodeIF: Benchmarking the Instruction-Following Capabilities of Large Language Models for Code Generation	Feb 26, 2025	Benchmarking, Code Generation	Code Available	1
Marking Code Without Breaking It: Code Watermarking for Detecting LLM-Generated Code	Feb 26, 2025		Code Available	1
Starjob: Dataset for LLM-Driven Job Shop Scheduling	Feb 26, 2025	Combinatorial Optimization, Job Shop Scheduling	Code Available	1
UQABench: Evaluating User Embedding for Prompting LLMs in Personalized Question Answering	Feb 26, 2025	Question Answering	Code Available	1
CryptoPulse: Short-Term Cryptocurrency Forecasting with Dual-Prediction and Cross-Correlated Market Indicators	Feb 26, 2025	Decision Making	Code Available	1
VEM: Environment-Free Exploration for Training GUI Agent with Value Environment Model	Feb 26, 2025	Reinforcement Learning (RL)	Code Available	1
FACT-AUDIT: An Adaptive Multi-Agent Framework for Dynamic Fact-Checking Evaluation of Large Language Models	Feb 25, 2025	Fact Checking	Code Available	1
LLM Knows Geometry Better than Algebra: Numerical Understanding of LLM-Based Agents in A Trading Arena	Feb 25, 2025		Code Available	1
Escaping The Big Data Paradigm in Self-Supervised Representation Learning	Feb 25, 2025	Representation Learning	Code Available	1
Multi-Perspective Data Augmentation for Few-shot Object Detection	Feb 25, 2025	Data Augmentation, Few-Shot Object Detection	Code Available	1
Task Graph Maximum Likelihood Estimation for Procedural Activity Understanding in Egocentric Videos	Feb 25, 2025	Graph Learning, Mistake Detection	Code Available	1
Transfer Learning Assisted Fast Design Migration Over Technology Nodes: A Study on Transformer Matching Network	Feb 25, 2025	Transfer Learning	Code Available	1
Learning Structure-Supporting Dependencies via Keypoint Interactive Transformer for General Mammal Pose Estimation	Feb 25, 2025	Animal Pose Estimation, Pose Estimation	Code Available	1
Unveiling the Key Factors for Distilling Chain-of-Thought Reasoning	Feb 25, 2025		Code Available	1
MM-PoisonRAG: Disrupting Multimodal RAG with Local and Global Poisoning Attacks	Feb 25, 2025	Misinformation, Question Answering	Code Available	1
Inverse Materials Design by Large Language Model-Assisted Generative Framework	Feb 25, 2025	Diversity, Language Modeling	Code Available	1
Can Multimodal LLMs Perform Time Series Anomaly Detection?	Feb 25, 2025	Anomaly Detection, Irregular Time Series	Code Available	1
Problem Solved? Information Extraction Design Space for Layout-Rich Documents using LLMs	Feb 25, 2025	Benchmarking, Chunking	Code Available	1
Steering Language Model to Stable Speech Emotion Recognition via Contextual Perception and Chain of Thought	Feb 25, 2025	Emotion Recognition, Language Modeling	Code Available	1
Towards Enhanced Immersion and Agency for LLM-based Interactive Drama	Feb 25, 2025		Code Available	1
MRBTP: Efficient Multi-Robot Behavior Tree Planning and Collaboration	Feb 25, 2025	Robot Task Planning, Task Planning	Code Available	1
Training Consistency Models with Variational Noise Coupling	Feb 25, 2025	Image Generation	Code Available	1
Guiding not Forcing: Enhancing the Transferability of Jailbreaking Attacks on LLMs via Removing Superfluous Constraints	Feb 25, 2025		Code Available	1