The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

474,278 papers | 248,326 code links | 4,818 tasks

Papers


Showing 20301–20350 of 474278 papers

Title	Date	Tasks	Status	Hype
Steering Your Generalists: Improving Robotic Foundation Models via Value Guidance	Oct 17, 2024	Offline RL, Re-Ranking	Code Available	1
Mitigating Hallucinations in Large Vision-Language Models via Summary-Guided Decoding	Oct 17, 2024	Hallucination, Object Hallucination	Code Available	1
Interpreting Temporal Graph Neural Networks with Koopman Theory	Oct 17, 2024	Dimensionality Reduction, Epidemiology	Code Available	1
EH-MAM: Easy-to-Hard Masked Acoustic Modeling for Self-Supervised Speech Representation Learning	Oct 17, 2024	Representation Learning, Self-Supervised Learning	Code Available	1
ControlAgent: Automating Control System Design via Novel Integration of LLM Agents and Domain Expertise	Oct 17, 2024	Specificity	Code Available	1
Unlocking the Capabilities of Masked Generative Models for Image Synthesis via Self-Guidance	Oct 17, 2024	Diversity, Image Generation	Code Available	1
LESS: Label-Efficient and Single-Stage Referring 3D Segmentation	Oct 17, 2024	cross-modal alignment, Instance Segmentation	Code Available	1
Starbucks: Improved Training for 2D Matryoshka Embeddings	Oct 17, 2024	Language Modelling, text similarity	Code Available	1
Router-Tuning: A Simple and Effective Approach for Enabling Dynamic-Depth in Transformers	Oct 17, 2024		Code Available	1
Reward-free World Models for Online Imitation Learning	Oct 17, 2024	Imitation Learning, Q-Learning	Code Available	1
MIRAGE-Bench: Automatic Multilingual Benchmark Arena for Retrieval-Augmented Generation Systems	Oct 17, 2024	Answer Generation, Language Modeling	Code Available	1
Hybrid bundle-adjusting 3D Gaussians for view consistent rendering with pose optimization	Oct 17, 2024	Novel View Synthesis	Code Available	1
TCP-Diffusion: A Multi-modal Diffusion Model for Global Tropical Cyclone Precipitation Forecasting with Change Awareness	Oct 17, 2024	Precipitation Forecasting	Code Available	1
RAMPA: Robotic Augmented Reality for Machine Programming by DemonstrAtion	Oct 17, 2024		Code Available	1
MobA: Multifaceted Memory-Enhanced Adaptive Planning for Efficient Mobile Task Automation	Oct 17, 2024	Decision Making, Language Modeling	Code Available	1
SiamSeg: Self-Training with Contrastive Learning for Unsupervised Domain Adaptation Semantic Segmentation in Remote Sensing	Oct 17, 2024	Contrastive Learning, Diversity	Code Available	1
FaithBench: A Diverse Hallucination Benchmark for Summarization by Modern LLMs	Oct 17, 2024	Diversity, Hallucination	Code Available	1
PORTAL: Scalable Tabular Foundation Models via Content-Specific Tokenization	Oct 17, 2024	Self-Supervised Learning	Code Available	1
DN-4DGS: Denoised Deformable Network with Temporal-Spatial Aggregation for Dynamic Scene Rendering	Oct 17, 2024	3DGS, NeRF	Code Available	1
FIRE: Fact-checking with Iterative Retrieval and Verification	Oct 17, 2024	Claim Verification, Fact Checking	Code Available	1
Diffusing States and Matching Scores: A New Framework for Imitation Learning	Oct 17, 2024	continuous-control, Continuous Control	Code Available	1
EP-SAM: Weakly Supervised Histopathology Segmentation via Enhanced Prompt with Segment Anything	Oct 17, 2024	Diagnostic, GPU	Code Available	1
Benchmarking Transcriptomics Foundation Models for Perturbation Analysis : one PCA still rules them all	Oct 17, 2024	All, Benchmarking	Code Available	1
Can MLLMs Understand the Deep Implication Behind Chinese Images?	Oct 17, 2024		Code Available	1
Active-Dormant Attention Heads: Mechanistically Demystifying Extreme-Token Phenomena in LLMs	Oct 17, 2024	Quantization	Code Available	1
Learning Graph Quantized Tokenizers	Oct 17, 2024	Graph Learning, Quantization	Code Available	1
UniGS: Modeling Unitary 3D Gaussians for Novel View Synthesis from Sparse-view Images	Oct 17, 2024	3D Reconstruction, Decoder	Code Available	1
A Simulation System Towards Solving Societal-Scale Manipulation	Oct 17, 2024		Code Available	1
Preference Diffusion for Recommendation	Oct 17, 2024	Recommendation Systems, Sequential Recommendation	Code Available	1
Looking Inward: Language Models Can Learn About Themselves by Introspection	Oct 17, 2024	Out-of-Distribution Generalization	Code Available	1
Interpret and Control Dense Retrieval with Sparse Latent Features	Oct 17, 2024	Retrieval	Code Available	1
Diffusion Curriculum: Synthetic-to-Real Generative Curriculum Learning via Image-Guided Diffusion	Oct 17, 2024	Data Augmentation, Image Generation	Code Available	1
Web Agents with World Models: Learning and Leveraging Environment Dynamics in Web Navigation	Oct 17, 2024	Decision Making	Code Available	1
Sliding Puzzles Gym: A Scalable Benchmark for State Representation in Visual Reinforcement Learning	Oct 17, 2024	Decision Making, Reinforcement Learning (RL)	Code Available	1
Interpreting and Analysing CLIP's Zero-Shot Image Classification via Mutual Knowledge	Oct 16, 2024	Classification, image-classification	Code Available	1
CREAM: Consistency Regularized Self-Rewarding Language Models	Oct 16, 2024	Language Modeling, Language Modelling	Code Available	1
Rethinking Token Reduction for State Space Models	Oct 16, 2024	Mamba, State Space Models	Code Available	1
FragNet: A Graph Neural Network for Molecular Property Prediction with Four Levels of Interpretability	Oct 16, 2024	Drug Discovery, Graph Neural Network	Code Available	1
HumanEval-V: Evaluating Visual Understanding and Reasoning Abilities of Large Multimodal Models Through Coding Tasks	Oct 16, 2024	Code Generation, HumanEval	Code Available	1
VividMed: Vision Language Model with Versatile Visual Grounding for Medicine	Oct 16, 2024	Language Modeling, Language Modelling	Code Available	1
LoRA Soups: Merging LoRAs for Practical Skill Composition Tasks	Oct 16, 2024	Math, parameter-efficient fine-tuning	Code Available	1
HerO at AVeriTeC: The Herd of Open Large Language Models for Verifying Real-World Claims	Oct 16, 2024	Fact Checking, Language Modeling	Code Available	1
Counterfactual Generative Modeling with Variational Causal Inference	Oct 16, 2024	Causal Inference, counterfactual	Code Available	1
Meta-Unlearning on Diffusion Models: Preventing Relearning Unlearned Concepts	Oct 16, 2024		Code Available	1
In-vivo high-resolution χ-separation at 7T	Oct 16, 2024		Code Available	1
Open Materials 2024 (OMat24) Inorganic Materials Dataset and Models	Oct 16, 2024	Denoising	Code Available	1
Expand and Compress: Exploring Tuning Principles for Continual Spatio-Temporal Graph Forecasting	Oct 16, 2024	Graph Neural Network, Spatio-Temporal Forecasting	Code Available	1
Facilitating Multi-turn Function Calling for LLMs via Compositional Instruction Tuning	Oct 16, 2024	8k	Code Available	1
Dual Prototype Evolving for Test-Time Generalization of Vision-Language Models	Oct 16, 2024	Computational Efficiency, Test-time Adaptation	Code Available	1
Revealing the Barriers of Language Agents in Planning	Oct 16, 2024		Code Available	1