The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

474,278 papers · 248,326 code links · 4,818 tasks

Papers

Showing 17,751–17,800 of 474,278 papers

Title	Date	Tasks	Status	Hype
Thinking Preference Optimization	Feb 17, 2025	Math	Code Available	1
M-ABSA: A Multilingual Dataset for Aspect-Based Sentiment Analysis	Feb 17, 2025	Aspect-Based Sentiment Analysis, Aspect-Based Sentiment Analysis (ABSA)	Code Available	1
Learning Dexterous Bimanual Catch Skills through Adversarial-Cooperative Heterogeneous-Agent Reinforcement Learning	Feb 17, 2025		Code Available	1
Model Generalization on Text Attribute Graphs: Principles with Large Language Models	Feb 17, 2025	Attribute, Graph Learning	Code Available	1
Can LLM Watermarks Robustly Prevent Unauthorized Knowledge Distillation?	Feb 17, 2025	Knowledge Distillation, Language Modeling	Code Available	1
Table-Critic: A Multi-Agent Framework for Collaborative Criticism and Refinement in Table Reasoning	Feb 17, 2025	Computational Efficiency	Code Available	1
Event-based Solutions for Human-centered Applications: A Comprehensive Review	Feb 17, 2025	Survey	Code Available	1
DLFR-VAE: Dynamic Latent Frame Rate VAE for Video Generation	Feb 17, 2025	Video Generation	Code Available	1
AdaSplash: Adaptive Sparse Flash Attention	Feb 17, 2025	GPU, Language Modeling	Code Available	1
Revisiting the Test-Time Scaling of o1-like Models: Do they Truly Possess Test-Time Scaling Capabilities?	Feb 17, 2025		Code Available	1
Range and Bird's Eye View Fused Cross-Modal Visual Place Recognition	Feb 17, 2025	Re-Ranking, Triplet	Code Available	1
LaM-SLidE: Latent Space Modeling of Spatial Dynamical Systems via Linked Entities	Feb 17, 2025	Decoder	Code Available	1
A Novel Unified Parametric Assumption for Nonconvex Optimization	Feb 17, 2025	Stochastic Optimization	Code Available	1
Code-Vision: Evaluating Multimodal LLMs Logic Understanding and Code Generation Capabilities	Feb 17, 2025	Code Generation, HumanEval	Code Available	1
Logic.py: Bridging the Gap between LLMs and Constraint Solvers	Feb 17, 2025	Language Modeling, Language Modelling	Code Available	1
Uncovering the Impact of Chain-of-Thought Reasoning for Direct Preference Optimization: Lessons from Text-to-SQL	Feb 17, 2025	Code Generation, Math	Code Available	1
Understanding In-Context Machine Translation for Low-Resource Languages: A Case Study on Manchu	Feb 17, 2025	Data Augmentation, In-Context Learning	Code Available	1
VRoPE: Rotary Position Embedding for Video Large Language Models	Feb 17, 2025	Position, Video Understanding	Code Available	1
SMART: Self-Aware Agent for Tool Overuse Mitigation	Feb 17, 2025	GSM8K, Large Language Model	Code Available	1
Towards Mechanistic Interpretability of Graph Transformers via Attention Graphs	Feb 17, 2025	Node Classification	Code Available	1
Deep Learning of Proteins with Local and Global Regions of Disorder	Feb 17, 2025	Protein Structure Prediction	Code Available	1
APB: Accelerating Distributed Long-Context Inference by Passing Compressed Context Blocks across GPUs	Feb 17, 2025	Language Modeling, Language Modelling	Code Available	1
Following the Autoregressive Nature of LLM Embeddings via Compression and Alignment	Feb 17, 2025	Contrastive Learning	Code Available	1
Market-Derived Financial Sentiment Analysis: Context-Aware Language Models for Crypto Forecasting	Feb 17, 2025	Financial Tweet Prediction, Language Modeling	Code Available	1
Learning to Sample Effective and Diverse Prompts for Text-to-Image Generation	Feb 17, 2025	Image Generation, Reinforcement Learning (RL)	Code Available	1
A Physics-Informed Blur Learning Framework for Imaging Systems	Feb 17, 2025	Deblurring	Code Available	1
VANPY: Voice Analysis Framework	Feb 17, 2025	Action Detection, Activity Detection	Code Available	1
Causal Inference for Qualitative Outcomes	Feb 17, 2025	Causal Inference	Code Available	1
Small Models Struggle to Learn from Strong Reasoners	Feb 17, 2025		Code Available	1
ILIAS: Instance-Level Image retrieval At Scale	Feb 17, 2025	Benchmarking, Image Retrieval	Code Available	1
Masked Latent Prediction and Classification for Self-Supervised Audio Representation Learning	Feb 17, 2025	Audio Classification, Audio Tagging	Code Available	1
VLM2-Bench: A Closer Look at How Well VLMs Implicitly Link Explicit Matching Visual Cues	Feb 17, 2025		Code Available	1
HintsOfTruth: A Multimodal Checkworthiness Detection Dataset with Real and Synthetic Claims	Feb 17, 2025	Benchmarking, Fact Checking	Code Available	1
ECG-Expert-QA: A Benchmark for Evaluating Medical Large Language Models in Heart Disease Diagnosis	Feb 16, 2025	Diagnostic, Rhythm	Code Available	1
VLMs as GeoGuessr Masters: Exceptional Performance, Hidden Biases, and Privacy Risks	Feb 16, 2025		Code Available	1
Exposing Numeracy Gaps: A Benchmark to Evaluate Fundamental Numerical Abilities in Large Language Models	Feb 16, 2025	Language Modeling, Language Modelling	Code Available	1
DA-Mamba: Domain Adaptive Hybrid Mamba-Transformer Based One-Stage Object Detection	Feb 16, 2025	Domain Adaptation, Knowledge Distillation	Code Available	1
Knowing Your Target: Target-Aware Transformer Makes Better Spatio-Temporal Video Grounding	Feb 16, 2025	Attribute, Object	Code Available	1
Reasoning-Augmented Conversation for Multi-Turn Jailbreak Attacks on Large Language Models	Feb 16, 2025	Safety Alignment	Code Available	1
Dyve: Thinking Fast and Slow for Dynamic Process Verification	Feb 16, 2025	Math	Code Available	1
Learning to Reason from Feedback at Test-Time	Feb 16, 2025		Code Available	1
DuplexMamba: Enhancing Real-time Speech Conversations with Duplex and Streaming Capabilities	Feb 16, 2025	Automatic Speech Recognition, Automatic Speech Recognition (ASR)	Code Available	1
Enhancing Cross-Tokenizer Knowledge Distillation with Contextual Dynamical Mapping	Feb 16, 2025	Code Generation, Instruction Following	Code Available	1
Knowledge Graph-Driven Retrieval-Augmented Generation: Integrating Deepseek-R1 with Weaviate for Advanced Chatbot Applications	Feb 16, 2025	Chatbot, Language Modeling	Code Available	1
ControlText: Unlocking Controllable Fonts in Multilingual Text Rendering without Font Annotations	Feb 16, 2025	Text Segmentation	Code Available	1
SafeDialBench: A Fine-Grained Safety Benchmark for Large Language Models in Multi-Turn Dialogues with Diverse Jailbreak Attacks	Feb 16, 2025	Dialogue Generation	Code Available	1
MMUnlearner: Reformulating Multimodal Machine Unlearning in the Era of Multimodal Large Language Models	Feb 16, 2025	Language Modeling, Language Modelling	Code Available	1
ReLearn: Unlearning via Learning for Large Language Models	Feb 16, 2025	Data Augmentation, Text Generation	Code Available	1
The Mirage of Model Editing: Revisiting Evaluation in the Wild	Feb 16, 2025	Model Editing, Question Answering	Code Available	1
GRIFFIN: Effective Token Alignment for Faster Speculative Decoding	Feb 16, 2025		Code Available	1