The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

474,278 papers, 248,326 code links, 4,818 tasks

Papers

Showing 17,701–17,750 of 474,278 papers

Title	Date	Tasks	Status	Hype
Improving Data Efficiency for LLM Reinforcement Fine-tuning Through Difficulty-targeted Online Data Selection and Rollout Replay	Jun 5, 2025	Reinforcement Learning (RL)	Code Available	1
FEAT: Full-Dimensional Efficient Attention Transformer for Medical Video Generation	Jun 5, 2025	Denoising, Video Generation	Code Available	1
Cracking the Code: Enhancing Implicit Hate Speech Detection through Coding Classification	Jun 5, 2025	Hate Speech Detection	Unverified	0
Through-the-Wall Radar Human Activity Recognition WITHOUT Using Neural Networks	Jun 5, 2025	Activity Recognition, Human Activity Recognition	Code Available	0
StatsMerging: Statistics-Guided Model Merging via Task-Specific Teacher Distillation	Jun 5, 2025	Knowledge Distillation	Code Available	0
Rethinking Contrastive Learning in Session-based Recommendation	Jun 5, 2025	Contrastive Learning, Self-Supervised Learning	Code Available	0
Selecting Demonstrations for Many-Shot In-Context Learning via Gradient Matching	Jun 5, 2025	In-Context Learning	Code Available	0
MockConf: A Student Interpretation Dataset: Analysis, Word- and Span-level Alignment and Baselines	Jun 5, 2025		Code Available	0
Qwen3 Embedding: Advancing Text Embedding and Reranking Through Foundation Models	Jun 5, 2025	Reranking, Retrieval	Code Available	5
Flattery, Fluff, and Fog: Diagnosing and Mitigating Idiosyncratic Biases in Preference Models	Jun 5, 2025	counterfactual, Data Augmentation	Code Available	0
Dissecting Long Reasoning Models: An Empirical Study	Jun 5, 2025	Reinforcement Learning (RL)	Code Available	0
Composing Agents to Minimize Worst-case Risk	Jun 5, 2025	Fairness	Code Available	0
Tuning the Right Foundation Models is What you Need for Partial Label Learning	Jun 5, 2025	Model Selection, Partial Label Learning	Code Available	1
ProJo4D: Progressive Joint Optimization for Sparse-View Inverse Physics Estimation	Jun 5, 2025	3D Reconstruction, NeRF	Unverified	0
Single GPU Task Adaptation of Pathology Foundation Models for Whole Slide Image Analysis	Jun 5, 2025	GPU, Multi-Label Classification	Unverified	0
Do It Yourself: Learning Semantic Correspondence from Pseudo-Labels	Jun 5, 2025	Semantic correspondence	Unverified	0
Beyond Accuracy: Dissecting Mathematical Reasoning for LLMs Under Reinforcement Learning	Jun 5, 2025	Mathematical Reasoning, Problem Decomposition	Unverified	0
A MISMATCHED Benchmark for Scientific Natural Language Inference	Jun 5, 2025	Articles, Natural Language Inference	Code Available	0
Grounding Beyond Detection: Enhancing Contextual Understanding in Embodied 3D Grounding	Jun 5, 2025		Code Available	0
VideoMolmo: Spatio-Temporal Grounding Meets Pointing	Jun 5, 2025	Autonomous Driving, Autonomous Navigation	Code Available	2
Flex-TravelPlanner: A Benchmark for Flexible Planning with Language Agents	Jun 5, 2025		Code Available	0
Identifying Reliable Evaluation Metrics for Scientific Text Revision	Jun 5, 2025	Instruction Following	Code Available	0
Joint Evaluation of Answer and Reasoning Consistency for Hallucination Detection in Large Reasoning Models	Jun 5, 2025	Diagnostic, Hallucination	Code Available	1
HALoS: Hierarchical Asynchronous Local SGD over Slow Networks for Geo-Distributed Large Language Model Training	Jun 5, 2025	Language Modeling, Language Modelling	Code Available	0
Controlling Summarization Length Through EOS Token Weighting	Jun 5, 2025	Decoder, Text Generation	Unverified	0
TALL -- A Trainable Architecture for Enhancing LLM Performance in Low-Resource Languages	Jun 5, 2025	Computational Efficiency, Translation	Unverified	0
Quantifying Cross-Modality Memorization in Vision-Language Models	Jun 5, 2025	Machine Unlearning, Memorization	Unverified	0
DSG-World: Learning a 3D Gaussian World Model from Dual State Videos	Jun 5, 2025	3D Reconstruction	Unverified	0
Stable Vision Concept Transformers for Medical Diagnosis	Jun 5, 2025	Medical Diagnosis	Unverified	0
MARBLE: Material Recomposition and Blending in CLIP-Space	Jun 5, 2025	Attribute, Denoising	Unverified	0
ProRefine: Inference-time Prompt Refinement with Textual Feedback	Jun 5, 2025	Mathematical Reasoning	Unverified	0
UNO: Unlearning via Orthogonalization in Generative models	Jun 5, 2025		Code Available	0
Micro-Act: Mitigate Knowledge Conflict in Question Answering via Actionable Self-Reasoning	Jun 5, 2025	Question Answering, RAG	Code Available	0
Debatable Intelligence: Benchmarking LLM Judges via Debate Speech Evaluation	Jun 5, 2025	Benchmarking	Code Available	0
ViCocktail: Automated Multi-Modal Data Collection for Vietnamese Audio-Visual Speech Recognition	Jun 5, 2025	Audio-Visual Speech Recognition, speech-recognition	Unverified	0
Prompting LLMs: Length Control for Isometric Machine Translation	Jun 5, 2025	de-en, Machine Translation	Unverified	0
OpenAg: Democratizing Agricultural Intelligence	Jun 5, 2025	Knowledge Graphs, Transfer Learning	Unverified	0
Search Arena: Analyzing Search-Augmented LLMs	Jun 5, 2025	Fact Checking	Code Available	2
BSBench: will your LLM find the largest prime number?	Jun 5, 2025	Benchmarking	Code Available	0
Astraea: A GPU-Oriented Token-wise Acceleration Framework for Video Diffusion Transformers	Jun 5, 2025	GPU, Text-to-Video Generation	Unverified	0
RaySt3R: Predicting Novel Depth Maps for Zero-Shot Object Completion	Jun 5, 2025	Novel View Synthesis, Object	Unverified	0
SAM-aware Test-time Adaptation for Universal Medical Image Segmentation	Jun 5, 2025	Image Segmentation, Medical Image Segmentation	Code Available	0
A Reasoning-Based Approach to Cryptic Crossword Clue Solving	Jun 5, 2025		Code Available	0
FedAPM: Federated Learning via ADMM with Partial Model Personalization	Jun 5, 2025	Federated Learning	Code Available	0
Predicting ICU In-Hospital Mortality Using Adaptive Transformer Layer Fusion	Jun 5, 2025		Code Available	0
Evaluating Sparse Autoencoders: From Shallow Design to Matching Pursuit	Jun 5, 2025	Dictionary Learning	Unverified	0
Contrastive Flow Matching	Jun 5, 2025		Code Available	2
Look Before You Leap: A GUI-Critic-R1 Model for Pre-Operative Error Diagnosis in GUI Automation	Jun 5, 2025	Decision Making, Multimodal Reasoning	Unverified	0
LSM-2: Learning from Incomplete Wearable Sensor Data	Jun 5, 2025	Diagnostic, Imputation	Unverified	0
Simulating LLM-to-LLM Tutoring for Multilingual Math Feedback	Jun 5, 2025	Math	Unverified	0