The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

474,278 papers · 248,326 code links · 4,818 tasks

Papers

Showing 6,526–6,550 of 474,278 papers

Title	Date	Tasks	Status	Hype
Agent-SafetyBench: Evaluating the Safety of LLM Agents	Dec 19, 2024		Code Available	2
LeviTor: 3D Trajectory Oriented Image-to-Video Synthesis	Dec 19, 2024	Object	Code Available	2
Preventing Local Pitfalls in Vector Quantization via Optimal Transport	Dec 19, 2024	Image Reconstruction, Quantization	Code Available	2
Learning charges and long-range interactions from energies and forces	Dec 19, 2024		Code Available	2
FlowAR: Scale-wise Autoregressive Image Generation Meets Flow Matching	Dec 19, 2024	Image Generation, Prediction	Code Available	2
Mesoscopic Insights: Orchestrating Multi-scale & Hybrid Architecture for Image Manipulation Localization	Dec 18, 2024	Image Manipulation	Code Available	2
InstructSeg: Unifying Instructed Visual Segmentation with Multi-modal Large Language Models	Dec 18, 2024	Reasoning Segmentation, Segmentation	Code Available	2
RelationField: Relate Anything in Radiance Fields	Dec 18, 2024	3d scene graph generation, Graph Generation	Code Available	2
Nullu: Mitigating Object Hallucinations in Large Vision-Language Models via HalluSpace Projection	Dec 18, 2024		Code Available	2
Alignment faking in large language models	Dec 18, 2024	Large Language Model	Code Available	2
ChinaTravel: A Real-World Benchmark for Language Agents in Chinese Travel Planning	Dec 18, 2024		Code Available	2
AnySat: One Earth Observation Model for Many Resolutions, Scales, and Modalities	Dec 18, 2024	Change Detection, Diversity	Code Available	2
Large Language Model Enhanced Recommender Systems: A Survey	Dec 18, 2024	Language Modeling, Language Modelling	Code Available	2
Joint Perception and Prediction for Autonomous Driving: A Survey	Dec 18, 2024	Autonomous Driving, motion prediction	Code Available	2
Open Universal Arabic ASR Leaderboard	Dec 18, 2024	Benchmarking	Code Available	2
A Survey on LLM Inference-Time Self-Improvement	Dec 18, 2024	Survey	Code Available	2
Learnable Prompting SAM-induced Knowledge Distillation for Semi-supervised Medical Image Segmentation	Dec 18, 2024	Image Segmentation, Knowledge Distillation	Code Available	2
Modality-Independent Graph Neural Networks with Global Transformers for Multimodal Recommendation	Dec 18, 2024	Graph Learning, Multi-modal Recommendation	Code Available	2
Efficient Diffusion Transformer Policies with Mixture of Expert Denoisers for Multitask Learning	Dec 17, 2024	Denoising	Code Available	2
ArchesWeather & ArchesWeatherGen: a deterministic and generative model for efficient ML weather forecasting	Dec 17, 2024	GPU, Weather Forecasting	Code Available	2
AIR-Bench: Automated Heterogeneous Information Retrieval Benchmark	Dec 17, 2024	Information Retrieval, Retrieval	Code Available	2
OmniEval: An Omnidirectional and Automatic RAG Evaluation Benchmark in Financial Domain	Dec 17, 2024	RAG, Retrieval	Code Available	2
Streaming Keyword Spotting Boosted by Cross-layer Discrimination Consistency	Dec 17, 2024	Automatic Speech Recognition, Automatic Speech Recognition (ASR)	Code Available	2
Guiding Generative Protein Language Models with Reinforcement Learning	Dec 17, 2024	Diversity, reinforcement-learning	Code Available	2
SafeAgentBench: A Benchmark for Safe Task Planning of Embodied LLM Agents	Dec 17, 2024	Task Planning	Code Available	2