The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

659,983 papers · 248,104 code links · 4,818 tasks

Papers

Showing 3451–3500 of 659,983 papers

Title	Date	Tasks	Status	Hype
A Review of Prominent Paradigms for LLM-Based Agents: Tool Use (Including RAG), Planning, and Feedback Learning	Jun 9, 2024	Language Modeling, Language Modelling	Code Available	3
A Survey on Text-guided 3D Visual Grounding: Elements, Recent Advances, and Future Directions	Jun 9, 2024	3D visual grounding, Survey	Code Available	3
GameBench: Evaluating Strategic Reasoning Abilities of LLM Agents	Jun 7, 2024	Natural Language Understanding	Code Available	3
Probabilistic Weather Forecasting with Hierarchical Graph Neural Networks	Jun 7, 2024	graph construction, Weather Forecasting	Code Available	3
CRAG -- Comprehensive RAG Benchmark	Jun 7, 2024	Hallucination, Language Modelling	Code Available	3
FedLLM-Bench: Realistic Benchmarks for Federated Learning of Large Language Models	Jun 7, 2024	Federated Learning	Code Available	3
Multi-Head RAG: Solving Multi-Aspect Problems with LLMs	Jun 7, 2024	Benchmarking, Decoder	Code Available	3
WildBench: Benchmarking LLMs with Challenging Tasks from Real Users in the Wild	Jun 7, 2024	Benchmarking, Chatbot	Code Available	3
VISTA3D: Versatile Imaging SegmenTation and Annotation model for 3D Computed Tomography	Jun 7, 2024	Computed Tomography (CT), Image Segmentation	Code Available	3
Are We Done with MMLU?	Jun 6, 2024	MMLU, Virology	Code Available	3
MLVU: Benchmarking Multi-task Long Video Understanding	Jun 6, 2024	Benchmarking, Video Understanding	Code Available	3
Physics3D: Learning Physical Properties of 3D Gaussians via Video Diffusion	Jun 6, 2024	3D Generation	Code Available	3
VideoTetris: Towards Compositional Text-to-Video Generation	Jun 6, 2024	Denoising, Text-to-Video Generation	Code Available	3
Vision-LSTM: xLSTM as Generic Vision Backbone	Jun 6, 2024		Code Available	3
Flash3D: Feed-Forward Generalisable 3D Scene Reconstruction from a Single Image	Jun 6, 2024	3D Scene Reconstruction, Depth Estimation	Code Available	3
Improving Alignment and Robustness with Circuit Breakers	Jun 6, 2024	Adversarial Robustness	Code Available	3
Aesthetic Post-Training Diffusion Models from Generic Preferences with Step-by-step Preference Optimization	Jun 6, 2024	Denoising, Image Generation	Code Available	3
FusionBench: A Comprehensive Benchmark of Deep Model Fusion	Jun 5, 2024	image-classification, Image Classification	Code Available	3
Docs2KG: Unified Knowledge Graph Construction from Heterogeneous Documents Assisted by Large Language Models	Jun 5, 2024	Data Integration, graph construction	Code Available	3
Computation-Efficient Era: A Comprehensive Survey of State Space Models in Medical Image Analysis	Jun 5, 2024	Mamba, Medical Image Analysis	Code Available	3
Open-YOLO 3D: Towards Fast and Accurate Open-Vocabulary 3D Instance Segmentation	Jun 4, 2024	2D Object Detection, 3D Instance Segmentation	Code Available	3
FedMKT: Federated Mutual Knowledge Transfer for Large and Small Language Models	Jun 4, 2024	Text Generation, Transfer Learning	Code Available	3
Description Boosting for Zero-Shot Entity and Relation Classification	Jun 4, 2024	Relation, Relation Classification	Code Available	3
Improved Modelling of Federated Datasets using Mixtures-of-Dirichlet-Multinomials	Jun 4, 2024	Federated Learning	Code Available	3
DreamPhysics: Learning Physics-Based 3D Dynamics with Video Diffusion Priors	Jun 3, 2024		Code Available	3
ControlSpeech: Towards Simultaneous and Independent Zero-shot Speaker Cloning and Zero-shot Language Style Control	Jun 3, 2024	Speech Synthesis, text-to-speech	Code Available	3
MMLU-Pro: A More Robust and Challenging Multi-Task Language Understanding Benchmark	Jun 3, 2024	MMLU, Multi-task Language Understanding	Code Available	3
Proxy Denoising for Source-Free Domain Adaptation	Jun 3, 2024	Denoising, Domain Adaptation	Code Available	3
AutoStudio: Crafting Consistent Subjects in Multi-turn Interactive Image Generation	Jun 3, 2024	Image Generation	Code Available	3
Deciphering Oracle Bone Language with Diffusion Models	Jun 2, 2024	Decipherment, Image Generation	Code Available	3
Reservoir History Matching of the Norne field with generative exotic priors and a coupled Mixture of Experts -- Physics Informed Neural Operator Forward Model	Jun 2, 2024	Denoising, Mixture-of-Experts	Code Available	3
Collaborative Novel Object Discovery and Box-Guided Cross-Modal Alignment for Open-Vocabulary 3D Object Detection	Jun 2, 2024	3D Object Detection, cross-modal alignment	Code Available	3
Automatic Instruction Evolving for Large Language Models	Jun 2, 2024	GSM8K, HumanEval	Code Available	3
HOPE: A Reinforcement Learning-based Hybrid Policy Path Planner for Diverse Parking Scenarios	May 31, 2024	Autonomous Driving, reinforcement-learning	Code Available	3
Neural Network Verification with Branch-and-Bound for General Nonlinearities	May 31, 2024		Code Available	3
MeshXL: Neural Coordinate Field for Generative 3D Foundation Models	May 31, 2024	Language Modeling, Language Modelling	Code Available	3
Scalable Bayesian Learning with posteriors	May 31, 2024		Code Available	3
GNN-RAG: Graph Neural Retrieval for Large Language Model Reasoning	May 30, 2024	Graph Question Answering, Knowledge Graphs	Code Available	3
MotionLLM: Understanding Human Behaviors from Human Motions and Videos	May 30, 2024		Code Available	3
CV-VAE: A Compatible Video VAE for Latent Generative Video Models	May 30, 2024	Quantization	Code Available	3
MotionFollower: Editing Video Motion via Lightweight Score-Guided Diffusion	May 30, 2024	Denoising, GPU	Code Available	3
Sequence-Augmented SE(3)-Flow Matching For Conditional Protein Backbone Generation	May 30, 2024	Diversity, Drug Design	Code Available	3
Descriptive Image Quality Assessment in the Wild	May 29, 2024	Descriptive, Image Quality Assessment	Code Available	3
HLOB -- Information Persistence and Structure in Limit Order Books	May 29, 2024	Deep Learning	Code Available	3
Understanding and Minimising Outlier Features in Neural Network Training	May 29, 2024		Code Available	3
Blind Image Restoration via Fast Diffusion Inversion	May 29, 2024	Deblurring, Image Restoration	Code Available	3
T2V-Turbo: Breaking the Quality Bottleneck of Video Consistency Model with Mixed Reward Feedback	May 29, 2024	Video Generation	Code Available	3
Artificial Intelligence Index Report 2024	May 29, 2024		Code Available	3
Poseidon: Efficient Foundation Models for PDEs	May 29, 2024	Operator learning	Code Available	3
ORLM: A Customizable Framework in Training Large Models for Automated Optimization Modeling	May 28, 2024	Prompt Engineering	Code Available	3