The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

661,570 papers, 248,326 code links, 4,818 tasks

Papers

Showing 4151–4200 of 661,570 papers

Title	Date	Tasks	Status	Hype
Putting the Object Back into Video Object Segmentation	Oct 19, 2023	Object, Segmentation	Code Available	3
AgentTuning: Enabling Generalized Agent Abilities for LLMs	Oct 19, 2023	Memorization	Code Available	3
Take the aTrain. Introducing an Interface for the Accessible Transcription of Interviews	Oct 18, 2023	CPU, GPU	Code Available	3
Llemma: An Open Language Model For Mathematics	Oct 16, 2023	Arithmetic Reasoning, Automated Theorem Proving	Code Available	3
MotionDirector: Motion Customization of Text-to-Video Diffusion Models	Oct 12, 2023		Code Available	3
Lag-Llama: Towards Foundation Models for Probabilistic Time Series Forecasting	Oct 12, 2023	Decoder, Probabilistic Time Series Forecasting	Code Available	3
Waymax: An Accelerated, Data-Driven Simulator for Large-Scale Autonomous Driving Research	Oct 12, 2023	Autonomous Driving	Code Available	3
NoMaD: Goal Masked Diffusion Policies for Navigation and Exploration	Oct 11, 2023		Code Available	3
CRITERIA: a New Benchmarking Paradigm for Evaluating Trajectory Prediction Models for Autonomous Driving	Oct 11, 2023	Autonomous Driving, Benchmarking	Code Available	3
MetaAgents: Simulating Interactions of Human Behaviors for LLM-based Task-oriented Coordination via Collaborative Generative Agents	Oct 10, 2023		Code Available	3
Text Embeddings Reveal (Almost) As Much As Text	Oct 10, 2023		Code Available	3
Exploring Progress in Multivariate Time Series Forecasting: Comprehensive Benchmarking and Heterogeneity Analysis	Oct 9, 2023	Benchmarking, Multivariate Time Series Forecasting	Code Available	3
How Abilities in Large Language Models are Affected by Supervised Fine-tuning Data Composition	Oct 9, 2023	Code Generation, Instruction Following	Code Available	3
Evaluating Hallucinations in Chinese Large Language Models	Oct 5, 2023	Hallucination, Question Answering	Code Available	3
T^3Bench: Benchmarking Current Progress in Text-to-3D Generation	Oct 4, 2023	3D Generation, Benchmarking	Code Available	3
MagicDrive: Street View Generation with Diverse 3D Geometry Control	Oct 4, 2023	3D geometry, 3D Object Detection	Code Available	3
Conceptual Framework for Autonomous Cognitive Entities	Oct 3, 2023		Code Available	3
OceanGPT: A Large Language Model for Ocean Science Tasks	Oct 3, 2023	Language Modeling, Language Modelling	Code Available	3
UltraFeedback: Boosting Language Models with Scaled AI Feedback	Oct 2, 2023	Language Modelling	Code Available	3
AutoAgents: A Framework for Automatic Agent Generation	Sep 29, 2023		Code Available	3
ToRA: A Tool-Integrated Reasoning Agent for Mathematical Problem Solving	Sep 29, 2023	Arithmetic Reasoning, Computational Efficiency	Code Available	3
Data Filtering Networks	Sep 29, 2023	Language Modeling, Language Modelling	Code Available	3
SMPLer-X: Scaling Up Expressive Human Pose and Shape Estimation	Sep 29, 2023	3D Human Pose Estimation, 3D Human Reconstruction	Code Available	3
Show-1: Marrying Pixel and Latent Diffusion Models for Text-to-Video Generation	Sep 27, 2023	GPU, Text-to-Video Generation	Code Available	3
Deformable 3D Gaussians for High-Fidelity Monocular Dynamic Scene Reconstruction	Sep 22, 2023	Dynamic Reconstruction, Neural Rendering	Code Available	3
Leveraging In-the-Wild Data for Effective Self-Supervised Pretraining in Speaker Recognition	Sep 21, 2023	Speaker Recognition	Code Available	3
Impact of architecture on robustness and interpretability of multispectral deep neural networks	Sep 21, 2023	Deep Learning	Code Available	3
BTLM-3B-8K: 7B Parameter Performance in a 3B Parameter Model	Sep 20, 2023	8k, Language Modeling	Code Available	3
FreeU: Free Lunch in Diffusion U-Net	Sep 20, 2023	Decoder, Denoising	Code Available	3
SlimPajama-DC: Understanding Data Combinations for LLM Training	Sep 19, 2023		Code Available	3
Amplifying Pathological Detection in EEG Signaling Pathways through Cross-Dataset Transfer Learning	Sep 19, 2023	EEG, NMT	Code Available	3
Multimodal Foundation Models: From Specialists to General-Purpose Assistants	Sep 18, 2023	Image Generation, Survey	Code Available	3
Sparse Autoencoders Find Highly Interpretable Features in Language Models	Sep 15, 2023	counterfactual, Language Modelling	Code Available	3
AudioSR: Versatile Audio Super-resolution at Scale	Sep 13, 2023	Audio Super-Resolution, Super-Resolution	Code Available	3
InstaFlow: One Step is Enough for High-Quality Diffusion-Based Text-to-Image Generation	Sep 12, 2023	GPU, Image Generation	Code Available	3
HAT: Hybrid Attention Transformer for Image Restoration	Sep 11, 2023	Denoising, Image Compression	Code Available	3
Anatomy-informed Data Augmentation for Enhanced Prostate Cancer Detection	Sep 7, 2023	Anatomy, Data Augmentation	Code Available	3
Tracking Anything with Decoupled Video Segmentation	Sep 7, 2023	Open-Vocabulary Video Segmentation, Open-World Video Segmentation	Code Available	3
Matcha-TTS: A fast TTS architecture with conditional flow matching	Sep 6, 2023	Acoustic Modelling, Decoder	Code Available	3
nanoT5: A PyTorch Framework for Pre-training and Fine-tuning T5-style Models with Limited Resources	Sep 5, 2023	Decoder, GPU	Code Available	3
Generative Data Augmentation using LLMs improves Distributional Robustness in Question Answering	Sep 3, 2023	Data Augmentation, Domain Adaptation	Code Available	3
Siren's Song in the AI Ocean: A Survey on Hallucination in Large Language Models	Sep 3, 2023	Hallucination, World Knowledge	Code Available	3
SAM-Med2D	Aug 30, 2023	Decoder, Image Segmentation	Code Available	3
Emergence of Segmentation with Minimalistic White-Box Transformers	Aug 30, 2023	Segmentation, Self-Supervised Learning	Code Available	3
AnomalyGPT: Detecting Industrial Anomalies Using Large Vision-Language Models	Aug 29, 2023	Anomaly Detection, In-Context Learning	Code Available	3
LongBench: A Bilingual, Multitask Benchmark for Long Context Understanding	Aug 28, 2023	16k, Code Completion	Code Available	3
VideoCutLER: Surprisingly Simple Unsupervised Video Instance Segmentation	Aug 28, 2023	Instance Segmentation, Optical Flow Estimation	Code Available	3
Matbench Discovery -- A framework to evaluate machine learning crystal stability predictions	Aug 28, 2023	Benchmarking, Formation Energy	Code Available	3
Pixel-Aware Stable Diffusion for Realistic Image Super-resolution and Personalized Stylization	Aug 28, 2023	Image Enhancement, Image Generation	Code Available	3
How to Evaluate the Generalization of Detection? A Benchmark for Comprehensive Open-Vocabulary Detection	Aug 25, 2023	Object Detection	Code Available	3