The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

659,984 papers, 248,105 code links, 4,818 tasks

Papers


Showing 3,401–3,450 of 659,984 papers

Title	Date	Tasks	Status	Hype
WebCanvas: Benchmarking Web Agents in Online Environments	Jun 18, 2024	AI Agent, Benchmarking	Code Available	3
Refusal in Language Models Is Mediated by a Single Direction	Jun 17, 2024	Instruction Following	Code Available	3
HyperSIGMA: Hyperspectral Intelligence Comprehension Foundation Model	Jun 17, 2024	Computational Efficiency, Earth Observation	Code Available	3
Unveiling Encoder-Free Vision-Language Models	Jun 17, 2024	Decoder, Inductive Bias	Code Available	3
GigaSpeech 2: An Evolving, Large-Scale and Multi-domain ASR Corpus for Low-Resource Languages with Automated Crawling, Transcription and Refinement	Jun 17, 2024	speech-recognition, Speech Recognition	Code Available	3
DocGenome: An Open Large-scale Scientific Document Benchmark for Training and Testing Multi-modal Large Language Models	Jun 17, 2024	Document Classification, Visual Grounding	Code Available	3
AvaTaR: Optimizing LLM Agents for Tool Usage via Contrastive Reasoning	Jun 17, 2024	Language Modeling, Language Modelling	Code Available	3
An Imitative Reinforcement Learning Framework for Autonomous Dogfight	Jun 17, 2024	Imitation Learning, reinforcement-learning	Code Available	3
GUI-World: A Video Benchmark and Dataset for Multimodal GUI-oriented Understanding	Jun 16, 2024		Code Available	3
Quest: Query-Aware Sparsity for Efficient Long-Context LLM Inference	Jun 16, 2024		Code Available	3
Step-level Value Preference Optimization for Mathematical Reasoning	Jun 16, 2024	Learning-To-Rank, Math	Code Available	3
AutoHallusion: Automatic Generation of Hallucination Benchmarks for Vision-Language Models	Jun 16, 2024	Hallucination, Hallucination Evaluation	Code Available	3
CBGBench: Fill in the Blank of Protein-Molecule Complex Binding Graph	Jun 16, 2024	Drug Design, Fairness	Code Available	3
AgileCoder: Dynamic Collaborative Agents for Software Development based on Agile Methodology	Jun 16, 2024	Code Generation	Code Available	3
IMDL-BenCo: A Comprehensive Benchmark and Codebase for Image Manipulation Detection & Localization	Jun 15, 2024	GPU, Image Manipulation	Code Available	3
TGB 2.0: A Benchmark for Learning on Temporal Knowledge Graphs and Heterogeneous Graphs	Jun 14, 2024	Benchmarking, Knowledge Graphs	Code Available	3
DigiRL: Training In-The-Wild Device-Control Agents with Autonomous Reinforcement Learning	Jun 14, 2024	Offline RL	Code Available	3
CarLLaVA: Vision language models for camera-only closed-loop driving	Jun 14, 2024	Autonomous Driving, Bench2Drive	Code Available	3
Whisper-Flamingo: Integrating Visual Features into Whisper for Audio-Visual Speech Recognition and Translation	Jun 14, 2024	Audio-Visual Speech Recognition, Automatic Speech Recognition (ASR)	Code Available	3
VideoGPT+: Integrating Image and Video Encoders for Enhanced Video Understanding	Jun 13, 2024	Dense Video Captioning, MVBench	Code Available	3
Dispelling the Mirage of Progress in Offline MARL through Standardised Baselines and Evaluation	Jun 13, 2024	Multi-agent Reinforcement Learning	Code Available	3
DrivAerNet++: A Large-Scale Multimodal Car Dataset with Computational Fluid Dynamics Simulations and Deep Learning Benchmarks	Jun 13, 2024	Benchmarking	Code Available	3
Visual Sketchpad: Sketching as a Visual Chain of Thought for Multimodal Language Models	Jun 13, 2024	Math, object-detection	Code Available	3
OmniTokenizer: A Joint Image-Video Tokenizer for Visual Generation	Jun 13, 2024	Video Generation, Video Prediction	Code Available	3
MiLoRA: Harnessing Minor Singular Components for Parameter-Efficient LLM Finetuning	Jun 13, 2024	Instruction Following, Math	Code Available	3
RobustSAM: Segment Anything Robustly on Degraded Images	Jun 13, 2024	Deblurring, Image Dehazing	Code Available	3
Is Value Learning Really the Main Bottleneck in Offline RL?	Jun 13, 2024	Imitation Learning, Offline RL	Code Available	3
AdaRevD: Adaptive Patch Exiting Reversible Decoder Pushes the Limit of Image Deblurring	Jun 13, 2024	Deblurring, Decoder	Code Available	3
Multimodal Table Understanding	Jun 12, 2024	Language Modeling, Language Modelling	Code Available	3
OmniCorpus: A Unified Multimodal Corpus of 10 Billion-Level Images Interleaved with Text	Jun 12, 2024	In-Context Learning	Code Available	3
RVT-2: Learning Precise Manipulation from Few Demonstrations	Jun 12, 2024	Robot Manipulation, Robot Manipulation Generalization	Code Available	3
Flash-VStream: Memory-Based Real-Time Understanding for Long Video Streams	Jun 12, 2024	cross-modal alignment, Language Modelling	Code Available	3
Enhancing End-to-End Autonomous Driving with Latent World Model	Jun 12, 2024	Autonomous Driving, NavSim	Code Available	3
Language Model Council: Democratically Benchmarking Foundation Models on Highly Subjective Tasks	Jun 12, 2024	Benchmarking, Chatbot	Code Available	3
Image and Video Tokenization with Binary Spherical Quantization	Jun 11, 2024	Decoder, Image Generation	Code Available	3
Evolving from Single-modal to Multi-modal Facial Deepfake Detection: A Survey	Jun 11, 2024	DeepFake Detection, Face Swapping	Code Available	3
An Image is Worth 32 Tokens for Reconstruction and Generation	Jun 11, 2024	Image Generation, Image Reconstruction	Code Available	3
MS-Diffusion: Multi-subject Zero-shot Image Personalization with Layout Guidance	Jun 11, 2024	Image Generation, Text to Image Generation	Code Available	3
Hydra-MDP: End-to-end Multimodal Planning with Multi-target Hydra-Distillation	Jun 11, 2024	Decoder, Knowledge Distillation	Code Available	3
EmoBox: Multilingual Multi-corpus Speech Emotion Recognition Toolkit and Benchmark	Jun 11, 2024	Cross-corpus, Emotion Recognition	Code Available	3
Lighting Every Darkness with 3DGS: Fast Training and Real-Time Rendering for HDR View Synthesis	Jun 10, 2024	2k, 3DGS	Code Available	3
GaussianCity: Generative Gaussian Splatting for Unbounded 3D City Generation	Jun 10, 2024	3D Generation, NeRF	Code Available	3
DISCOVERYWORLD: A Virtual Environment for Developing and Evaluating Automated Scientific Discovery Agents	Jun 10, 2024	Benchmarking, scientific discovery	Code Available	3
GraphStorm: all-in-one graph machine learning framework for industry applications	Jun 10, 2024	All, graph construction	Code Available	3
Merlin: A Vision Language Foundation Model for 3D Computed Tomography	Jun 10, 2024	3D Semantic Segmentation, Computed Tomography (CT)	Code Available	3
AutoSurvey: Large Language Models Can Automatically Write Surveys	Jun 10, 2024	Retrieval, Survey	Code Available	3
Separate and Reconstruct: Asymmetric Encoder-Decoder for Speech Separation	Jun 10, 2024	Chunking, Speech Separation	Code Available	3
EARS: An Anechoic Fullband Speech Dataset Benchmarked for Speech Enhancement and Dereverberation	Jun 10, 2024	Speech Enhancement	Code Available	3
Husky: A Unified, Open-Source Language Agent for Multi-Step Reasoning	Jun 10, 2024	Multi-hop Question Answering, Question Answering	Code Available	3
A Review of Prominent Paradigms for LLM-Based Agents: Tool Use (Including RAG), Planning, and Feedback Learning	Jun 9, 2024	Language Modeling, Language Modelling	Code Available	3