The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

659,983 papers, 248,104 code links, 4,818 tasks

Papers

Showing 901–925 of 659,983 papers

Title	Date	Tasks	Status	Hype
Fake News Detection: It's All in the Data!	Jul 2, 2024	All, Diversity	Code Available	5
LiveBench: A Challenging, Contamination-Limited LLM Benchmark	Jun 27, 2024	Articles, Instruction Following	Code Available	5
OMG-LLaVA: Bridging Image-level, Object-level, Pixel-level Reasoning and Understanding	Jun 27, 2024	Decoder, Segmentation	Code Available	5
Mamba or RWKV: Exploring High-Quality and High-Efficiency Segment Anything Model	Jun 27, 2024	Mamba, Segmentation	Code Available	5
ChronoMagic-Bench: A Benchmark for Metamorphic Evaluation of Text-to-Time-lapse Video Generation	Jun 26, 2024	Text-to-Video Generation, Video Generation	Code Available	5
MedCare: Advancing Medical LLMs through Decoupling Clinical Alignment and Knowledge Aggregation	Jun 25, 2024	Diversity, Natural Language Understanding	Code Available	5
MixTex: Unambiguous Recognition Should Not Rely Solely on Real Data	Jun 24, 2024	Data Augmentation, Optical Character Recognition (OCR)	Code Available	5
Cambrian-1: A Fully Open, Vision-Centric Exploration of Multimodal LLMs	Jun 24, 2024	Representation Learning, Visual Grounding	Code Available	5
LLaMA-MoE: Building Mixture-of-Experts from LLaMA with Continual Pre-training	Jun 24, 2024	Mixture-of-Experts	Code Available	5
ESC-Eval: Evaluating Emotion Support Conversations in Large Language Models	Jun 21, 2024		Code Available	5
Uni-Mol2: Exploring Molecular Pretraining Model at Scale	Jun 21, 2024	model	Code Available	5
aeon: a Python toolkit for learning from time series	Jun 20, 2024	Anomaly Detection, Model Selection	Code Available	5
EvTexture: Event-driven Texture Enhancement for Video Super-Resolution	Jun 19, 2024	Event-based vision, Super-Resolution	Code Available	5
Interpretable Preferences via Multi-Objective Reward Modeling and Mixture-of-Experts	Jun 18, 2024	Language Modeling, Language Modelling	Code Available	5
Improving Text-To-Audio Models with Synthetic Captions	Jun 18, 2024	AudioCaps, Audio captioning	Code Available	5
Autoregressive Image Generation without Vector Quantization	Jun 17, 2024	Image Generation, Quantization	Code Available	5
τ-bench: A Benchmark for Tool-Agent-User Interaction in Real-World Domains	Jun 17, 2024		Code Available	5
From Crowdsourced Data to High-Quality Benchmarks: Arena-Hard and BenchBuilder Pipeline	Jun 17, 2024	Chatbot	Code Available	5
PyramidMamba: Rethinking Pyramid Feature Fusion with Selective Space State Model for Semantic Segmentation of Remote Sensing Imagery	Jun 16, 2024	Decoder, Earth Observation	Code Available	5
EMMA: Your Text-to-Image Diffusion Model Can Secretly Accept Multi-Modal Prompts	Jun 13, 2024	Conditional Image Generation, Image Generation	Code Available	5
4M-21: An Any-to-Any Vision Model for Tens of Tasks and Modalities	Jun 13, 2024	Instance Segmentation, multimodal generation	Code Available	5
VisionLLM v2: An End-to-End Generalist Multimodal Large Language Model for Hundreds of Vision-Language Tasks	Jun 12, 2024	Image Generation, Language Modeling	Code Available	5
VideoLLaMA 2: Advancing Spatial-Temporal Modeling and Audio Understanding in Video-LLMs	Jun 11, 2024	Multiple-choice, Question Answering	Code Available	5
FLUX: Fast Software-based Communication Overlap On GPUs Through Kernel Fusion	Jun 11, 2024	GPU	Code Available	5
Accessing GPT-4 level Mathematical Olympiad Solutions via Monte Carlo Tree Self-refine with LLaMa-3 8B	Jun 11, 2024	Decision Making, GSM8K	Code Available	5