The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

474,278 papers | 248,326 code links | 4,818 tasks

Papers


Showing 8326–8350 of 474278 papers

Title	Date	Tasks	Status	Hype
CleanDiffuser: An Easy-to-use Modularized Library for Diffusion Models in Decision Making	Jun 13, 2024	Decision Making	Code Available	2
On Softmax Direct Preference Optimization for Recommendation	Jun 13, 2024	Language Modeling, Language Modelling	Code Available	2
Needle In A Video Haystack: A Scalable Synthetic Evaluator for Video MLLMs	Jun 13, 2024	Benchmarking, Question Answering	Code Available	2
Chain of Preference Optimization: Improving Chain-of-Thought Reasoning in LLMs	Jun 13, 2024	Arithmetic Reasoning, Fact Verification	Code Available	2
Explore the Limits of Omni-modal Pretraining at Scale	Jun 13, 2024	Language Modeling, Language Modelling	Code Available	2
CLIPAway: Harmonizing Focused Embeddings for Removing Objects via Diffusion Models	Jun 13, 2024	Object	Code Available	2
STAR: A First-Ever Dataset and A Large-Scale Benchmark for Scene Graph Generation in Large-Size Satellite Imagery	Jun 13, 2024	Graph Generation, Object	Code Available	2
LRM-Zero: Training Large Reconstruction Models with Synthesized Data	Jun 13, 2024	3D Reconstruction	Code Available	2
S^3 -- Semantic Signal Separation	Jun 13, 2024	blind source separation, Topic Models	Code Available	2
BTS: Building Timeseries Dataset: Empowering Large-Scale Building Analytics	Jun 13, 2024	Benchmarking	Code Available	2
Interpreting the Weight Space of Customized Diffusion Models	Jun 13, 2024		Code Available	2
Bag of Tricks: Benchmarking of Jailbreak Attacks on LLMs	Jun 13, 2024	Benchmarking, GPU	Code Available	2
StreamBench: Towards Benchmarking Continuous Improvement of Language Agents	Jun 13, 2024	Benchmarking, Language Modeling	Code Available	2
An Efficient Post-hoc Framework for Reducing Task Discrepancy of Text Encoders for Composed Image Retrieval	Jun 13, 2024	Contrastive Learning, Image Retrieval	Code Available	2
Towards Vision-Language Geo-Foundation Model: A Survey	Jun 13, 2024	Earth Observation, Image Captioning	Code Available	2
Enhancing Diagnostic Accuracy in Rare and Common Fundus Diseases with a Knowledge-Rich Vision-Language Model	Jun 13, 2024	Diagnostic, Image Retrieval	Code Available	2
Dynamic Asset Allocation with Asset-Specific Regime Forecasts	Jun 13, 2024		Code Available	2
Towards Bidirectional Human-AI Alignment: A Systematic Review for Clarifications, Framework, and Future Directions	Jun 13, 2024	Philosophy	Code Available	2
LVBench: An Extreme Long Video Understanding Benchmark	Jun 12, 2024	Decision Making, Video Understanding	Code Available	2
Large Language Models Must Be Taught to Know What They Don't Know	Jun 12, 2024		Code Available	2
Attentive Merging of Hidden Embeddings from Pre-trained Speech Model for Anti-spoofing Detection	Jun 12, 2024	Computational Efficiency, Self-Supervised Learning	Code Available	2
DafnyBench: A Benchmark for Formal Software Verification	Jun 12, 2024		Code Available	2
DehazeDCT: Towards Effective Non-Homogeneous Dehazing via Deformable Convolutional Transformer	Jun 12, 2024	Image Dehazing, Nonhomogeneous Image Dehazing	Code Available	2
EmoSphere-TTS: Emotional Style and Intensity Modeling via Spherical Emotion Vector for Controllable Emotional Text-to-Speech	Jun 12, 2024	Emotional Speech Synthesis, text-to-speech	Code Available	2
Beyond LLaVA-HD: Diving into High-Resolution Large Multimodal Models	Jun 12, 2024	Image Compression	Code Available	2