The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

659,983 papers, 248,104 code links, 4,818 tasks

Papers

Showing 3,351–3,400 of 659,983 papers

Title	Date	Tasks	Status	Hype
MMedAgent: Learning to Use Medical Tools with Multi-modal Agent	Jul 2, 2024		Code Available	3
Searching for Best Practices in Retrieval-Augmented Generation	Jul 1, 2024	Question Answering, RAG	Code Available	3
BERGEN: A Benchmarking Library for Retrieval-Augmented Generation	Jul 1, 2024	Benchmarking, RAG	Code Available	3
Evaluation of Text-to-Video Generation Models: A Dynamics Perspective	Jul 1, 2024	Text-to-Video Generation, Video Generation	Code Available	3
xLSTM-UNet can be an Effective 2D & 3D Medical Image Segmentation Backbone with Vision-LSTM (ViL) better than its Mamba Counterpart	Jul 1, 2024	3D Medical Imaging Segmentation, image-classification	Code Available	3
CRAB: Cross-environment Agent Benchmark for Multimodal Language Model Agents	Jul 1, 2024	Language Modeling, Language Modelling	Code Available	3
Retrieval-augmented generation in multilingual settings	Jul 1, 2024	Prompt Engineering, RAG	Code Available	3
StyleShot: A Snapshot on Any Style	Jul 1, 2024	Image Generation, Style Transfer	Code Available	3
Tree Search for Language Model Agents	Jul 1, 2024	Language Modeling, Language Modelling	Code Available	3
Instruct-IPT: All-in-One Image Processing Transformer via Weight Modulation	Jun 30, 2024	All, Deblurring	Code Available	3
Deep Frequency Derivative Learning for Non-stationary Time Series Forecasting	Jun 29, 2024	Time Series, Time Series Forecasting	Code Available	3
SpotlessSplats: Ignoring Distractors in 3D Gaussian Splatting	Jun 28, 2024	3DGS, 3D Reconstruction	Code Available	3
LLaRA: Supercharging Robot Learning Data for Vision-Language Policy	Jun 28, 2024	Vision-Language-Action, World Knowledge	Code Available	3
EVF-SAM: Early Vision-Language Fusion for Text-Prompted Segment Anything Model	Jun 28, 2024	Interactive Segmentation, Language Modeling	Code Available	3
Segment Anything without Supervision	Jun 28, 2024	Clustering, Image Segmentation	Code Available	3
HuatuoGPT-Vision, Towards Injecting Medical Visual Knowledge into Multimodal LLMs at Scale	Jun 27, 2024	Visual Question Answering (VQA)	Code Available	3
Step-DPO: Step-wise Preference Optimization for Long-chain Reasoning of LLMs	Jun 26, 2024	Arithmetic Reasoning, GSM8K	Code Available	3
A Survey on Mixture of Experts	Jun 26, 2024	In-Context Learning, Mixture-of-Experts	Code Available	3
Diffusion Model-Based Video Editing: A Survey	Jun 26, 2024	model, Survey	Code Available	3
A Review of Large Language Models and Autonomous Agents in Chemistry	Jun 26, 2024	Property Prediction, scientific discovery	Code Available	3
AlphaForge: A Framework to Mine and Dynamically Combine Formulaic Alpha Factors	Jun 26, 2024	Diversity	Code Available	3
Director3D: Real-world Camera Trajectory and 3D Scene Generation from Text	Jun 25, 2024	3D Generation, Denoising	Code Available	3
Point-SAM: Promptable 3D Segmentation Model for Point Clouds	Jun 25, 2024	Image Segmentation, Segmentation	Code Available	3
Vaporetto: Efficient Japanese Tokenization Based on Improved Pointwise Linear Classification	Jun 24, 2024		Code Available	3
Adam-mini: Use Fewer Learning Rates To Gain More	Jun 24, 2024		Code Available	3
Panza: Design and Analysis of a Fully-Local Personalized Text Writing Assistant	Jun 24, 2024	RAG, Retrieval-augmented Generation	Code Available	3
Lossless data compression by large models	Jun 24, 2024	Data Compression	Code Available	3
GIM: A Million-scale Benchmark for Generative Image Manipulation Detection and Localization	Jun 24, 2024	Image Manipulation, Image Manipulation Detection	Code Available	3
HEST-1k: A Dataset for Spatial Transcriptomics and Histology Image Analysis	Jun 23, 2024	Benchmarking, Representation Learning	Code Available	3
AudioBench: A Universal Benchmark for Audio Large Language Models	Jun 23, 2024	Audio Scene Understanding, Instruction Following	Code Available	3
Are Language Models Actually Useful for Time Series Forecasting?	Jun 22, 2024	Time Series, Time Series Forecasting	Code Available	3
Taming 3DGS: High-Quality Radiance Fields with Limited Resources	Jun 21, 2024	3DGS, Attribute	Code Available	3
A Survey of Multimodal-Guided Image Editing with Text-to-Image Diffusion Models	Jun 20, 2024	Video Editing	Code Available	3
∇²DFT: A Universal Quantum Chemistry Dataset of Drug-Like Molecules and a Benchmark for Neural Network Potentials	Jun 20, 2024	Drug Discovery, Molecular Property Prediction	Code Available	3
Consistency Models Made Easy	Jun 20, 2024	Computational Efficiency, GPU	Code Available	3
Visible-Thermal Tiny Object Detection: A Benchmark Dataset and Baselines	Jun 20, 2024	Diversity, object-detection	Code Available	3
LLM4CP: Adapting Large Language Models for Channel Prediction	Jun 20, 2024	Prediction, Time Series Analysis	Code Available	3
AgentDojo: A Dynamic Environment to Evaluate Prompt Injection Attacks and Defenses for LLM Agents	Jun 19, 2024		Code Available	3
Self-play with Execution Feedback: Improving Instruction-following Capabilities of Large Language Models	Jun 19, 2024	Instruction Following	Code Available	3
GenAI-Bench: Evaluating and Improving Compositional Text-to-Visual Generation	Jun 19, 2024	Benchmarking, Image Generation	Code Available	3
VisualRWKV: Exploring Recurrent Neural Networks for Visual Language Models	Jun 19, 2024	GPU, Language Modeling	Code Available	3
APPL: A Prompt Programming Language for Harmonious Integration of Programs and Large Language Model Prompts	Jun 19, 2024	Language Modeling, Language Modelling	Code Available	3
SpatialBot: Precise Spatial Understanding with Vision Language Models	Jun 19, 2024	Spatial Reasoning	Code Available	3
Detecting hallucinations in large language models using semantic entropy	Jun 19, 2024	Large Language Model, Question Answering	Code Available	3
Can Long-Context Language Models Subsume Retrieval, RAG, SQL, and More?	Jun 19, 2024	RAG, Retrieval	Code Available	3
Evaluating representation learning on the protein structure universe	Jun 19, 2024	Representation Learning	Code Available	3
DF40: Toward Next-Generation Deepfake Detection	Jun 19, 2024	DeepFake Detection, Face Reenactment	Code Available	3
TSI-Bench: Benchmarking Time Series Imputation	Jun 18, 2024	Benchmarking, Deep Learning	Code Available	3
VoCo-LLaMA: Towards Vision Compression with Large Language Models	Jun 18, 2024	Computational Efficiency, Question Answering	Code Available	3
Open-Source Web Service with Morphological Dictionary-Supplemented Deep Learning for Morphosyntactic Analysis of Czech	Jun 18, 2024	Deep Learning, Dependency Parsing	Code Available	3