The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

661,570 papers · 248,326 code links · 4,818 tasks

Papers

Showing 4001–4050 of 661,570 papers

Title	Date	Tasks	Status	Hype
Unitxt: Flexible, Shareable and Reusable Data Preparation and Evaluation for Generative AI	Jan 25, 2024		Code Available	3
pix2gestalt: Amodal Segmentation by Synthesizing Wholes	Jan 25, 2024	3D Reconstruction, Object Recognition	Code Available	3
Marabou 2.0: A Versatile Formal Analyzer of Neural Networks	Jan 25, 2024		Code Available	3
MoE-Infinity: Efficient MoE Inference on Personal Machines with Sparsity-Aware Expert Cache	Jan 25, 2024	GPU, model	Code Available	3
An Extensible Framework for Open Heterogeneous Collaborative Perception	Jan 25, 2024		Code Available	3
AgentBoard: An Analytical Evaluation Board of Multi-turn LLM Agents	Jan 24, 2024	Benchmarking	Code Available	3
VisualWebArena: Evaluating Multimodal Agents on Realistic Visual Web Tasks	Jan 24, 2024		Code Available	3
Large Language Models are Superpositions of All Characters: Attaining Arbitrary Role-play via Self-Alignment	Jan 23, 2024	All, Instruction Following	Code Available	3
Benchmarking LLMs via Uncertainty Quantification	Jan 23, 2024	Benchmarking, Uncertainty Quantification	Code Available	3
Lumiere: A Space-Time Diffusion Model for Video Generation	Jan 23, 2024	Super-Resolution, Text-to-Video Generation	Code Available	3
Spotting LLMs With Binoculars: Zero-Shot Detection of Machine-Generated Text	Jan 22, 2024		Code Available	3
In-Context Learning for Extreme Multi-Label Classification	Jan 22, 2024	Classification, Extreme Multi-Label Classification	Code Available	3
A Vision-Language Foundation Model to Enhance Efficiency of Chest X-ray Interpretation	Jan 22, 2024	Benchmarking, Diagnostic	Code Available	3
MVSFormer++: Revealing the Devil in Transformer's Details for Multi-View Stereo	Jan 22, 2024	3D Reconstruction, Depth Estimation	Code Available	3
Bridging Evolutionary Algorithms and Reinforcement Learning: A Comprehensive Survey on Hybrid Algorithms	Jan 22, 2024	Evolutionary Algorithms, reinforcement-learning	Code Available	3
MM-Interleaved: Interleaved Image-Text Generative Modeling via Multi-modal Feature Synchronizer	Jan 18, 2024		Code Available	3
The Manga Whisperer: Automatically Generating Transcriptions for Comics	Jan 18, 2024		Code Available	3
RAP-SAM: Towards Real-Time All-Purpose Segment Anything	Jan 18, 2024	All, Decoder	Code Available	3
Remote Sensing ChatGPT: Solving Remote Sensing Tasks with ChatGPT and Visual Models	Jan 17, 2024	Task Planning	Code Available	3
SeeClick: Harnessing GUI Grounding for Advanced Visual GUI Agents	Jan 17, 2024	Natural Language Visual Grounding	Code Available	3
GARField: Group Anything with Radiance Fields	Jan 17, 2024	Scene Understanding	Code Available	3
Forging Vision Foundation Models for Autonomous Driving: Challenges, Methodologies, and Opportunities	Jan 16, 2024	Autonomous Driving, NeRF	Code Available	3
RoHM: Robust Human Motion Reconstruction via Diffusion	Jan 16, 2024	Denoising	Code Available	3
Inferflow: an Efficient and Highly Configurable Inference Engine for Large Language Models	Jan 16, 2024	GPU, Quantization	Code Available	3
ModernTCN: A Modern Pure Convolution Structure for General Time Series Analysis	Jan 16, 2024	Time Series, Time Series Analysis	Code Available	3
AesBench: An Expert Benchmark for Multimodal Large Language Models on Image Aesthetics Perception	Jan 16, 2024	MLLM Evaluation: Aesthetics	Code Available	3
A Survey of Resource-efficient LLM and Multimodal Foundation Models	Jan 16, 2024	Survey	Code Available	3
MARIO: MAth Reasoning with code Interpreter Output -- A Reproducible Pipeline	Jan 16, 2024	GSM8K, Math	Code Available	3
Small LLMs Are Weak Tool Learners: A Multi-LLM Agent	Jan 14, 2024	Language Modelling, Large Language Model	Code Available	3
How Johnny Can Persuade LLMs to Jailbreak Them: Rethinking Persuasion to Challenge AI Safety by Humanizing LLMs	Jan 12, 2024		Code Available	3
INTERS: Unlocking the Power of Large Language Models in Search with Instruction Tuning	Jan 12, 2024	Diversity, document understanding	Code Available	3
GroundingGPT:Language Enhanced Multi-modal Grounding Model	Jan 11, 2024	Language Modelling, Large Language Model	Code Available	3
Eyes Wide Shut? Exploring the Visual Shortcomings of Multimodal LLMs	Jan 11, 2024	Representation Learning, Self-Supervised Learning	Code Available	3
AutoAct: Automatic Agent Learning from Scratch for QA via Self-Planning	Jan 10, 2024	Question Answering	Code Available	3
Deep learning in motion deblurring: current status, benchmarks and future prospects	Jan 10, 2024	Deblurring, Deep Learning	Code Available	3
Evaluating Language Model Agency through Negotiations	Jan 9, 2024	Decision Making, Language Modeling	Code Available	3
Lightning Attention-2: A Free Lunch for Handling Unlimited Sequence Lengths in Large Language Models	Jan 9, 2024	GPU	Code Available	3
RoSA: Accurate Parameter-Efficient Fine-Tuning via Robust Adaptation	Jan 9, 2024	GPU, Math	Code Available	3
Universal Time-Series Representation Learning: A Survey	Jan 8, 2024	Feature Engineering, Representation Learning	Code Available	3
GPT-4V(ision) is a Human-Aligned Evaluator for Text-to-3D Generation	Jan 8, 2024	3D Generation, Text to 3D	Code Available	3
MoE-Mamba: Efficient Selective State Space Models with Mixture of Experts	Jan 8, 2024	Mamba, Mixture-of-Experts	Code Available	3
Improved motif-scaffolding with SE(3) flow matching	Jan 8, 2024	Data Augmentation, Diversity	Code Available	3
DiarizationLM: Speaker Diarization Post-Processing with Large Language Models	Jan 7, 2024	Automatic Speech Recognition, Automatic Speech Recognition (ASR)	Code Available	3
EAT: Self-Supervised Pre-Training with Efficient Audio Transformer	Jan 7, 2024	Audio Classification, Self-Supervised Learning	Code Available	3
Pheme: Efficient and Conversational Speech Generation	Jan 5, 2024		Code Available	3
The Rise of Diffusion Models in Time-Series Forecasting	Jan 5, 2024	Time Series, Time Series Analysis	Code Available	3
Denoising Vision Transformers	Jan 5, 2024	Denoising, Depth Estimation	Code Available	3
Evolution of Heuristics: Towards Efficient Automatic Algorithm Design Using Large Language Model	Jan 4, 2024	Combinatorial Optimization, Language Modeling	Code Available	3
Spikformer V2: Join the High Accuracy Club on ImageNet with an SNN Ticket	Jan 4, 2024	image-classification, Image Classification	Code Available	3
LLaVA-Phi: Efficient Multi-Modal Assistant with Small Language Model	Jan 4, 2024	Language Modeling, Language Modelling	Code Available	3